A Novel Methodology for Hunting Zero-Day Vulnerabilities in Software Supply Chains Using Fine-Tuned LLMs
In cyberspace, where applications mediate so much of daily life, protecting them is not just a concern but a critical survival imperative. Between 2021 and 2023, the Software-as-a-Service (SaaS) market grew to $195 billion [1], making web applications a prime target for cybercriminals, and even large organisations such as Acer have fallen victim. The 2021 ransomware attack on Acer was reportedly caused by the exploitation of zero-day vulnerabilities in Microsoft’s Exchange servers, which are integral to web-based email systems and had only recently been patched. The REvil group used these flaws to breach Acer’s systems and demand a $50 million ransom [2]. Consequently, adopting a shift-left approach to security, which involves identifying vulnerabilities before code goes live on the Internet, is more crucial than ever, as outlined in the OWASP Top 10 Web Application Security Risks under A08:2021 – Software and Data Integrity Failures [3].
Today's application security testing methods detect vulnerabilities throughout the Software Development Life Cycle (SDLC), yet each one has specific limitations. Software Composition Analysis (SCA) offers a good detection rate and is vital for detecting vulnerabilities in an application's dependencies, but it suffers from a high false-positive rate and is of limited use in large projects [4]. Static Application Security Testing (SAST) tools are good at identifying potential code issues but often flag many false positives and lack in-depth end-to-end testing [5]. Dynamic Application Security Testing (DAST) tools can also be useful but tend to have a low rate of true positives and limited scope [6]. Interactive Application Security Testing (IAST) tools combine aspects of both SAST and DAST, incorporating human interaction for vulnerability remediation. Still, none of these methods fully addresses vulnerabilities that require a deeper understanding of context and reasoning, which points to a key gap in our ability to achieve more nuanced and complete vulnerability detection and remediation. Recent research introduced KARTAL [7], a state-of-the-art method that uses a fine-tuned Large Language Model (LLM) to detect complex Broken Access Control vulnerabilities with 87.19% accuracy. That study highlighted the need for more specialised labelled datasets to reduce false positives, but it did not address software supply chain processes or generate the corpus needed for fuzzer integration.
The rapid advancement of AI has opened new possibilities for security. As AI systems grow in usage and complexity, securing them is becoming a critical industry norm. Large Language Models (LLMs) have become pervasive, driven by the rise of both commercial and open-source pre-trained models.

Fig. 1.0. The increasing number of AI model parameters [8]
Figure 1.0 illustrates the progression of AI models, which have seen a dramatic increase in the number of parameters, from 117 million in GPT-1 to a reported 1.76 trillion in GPT-4. This expansion highlights their potential in areas like vulnerability detection, where they could make detection faster and more affordable. SCA and IAST are complementary: IAST identifies issues in real time with actionable insights, while SCA focuses on hunting dependency discrepancies. Integrating these approaches helps fortify software throughout development and operation. Furthermore, LLMs enhance security analysis by understanding natural language, adapting to different programming languages, and detecting invasive vulnerabilities. As a result, zero-day hunting becomes more feasible, with fewer false positives and false negatives.
The integration begins with the collection of a large corpus from both open-source SCA and IAST systems. This combined dataset, rich in contextual information, is first used for transfer learning, adapting the pre-trained model and checking it with a small few-shot prompt sampling test. The model is then fine-tuned on a larger, dedicated security dataset to improve its performance in the security domain. Targeted vulnerabilities are labelled for identification: CWE-20 (Improper Input Validation) as “1”, CWE-59 (Improper Link Resolution Before File Access) as “2”, CWE-78 (OS Command Injection) as “3”, and so on, while samples without vulnerabilities are labelled “0”. To identify the pre-trained model with the highest performance and lowest latency, we evaluate the top-performing decoder-only transformers available on the HuggingFace platform. A key advantage of this approach is the LLM’s aptitude for recognising subtle and complex vulnerabilities. By uncovering hidden attack vectors and complex interactions within the data, the LLM enhances the detection patterns of both SCA and IAST tools. The results of this analysis are fed back to refine the detection algorithms of these tools, creating a feedback loop intended to continuously improve their accuracy.
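The labelling scheme and the few-shot sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: only the four label values come from the text, while the function names `encode_label` and `build_few_shot_prompt` and the prompt wording are hypothetical.

```python
# Label scheme taken from the text. Further CWEs would continue the numbering.
CWE_LABELS = {
    None: 0,      # sample without a vulnerability
    "CWE-20": 1,  # Improper Input Validation
    "CWE-59": 2,  # Improper Link Resolution Before File Access
    "CWE-78": 3,  # OS Command Injection
}


def encode_label(cwe_id):
    """Map a CWE identifier (or None for a clean sample) to its class label."""
    if cwe_id not in CWE_LABELS:
        raise ValueError(f"unlabelled CWE: {cwe_id!r}")
    return CWE_LABELS[cwe_id]


def build_few_shot_prompt(examples, query_code):
    """Assemble a few-shot classification prompt (hypothetical format).

    examples: list of (code_snippet, cwe_id_or_None) pairs.
    query_code: the snippet the model should classify.
    """
    # Sort by label value so the legend reads 0, 1, 2, 3.
    legend = ", ".join(
        f"{label} = {cwe or 'no vulnerability'}"
        for cwe, label in sorted(CWE_LABELS.items(), key=lambda kv: kv[1])
    )
    parts = [f"Classify each code sample. Labels: {legend}."]
    for snippet, cwe in examples:
        parts.append(f"Code:\n{snippet}\nLabel: {encode_label(cwe)}")
    # The final block is left open for the model to complete.
    parts.append(f"Code:\n{query_code}\nLabel:")
    return "\n\n".join(parts)
```

A call such as `build_few_shot_prompt([("os.system(user_input)", "CWE-78")], "open(path).read()")` yields a prompt that ends in an open `Label:` field. Keeping the mapping in one dictionary means the same labels can be reused when the larger fine-tuning dataset is prepared.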
The "hunting method" combines threat intelligence and behavioural analysis with SCA, IAST, and LLMs to detect vulnerabilities. The proposed solution is shown in Figure 1.1.

Fig. 1.1. The proposed hunting methodology within the Software Development Life Cycle
Figure 1.1 shows a security testing workflow within the SDLC, split into "SCA & IAST (SAST)" on the left and "IAST (DAST Interactive)" on the right. In the "SCA & IAST (SAST)" section, SAST, SCA, and IAST are combined. An LLM uses a text corpus drawn from specific wordlists to automate vulnerability scanning. Insights from this scanning are stored in a prompts database and used to refine the LLM prompts for CVE and CWE classification, which feed into the "Code Commit" and "Build" stages to detect vulnerabilities early. The "IAST (DAST Interactive)" section focuses on dynamic testing. Data from the previous stages improves the model, and fuzzing tests run in virtualised environments are analysed by an IAST agent to identify vulnerabilities. This data is then formatted into prompts for classification, supporting the "Deploy" stage so that security is checked before deployment. The production stage proceeds only once the IAST accuracy threshold is met. This approach is designed to tackle both static and dynamic vulnerabilities, minimise false positives, and reduce misclassifications. However, there are challenges to navigate, such as ensuring that the integration of LLMs with IAST does not degrade system performance, and striking the right balance between detection accuracy and overall efficiency. Additionally, data labelling is essential but can be cumbersome and time-consuming.
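The rule that production proceeds only once the IAST accuracy threshold is met could be expressed as a simple pipeline gate. The sketch below is an assumption-laden illustration: the `Finding` record, the function names, and the default threshold of 0.9 are hypothetical, since the text does not specify how accuracy is measured or what the threshold is.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    predicted: int  # class label predicted by the LLM (0 = no vulnerability)
    actual: int     # label confirmed by the IAST agent or a reviewer


def accuracy(findings):
    """Fraction of findings whose predicted label matches the confirmed one."""
    if not findings:
        return 0.0  # no evidence yet, so treat accuracy as unproven
    correct = sum(1 for f in findings if f.predicted == f.actual)
    return correct / len(findings)


def may_promote_to_production(findings, threshold=0.9):
    """Gate for the production stage; the 0.9 default is an assumed value."""
    return accuracy(findings) >= threshold
```

With three of four findings classified correctly, `accuracy` returns 0.75, so the gate passes at a threshold of 0.7 and blocks at 0.8. Returning 0.0 for an empty result set is a deliberately conservative choice: a build with no evaluated findings is not promoted.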
In summary, combining advanced AI models with traditional security methods can significantly strengthen the security posture of web applications. As threats grow, integrating these approaches will be crucial for the software supply chains of large organisations. The proposed approach enhances existing security measures and could set a new standard for DevSecOps practitioners worldwide.
This scientific article is copyrighted: https://e-hakcipta.dgip.go.id/index.php/c?code=MjkwZDY1NmI0Y2NlNzFjN2U2ZTJiMzRlNjEyZDFmNjUK
References
[1] Gartner, “Gartner Forecasts Worldwide Public Cloud End-User Spending to Reach Nearly $600 Billion in 2023,” 31 October 2022. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2022-10-31-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-reach-nearly-600-billion-in-2023. [Accessed 11 August 2024].
[2] J. Yeung, “Acer has reportedly fallen victim to $50 million USD ransomware attack due to previous Microsoft server flaws,” Hypebeast, 2021. [Online]. Available: https://hypebeast.com/2021/3/acer-microsoft-exchange-revil-50-million-usd-ransomware-attack. [Accessed 14 August 2024].
[3] A. V. D. Stock, B. Glas, N. Smithline and T. Gigler, “OWASP Top 10:2021,” 2021. [Online]. Available: https://owasp.org/Top10/. [Accessed 11 August 2024].
[4] J. D. Pereira, “Techniques and Tools for Advanced Software Vulnerability Detection,” pp. 123-126, 2020.
[5] A. Nguyen-Duc, M. V. Do, Q. L. Hong, K. N. Khac and A. N. Quang, “On the adoption of static analysis for software security assessment–a case study of an open-source e-government project,” Computers & Security, vol. 111, Art. no. 102470, December 2021.
[6] F. Ö. Sönmez and B. G. Kiliç, “Holistic web application security visualization for multi-project and multi-phase dynamic application security test results,” IEEE, 2021.
[7] S. Sakaoglu, “Kartal: Web application vulnerability hunting using large language models,” 2023.
[8] J. Howarth, “Number of Parameters in GPT-4 (Latest Data) - Exploding Topics,” 06 August 2024. [Online]. Available: https://explodingtopics.com/blog/gpt-parameters. [Accessed 11 August 2024].