Detecting Python Malware in the Software Supply Chain with Program Analysis

26 April 2025

Supply-chain attacks are occurring at an unprecedented rate, raising growing concern about the security of open-source software. Existing state-of-the-art detection techniques often produce many false positives and false negatives, and an effective detection tool must strike a balance between the two. In this paper, we address software supply chain protection through program analysis. We present HERCULE, an inter-package analysis tool that detects malicious packages in the Python ecosystem. We enhance state-of-the-art approaches with the primary goal of reducing false positives. Key technical contributions include improving the accuracy of pattern-based malware detection and employing program dependency analysis to identify malicious packages in the development environment. Extensive evaluation against multiple benchmarks, including Backstabber’s Knife Collection and MalOSS, demonstrates that HERCULE outperforms existing state-of-the-art techniques, achieving an F1-score of 0.949. Additionally, HERCULE detected new malicious packages, which the PyPI security team removed, demonstrating its practical value.
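To make the trade-off between false positives and false negatives concrete, the following minimal sketch computes an F1-score from raw detection counts. The function name and all counts are hypothetical and chosen purely for illustration; they are not taken from the paper's evaluation, and the code is not part of HERCULE.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        # No true positives means precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of flagged packages that are truly malicious
    recall = tp / (tp + fn)     # share of malicious packages that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical detectors (illustrative counts only, not results from the paper).
# The "noisy" detector catches more malware but raises many false alarms.
noisy = f1_score(tp=90, fp=60, fn=10)
# The "balanced" detector misses slightly more malware but raises far fewer alarms.
balanced = f1_score(tp=85, fp=10, fn=15)

print(f"noisy detector F1:    {noisy:.3f}")
print(f"balanced detector F1: {balanced:.3f}")
```

Under these assumed counts, the balanced detector scores higher even though it flags fewer malicious packages, which is why reducing false positives can raise the F1-score overall.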


Venue : The Software Engineering in Practice Track of IEEE/ACM International Conference on Software Engineering (ICSE-SEIP '25)

File Name : Supply_Chain_Detection___ICSE_SEIP-camera-ready.pdf