BinEq – A Benchmark of Compiled Java Programs to Assess Alternative Builds

BinEq – A Benchmark of Compiled Java Programs to Assess Alternative Builds

17 October 2024

Incidents like xz and SolarWinds have led to an increased focus on software supply chain security. A particular concern is the detection and prevention of compromised builds. A common approach is to independently re-build projects, and compare the results. This leads to the availability of different binaries built from the same sources, and raises the question of how to compare the respective binaries (to confirm the integrity of builds, to detect compromised builds, etc). It is however not clear how to do this: naive bitwise comparison is often too strict, and establishing the behavioural equivalence of two binaries is undecidable. A pragmatic step towards a solution is to provision a benchmark that can be used to test and train equivalence relations. We present such a benchmark for Java bytecode, consisting of 622,029 pairs of binaries (compiled Java classes) labelled as to whether these classes are equivalent or not. We refer to these pairs as equivalence and non-equivalence oracles, respectively. We derive equivalence oracles from building 56 projects and project versions using 32 dockerised build environments (with different compilers, compiler versions and configurations). Non-equivalence oracles are derived from three different sources: (1) proven breaking API changes, (2) semantic code changes synthesised by means of bytecode mutations, and (3) code changes extracted from vulnerability patches. To illustrate how to use the benchmark, we describe an experiment using two equivalence relations based on locality-sensitive hashing.


Venue : ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED '24)

File Name : scor030-dietrich.pdf