Proto-libErator: Consumer-Code-Independent Library Fuzzing via Protobuf API Interposition
Abstract
This thesis investigates a focused research question in consumer-code-independent library fuzzing: can static, type-aware API modeling provide a measurable early-coverage advantage over fixed-sequence and runtime-learned baselines, especially on stateful libraries? The central claim is twofold. First, static constraint-guided initialization gives a practical head start: Proto-libErator reaches meaningful code regions early on libraries whose APIs are stateful and whose call surface can be represented effectively through typed constraints. Second, randomized super-harness dispatch over the full API surface explores the API-sequence space more exhaustively than fixed-sequence harnessing, reducing sequence-lock bias and improving early campaign utility.
We evaluate on 11 diverse C libraries (1,348 APIs; 17,961 inter-API edges). The evidence supports the claim on the intended library classes: on cJSON.c, Proto-libErator reaches 85.44% line and 100% function coverage in a 24-hour campaign; on pthreadpool it reaches 90.00%, substantially above all reported automated baselines (libErator 50.44%, Hopper 37.51%). Trajectory analysis shows consistent early-ramp behavior on most evaluated libraries, while taxonomy-based interpretation explains where gains are strongest (stateful, constraint-dense targets) and where they are structurally limited (environment-dependent or format-heavy targets).
Beyond coverage, the approach yields actionable bug signal. Campaigns produced 36 ASAN-reproducing memory-safety crashes across 8 libraries, of which 29 have no matching public CVE. The strongest single-library result is 13 novel findings across 7 libdwarf API entry points, none of which appear in the official libdwarf vulnerability database or NVD. Representative findings include c-ares (ares_create_query_int), cJSON (cJSON_ReplaceItemViaPointer), libplist (node_attach), and libtiff (LogLuv24toXYZ).
The overall conclusion is not that one technique dominates all targets, but that protobuf interposition plus static sequence constraints is a viable and scalable way to improve early accessibility in automated library fuzzing.
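The idea of randomized super-harness dispatch can be illustrated with a small sketch. It is not Proto-libErator's implementation: `ToyStack`, `allowed`, and `run_super_harness` are hypothetical names, the "library" is a toy stand-in for a stateful C API, raw bytes stand in for the protobuf-encoded input, and the precondition check is a runtime simplification of the statically derived constraints the abstract describes. The point it shows is that the call sequence comes from the fuzz input rather than from a fixed harness, while constraints keep each call valid for the current state.

```python
from typing import List, Optional


class ToyStack:
    """Hypothetical stateful library standing in for a real C API."""

    def __init__(self) -> None:
        # None means "no live handle"; a list means the handle exists.
        self.items: Optional[List[int]] = None

    def create(self, _arg: int) -> None:
        self.items = []

    def push(self, arg: int) -> None:
        self.items.append(arg)

    def pop(self, _arg: int) -> None:
        self.items.pop()

    def destroy(self, _arg: int) -> None:
        self.items = None


def allowed(lib: ToyStack) -> List[str]:
    """Return the calls whose preconditions hold in the current state.

    Assumption: this stands in for sequence constraints that a real
    tool would derive ahead of time from the API's types.
    """
    if lib.items is None:
        return ["create"]
    calls = ["push", "destroy"]
    if lib.items:
        calls.append("pop")
    return calls


def run_super_harness(data: bytes) -> List[str]:
    """Drive the whole API surface from fuzz input.

    Each call consumes two bytes: one selects among the currently
    allowed APIs, the other is passed as the argument.
    """
    lib = ToyStack()
    trace: List[str] = []
    for i in range(0, len(data) - 1, 2):
        candidates = allowed(lib)
        name = candidates[data[i] % len(candidates)]
        getattr(lib, name)(data[i + 1])
        trace.append(name)
    return trace
```

Under these assumptions, different inputs yield different valid sequences from the same harness, for example `run_super_harness(bytes([0, 0, 0, 7, 2, 0, 1, 0]))` walks create, push, pop, destroy, whereas a fixed-sequence harness would exercise only the single ordering it was written with.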