Specializing C and x86 Machine-Code Software with OS Assistance
Abstract
There is an intrinsic tension between the incentives for developers of commodity software and the desires of users of commodity software. Developers of commodity software must support a multitude of users and use cases, and must often develop and maintain large sets of optional features to address and anticipate diverse use cases. Individual organizations and users often have specific, well-defined use cases that depend only on a subset of a program’s features. From the perspective of the end user, unused features constitute bloat, which has ramifications in terms of program size, performance, and attack surface. Thus, for end-users, it is
desirable to have an automated means of producing programs specialized for their use case.
One means of performing program specialization, is via a partial evaluator. A partial evaluator takes as input a subset of a program’s input (referred to as static input), and identifies a portion of the program’s text that can be executed safely on the static input. The partial evaluator executes the “safe” portion of the program on the static input, performing an exploration of the partial program’s state-space. In the course of this state-space exploration, the partial evaluator simplifies the the program,
using information computed from the static input. Partial evaluation can remove unreachable code, and perform optimizations such as constant propagation, loop-unrolling and function inlining. For low-level languages such as C and machine code, performing partial evaluation on programs of non-trivial complexity requires solving problems related to the statespace exploration of partial programs, particularly (i) saving and restoring program states, and (ii) identifying previously visited states.
In this thesis, I describe GenXGen, a system for partially evaluating programs written in C and x86 machine code. With GenXGen, I improve on the state-of-the-art for problem (i) by creating an OS-assisted mechanism to save and restore states. By using additional information made available by the OS, I improve on previous techniques for problem (ii) by using incremental-hashing techniques to implement an O(1) technique for identifying previously visited states (with high probability). In addition, I improve on the scalability of existing tools by using program-slicing techniques to identify the “safely executable” portion of a program’s code.
These techniques allow GenXGen to produce specialized versions of realworld Linux programs.
Subject
BTA
Debloating
Generating Extensions
Machine Code
Partial Evaluation
Specialization
Permanent Link
http://digital.library.wisc.edu/1793/85340Type
Thesis

