A Generic Framework for Heap and Value Analyses
This seminar explores a formal framework for the static analysis of object-oriented programming languages, specifically addressing the challenge of combining Heap Analysis (which tracks memory structure and pointers) with Value Analysis (which tracks primitive data like integers).
The core problem identified is the loss of precision that occurs when dealing with dynamic data structures, such as lists, where a single "summary node" abstracts multiple concrete nodes. When an analysis traverses such a structure, it must perform an operation called materialization, splitting the summary node into a concrete node (current element) and a remaining summary node. In standard approaches, this split causes the Value Analysis to lose the relationship between the fields of these nodes, leading to false alarms and over-approximation (e.g., assuming a value is "top" or unknown).
The presented solution is a generic framework that introduces Substitutions as a communication interface between the two domains. When the Heap Analysis alters the structure (e.g., during materialization), it generates a substitution message (e.g., mapping the old summary identifier to the new concrete ones). The Value Analysis consumes this message to automatically and precisely update its state, preserving invariants such as variable signs or ranges without manual intervention.
Theoretical Framework
The framework is built upon Abstract Interpretation, formally defining a Split Domain where the concrete state is divided into two disjoint components:
- Reference State (): Tracks topology and pointers.
- Value State (): Tracks primitive values.
To bridge these disjoint worlds, the framework utilizes Heap Identifiers
()—symbolic names exported by the Heap Domain representing abstract
memory regions. A preprocessing function, , is introduced to resolve
field accesses (like x.f) into these identifiers, allowing the Value Analysis
to operate on them transparently as if they were local variables. Soundness is
guaranteed through a rigorous definition of the Concretization function
() and the existence of a valid Galois Connection between the abstract
and concrete domains.
Instantiation and Results
The versatility of the framework was demonstrated by instantiating it with different abstract domains:
- Pointer Analysis (PA): A static, flow-sensitive approach based on Allocation Site Abstraction, where identifiers are fixed and substitutions are empty.
- TVAL+ (Shape Analysis): A dynamic approach based on 3-valued logic (TVLA), which actively uses substitutions to handle the merge and split of summary nodes.
For the value component, standard numerical domains (like Intervals or Octagons) were plugged in successfully. The results confirm that the framework can unify static pointer analyses and dynamic shape analyses within a single formal model, solving the information loss problem during materialization while maintaining mathematical soundness.