Geometric Abstractions for Scalable Program Analyzers
was a recent 1 year project (2008/9, EP/G025177/1) funded by EPSRC (UK).
Summary
It is widely acknowledged that relatively small defects in software
can have a substantial cost both for producers and consumers. For
example, system vulnerabilities are frequently introduced by
programming mistakes such as allowing out of bounds accesses to
buffers, overflows in operations on native integers and other errors
related to memory management. Of course, there can be other causes,
such as system design flaws, but finding and certifying the absence of
the low-level bugs is an important prerequisite to building secure and
reliable software. The approach we use to detect and locate
programming errors or certify the absence of such bugs is that of
static analysis; that is, the determination of correct though
approximate information about the program's values at each program
step. Static analysis has its roots in compiler optimization where
the analysis time has to be kept very low while the properties of
interest are fixed with respect to the compiler. More recently program
analyzers have been developed for program verification; however these
also consider a fixed set of possible run-time errors and aim for a
scalability and performance that enables them to tackle very large
programs.
Static analysis uses abstract domains for representing information
that needs to be collected. Thus these domains have to provide a
convenient but approximate representation of the accumulated
information during the abstract evaluation of a program. Observe that
the abstract domain component of a static analyzer has to include, not
only a computer representation of the logical properties of interest,
but also the operations needed to extract this information from the
program's components, primitives for propagating this information
forward and/or backward within the program, and operators for
accelerating the analysis process and ensuring loop iterations
actually terminate.
Since, many program properties of interest are intrinsically numeric,
there has been a considerable amount of research on how this kind of
information can be represented efficiently and precisely by means of
geometric domains. The problem being to get the "right"
efficiency/precision trade-off, which is difficult since this is
clearly dependent on the application. Thus many geometric domains have
been proposed and researched, the majority being defined by linear
(i.e., planar) bounds such as polyhedra; octagons; boxes, also known
as intervals; and grids, simple forms of which are also called
lattices. Such a range is needed since domains such as polyhedra,
although very precise, have high complexity and exponential space
requirements (relative to the number of dimensions) while simpler
domains such as octagons and grids are polynomial and the
non-relational domain of boxes has linear complexity.
Solving this scalability problem is the main motivation for this
project; here we will research new techniques for building compound
geometric domains that can be constructed from several atomic ones
such as those discussed above. In order to allow for varying the
efficiency/precision trade-off, not only will it be parametrized on
the component domains but it will also have a highly adjustable
strategy for varying the kind and amount of communication between
them. Thus a successful project will provide bespoke domains that are
tailored for the application, allowing for both the type of property
being verified and the size and complexity of the software being
analyzed.
|