Explicit Structural Information with the Pattern(·) Construction

[Page last updated on 2001/02/19 10:29:56.]

Introduction

Upgrading a domain with explicit structural information is an interesting technique for improving the precision of the analysis. Moreover, structural information is very important in itself for the purposes of both optimized compilation and program verification. If you know everything on structural information analysis you may want to skip the rest of this page and look at the results of the experimentation. Keep on reading otherwise.

I will try to introduce the basic ideas behind structural information analysis by means of a motivating example. Let us consider a simple but not trivial Prolog program: mastermind.pl.

Now consider the only direct query for which it has been written, `?- play.', and focus the attention on the procedure extend_code/1. A standard mode analysis of the program will tell you something like

extend_code(A) :-
  nonvar(A).

This means: ``during any execution of the program, whenever extend_code/1 succeeds it will have its argument bound to a term that is not a variable''. Not much indeed. Especially because this can be established in a second by visual inspection: extend_code/1 is always called with a non-variable argument and this completes the proof.

If we use an analysis domain keeping track of structural information the situation changes radically. Here is what such a domain allows CHINA to derive:

extend_code([([A|B],C,D)|E]) :- 
  list(B),
  (functor(C,_,1);integer(C)),
  (functor(D,_,1);integer(D)),
  list(E), 
  ground([C,D]),
  may_share([[A,B,E]]).

This means: ``under the circumstances mentioned above, the argument extend_code/1 will be bound to a term of the form [([A|B],C,D)|E], where B is bound to a cons (i.e., a term whose principal functor is '.'/2) or to the nil constant (i.e., []/0), and likewise for E; C is either bound to a functor of arity 1 or to an integer, and likewise for D; C and D are ground, and, consequently, pair-sharing may only occur between A, B, and E''. Now, that is something more interesting!

Note 1: The above syntax has been chosen for presentation purposes only. CHINA will adhere to the assertion language developed at the Computational Logic, Implementation, and Parallelism Lab in Madrid, unless something better arises.

Note 2: Some extra groundness information that is derived by CHINA has been omitted for the sake of simplicity: this says that, if A and B turn out to be ground, then E will also be ground.

Note 3: Although the use of assert/1 and retract/1 in mastermind.pl can be tracked with good precision by a smart analyzer, we are assuming an analysis assigning the ``don't know'' description to each dynamic predicate.

[Paragraph mentioning the use of terms as records in logic programming and explaining its relevance for the topic at hand.]

What makes the difference is the ability of the domain to keep track of some partial information about the actual structure of the terms that are built and manipulated during the computation. Note that this does not define precisely what we mean by structural information as several analysis domains say something about ``the actual structure'' of the terms. The usual domain for freeness, for instance, has this property: the assertion ``X is free'' says a lot about the actual structure of X. Unfortunately, if X is not free, then the freeness domain says nothing. That is why we try to be less ambigous by sticking an `explicit' in front of `structural information'. Thus, when we talk about a domain recording structural information explicitely we mean that a binding like X = f(A, B) is abstracted by itself, even though this explicit structural information ((X is bound to a term whose principal functor is f/2 and so forth) can change later. The actual structure of the term bound to X can be unveiled further (because the knowledge of the structure of A and/or B increases) or it can be lost completely (because of joins and/or widenings).

The Pattern(·) Construction

Recording and propagating explicit structural information is precisely the purpose of the Pattern(·) construction employed in the CHINA analyzer. If you wish to know more on the Pattern(·) construction, you now have enough intuition to face the reading of the following paper, appeared in the informal proceedings of the APPIA-GULP-PRODE'97 conference (you may want to stay tuned, as a better and more up to date paper on the subject is in preparation):

Structural information analysis for CLP languages. In Proceedings of the "1997 Joint Conference on Declarative Programming (APPIA-GULP-PRODE'97)" (Grado, Italy, June 1997), M. Falaschi and M. Navarro and A. Policriti, Eds., pp. 81-92.

Done? Read everything? That was quick! ;-)

We have conducted extensive experimentation on the analysis using the Pattern(·) construction: this allowed to tune the implementation and to gain insight on the implications of keeping track of explicit structural information. We are going to present a comparison between a rather complex domain for mode analysis, let us call it Modes, and the domain Pattern(Modes). The Modes domain represents and propagates information on simple types, groundness, boundedness, pair-sharing, freeness, and linearity. The comparison will involve the two usual things: precision and efficiency.

Assessing the Impact of Structural Information

How are we going to compare the precision of the two domains, one without explicit structural information and the other with it? That is something that must be established in advance.

Let us consider the mastermind.pl example again. Here is the situation:

using Modes	using Pattern(Modes)
extend_code(A) :- nonvar(A).	extend_code([([A\|B],C,D)\|E]) :- list(B), (functor(C,_,1);integer(C)), (functor(D,_,1);integer(D)), list(E), ground([C,D]), may_share([[A,B,E]]).

It is clear that the analysis with Pattern(Modes) yields much more information. However, it is not clear at all how this can be measured precisely. A possibility is to measure the information by the point of view of the application. For instance, if the application is optimized compilation, speedup and/or reduction of code size are two possible metrics (which would also be heavily influenced by the ability of the compiler to actually exploit the extra information). For program verification things would probably be more complicated.

The approach we have chosen to compare the effect of adding explicit structural information to the Modes domain is simple though unsatisfactory: throw away all the structural information at the end of the analysis and compare the usual numbers (i.e., number of ground variables, number of free variables and so on). With reference to the table above, this metric pretends that explicit structural information gives no precision improvements on the analysis of extend_code/1 in mastermind.pl. In fact, once all the structural information has been discarded, both the analysis with Modes and the analysis with Pattern(Modes) give the same piece of information: upon success, the argument of extend_code/1 will not be a variable. Very unfair to Pattern(Modes): if you know better ways to make a comparison, please, let me know.

[Introduce the results.] [Must explain everything.]

Here are:

the precision results for goal-independent analysis;
the precision results for goal-dependent analysis; and
the efficiency results.