

CHINA Questions and Answers

[Page last updated on 2001/01/23 09:32:39.]

Questions and Answers on the Release of CHINA

Will CHINA be released? If so, when?

CHINA will be released when it is ready for release.

Some parts of it, such as the groundness, boundedness, freeness, linearity, and sharing domains, are complete and quite stable now. Experiments are still under way on the widening of the structural information component. We will then add domains for a couple of low-level properties (so-called uninitialized variables and recursively dereferenced terms). I plan to rewrite the code dealing with the magic transformation and to reimplement from scratch the code handling built-ins. A set of annotators will be the last bit needed for release.

Under which license will CHINA be released?

CHINA will be free software released under the terms of the
GNU General Public License.

Please note: I said ``free software'', not ``open source software''.

Questions and Answers on the Benchmark Suite

Why don't you make the benchmark suite available?

The short answer is: lack of time. More precisely, making the suite available poses a few problems.

First of all, I received several programs confidentially, directly from their authors, and I am not allowed to redistribute them in any way.

The other programs were all fetched from public places on the Web over the last three years or so. Some of these programs are no longer available, perhaps by accident, perhaps because their authors decided they should disappear from those public places: I don't know.

Some programs have been superseded by new programs from the same authors. Making the old (possibly buggy, possibly incomplete) programs available from a source not under their control may be open to criticism. The author of package X could blame me because people might (e.g., through the use of search engines) be misled into downloading the archaic version 0.0.1-only-a-fool-would-use-me instead of version 4.13-latest-and-greatest.

Of course, with plenty of time at my disposal I could sort out all the above problems (e.g., I could get in touch with each author and ask for permission to redistribute the programs). But that is not my situation: collecting the benchmark suite was and is a significant effort on my part, and I cannot spend more time on it now.

However, you can take advantage of my efforts. I provide a partial program list that will help you immensely in the hunt for benchmarks. You will not have to proceed blindly as I did (dredging the newsgroups and querying several search engines with queries like +"Prolog program", +Prolog +"source code", and dozens of others I routinely use). With the name of a program, its description, and often the names of its authors, you can either find the program yourself or get in touch with the authors very quickly.

Questions and Answers on the Experimental Results

Why are you listing linearity? Isn't linearity only useful during sharing analysis?

No, it isn't. Linearity information is also crucial to several program verification methods. Moreover, if you are doing separate analysis (e.g., analyzing libraries once and for all, or analyzing different modules in isolation), you will want to record linearity information so as to improve subsequent analyses.
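
To make this concrete, here is a hedged illustration (the predicate names are invented): a term is linear if no variable occurs in it more than once.

  % lin/1 binds its argument to a linear term: X, Y, and Z each occur once.
  lin(f(X, Y, Z)).

  % nonlin/1 does not: X occurs twice in g(X, X).
  nonlin(g(X, X)).

If a separate analysis records that a library predicate always binds its argument to a linear term, a later sharing analysis of client code can exclude sharing between distinct subterms of that argument.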

Why are you listing the results for both goal-independent and goal-dependent analyses? Isn't one of the two The Right Thing ®?

No. Neither goal-dependent analysis nor goal-independent analysis is The Right Thing To Do.

Goal-independent analyses are unavoidable: if the analysis encounters a goal like call(G) such that the principal functor and/or the arity of G are unknown, a goal-dependent analysis is either unsound or it virtually becomes goal-independent, since it must assume a ``don't know'' call pattern for every procedure. This is one of the reasons why focusing only on goal-dependent analyses is, in our opinion, a mistake. The other reason is that the ability to analyze libraries once and for all is desirable and, more generally, so is the separate analysis of different program modules, especially in very large projects.
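
As a hedged illustration (the predicate is invented), consider:

  % Which predicates the call/1 goal reaches is unknown at analysis time,
  % since each Hook is bound only at run time.
  run_hooks([]).
  run_hooks([Hook|Hooks]) :-
    call(Hook),
    run_hooks(Hooks).

Since virtually any procedure of the program might be reached through call(Hook), a goal-dependent analysis has no choice but to assume a ``don't know'' call pattern for all of them.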

On the other hand, focusing only on goal-independent analyses is the opposite mistake: goal-dependent analyses, when possible, are usually more precise than goal-independent ones. For these reasons, we insist on presenting experimental results for both.

Why are fewer benchmarks listed in the tables about goal-dependent analyses?

For several programs in the benchmark suite, goal-dependent analysis is pointless (see the answer to the previous question for why this is so). This usually happens because the program contains a call to an unknown procedure (e.g., by means of call/1). The CHINA analyzer promptly recognizes these cases and reverts to a goal-independent analysis.

Why do you give analysis results even for programs where CHINA signals a possible incorrectness?

When CHINA detects the possibility of delivering unsound results, it can be instructed to do one of two things:
  1. emit a ``don't know'' description for the entire program, thus avoiding the incorrectness;
  2. issue a warning and continue as if nothing had happened.

This situation may arise because the program is self-modifying in a way that is unpredictable as far as the analyzer is concerned. This happens, for example, when the program calls goals like assert(C) where the principal functor of C is unknown. The problem would be mitigated if we could assume the program was written for a system that traps the assertion of static predicates (i.e., those not explicitly declared as dynamic). However, this is often unclear and, moreover, several Prolog systems have changed their behavior in this respect from one release to another: which release were the authors referring to when they wrote the program we use as a benchmark?
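
As a hedged illustration (the predicate name is invented), the problematic pattern looks like this:

  % The principal functor of Clause is unknown at analysis time, so the
  % analyzer cannot tell which predicates may be modified at run time.
  learn(Clause) :-
    assert(Clause).

If the system traps the assertion of static predicates, only those declared dynamic can actually change; otherwise, every predicate of the program is a candidate.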

The other reason for possible incorrectness arises in programs using setarg/3, which is currently not supported by CHINA. We will certainly implement support for setarg/3, but this work has very low priority on our task list, especially because only a couple of benchmarks use it.
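
For reference, this hedged sketch shows what setarg/3 does in the systems that provide it (details vary between systems):

  ?- T = f(a, b, c), setarg(3, T, z).
  % T is now f(a, b, z): the third argument was destructively replaced.

Such destructive updates break the usual assumption that a term, once built, can only become more instantiated, which is why unsupported uses of setarg/3 are a source of potential incorrectness.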

For the experimental part of our work we instruct CHINA to ignore these causes of potential incorrectness. We do this because our supply of benchmark programs is scarce, so we are not in a position to throw any away. Even if the final analysis results may be incorrect with respect to the intended concrete semantics (something that, as explained above, we often cannot even determine), these programs still allow us to exercise the domains and analysis techniques.

The analysis results are always an implication like

  if predicate p/n succeeds, then [...]
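
For instance, a groundness analysis of the standard append/3 predicate typically yields

  if append(X, Y, Z) succeeds, then Z is ground if and only if both X and Y are ground.
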
You may think of the analysis results of ``nasty'' programs as implications where further conditions have been added to the antecedent: things like `if the system traps any attempt to assert a non-dynamic predicate' or `if setarg/3 is only used to replace ground terms with ground terms', and so forth. Several programs could also be modified so as to avoid the problem, e.g., by using several specialized procedures like
  my_assert_p(X) :-
    [...]
    assert(p(X)),
    [...]

instead of a single procedure

  my_assert(C) :-
    [...]
    assert(C),
    [...]

Another possibility, which is the one we are pursuing, is to increase the ability of CHINA to discover, say, which procedures may be dynamically modified by assert/1 and its relatives. The bottom line is: don't worry. We are not shipping a production tool yet. When we do, we will make sure CHINA issues a plain ``don't know'' for those cases it cannot handle correctly.

Why do the numbers differ from the ones I have seen in your paper XYZ?

There are several reasons:
  • CHINA is continuously updated: domains are made smarter and/or combined in more effective ways, the handling of built-ins is improved, and widenings are made more selective.
  • CHINA employs several widening operators (devices used to throttle the complexity of the analysis), which are controlled by a number of parameters. Even though fine-tuning these parameters is not our main worry at the moment, we keep experimenting with them as our experience and benchmark suite grow. Changing these parameters may have subtle effects: widening may not occur where it previously did and occur instead at another analysis point. Sometimes this is a win, sometimes not. We care not about the effect on individual programs but, rather, about the overall effect on the entire test suite. Moreover, the widenings may (and often do) interact with each other, which complicates the situation even further: a widening in the groundness component may cause a widening in the sharing component. (A sketch of a simple widening is given after this list.)
  • CHINA, like any piece of software not made by the gods themselves, has bugs. Some are fixed, some new ones are introduced.

    Quoting from the README of a mathematical subroutine package by R. Freund:

    For all intent and purpose, any description of what the codes are doing should be construed as being a note of what we thought the codes did on our machine on a particular Tuesday of last year. If you're really lucky, they might do the same for you someday. Then again, do you really feel *that* lucky?
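
To give a feel for what a widening might look like, here is a minimal, purely illustrative sketch of a depth-k cut-off on structural information. The predicate names, the '$top' marker, and the choice k = 2 are all invented for the example; CHINA's actual widenings are more selective than this.

  % widen/2: cut the argument term at depth K, replacing deeper subterms
  % with '$top', which stands for ``any term''.
  widen(Term, Widened) :-
    widen(Term, 2, Widened).            % illustrative depth bound K = 2

  widen(Term, _, Term) :-
    var(Term), !.                       % variables are kept as they are
  widen(_, 0, '$top') :- !.             % depth budget exhausted: lose the info
  widen(Term, Depth, Widened) :-
    Term =.. [F|Args],
    Depth1 is Depth - 1,
    widen_args(Args, Depth1, WArgs),
    Widened =.. [F|WArgs].

  widen_args([], _, []).
  widen_args([A|As], D, [W|Ws]) :-
    widen(A, D, W),
    widen_args(As, D, Ws).

For example, widen(f(g(h(a)), b), W) yields W = f(g('$top'), b): the structure below depth 2 is deliberately thrown away to bound the cost of the analysis.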

The analysis times we report in order to assess the (relative) efficiency of analysis methods are subject to even more factors, in addition to those mentioned above:

  • The machines, operating systems, and compilers we use for development and for gathering experimental data change. We usually indicate the CPU employed along with its clock frequency, the amount of physical memory installed, and the version of the operating system. A handful of other important parameters are omitted, though: memory speed, amount of cache installed, bus frequency, BIOS and OS parameters, compiler version, optimization options used in the compilation, libraries used, and so forth.
  • The compiler has bugs. Sometimes we have to change the usual compilation switches in order to get around a particular bug in the optimizer. That's life.

Why do the numbers you give for the XYZ analysis of a program differ from the ones given by A. N. Author in ``His Paper''?

First of all, check whether the papers in question report the methods used to obtain the numbers: a discrepancy in this respect is certainly one possible explanation. We may also use different analysis techniques, different widenings, and/or different widening strategies. That said, bugs may and do bite us as well as A. N. Author.

If there are substantial differences in the numbers concerning the precision of the analysis, I am most interested in hearing about them. Should you spot one of these cases, please let me know, especially if our numbers are worse ;)

© The China Development Group
china@cs.unipr.it
