[PPL-devel] integer versus rational solutions

Tobias Grosser grosser at fim.uni-passau.de
Mon Jul 13 13:11:23 CEST 2009


On Mon, 2009-07-13 at 12:07 +0200, Albert Cohen wrote:
> Tobias Grosser wrote:
> > On Fri, 2009-07-10 at 23:57 +0200, Albert Cohen wrote:
> >> Hi Tobias.
> >>
> >> Tobias Grosser wrote:
> >>> Stop. Do I understand this correctly? You do not want to check
> >>> dependencies after strip mining? I am very concerned about that, as one
> >>> of the benefits of working on the polytope model, I thought, is that we
> >>> can schedule transformations in an arbitrary order. So it is required
> >>> to reanalyze strip-mined loops. This is also necessary for the
> >>> dependency checking we use for autopar.
> >> As presented in Austin in November, it is much preferable to follow the 
> >> transformation composition approach of Girbal and Vasilache that does 
> >> not require intermediate steps of dependence analysis and legality checking.
> > 
> > You are referring to "Facilitating the search for compositions of
> > program transformations"?
> 
> Yes. The 2d+1 encoding and the additional invariants/rules allow us to 
> compose and generalize classical transformations on a polyhedral 
> representation that may have diverged arbitrarily far from the original 
> syntax tree.

Sure. The same holds for the polyhedral representation in Graphite: it
should also allow us to compose and generalize arbitrary transformations
on the polyhedral data structures.

However, I do not remember that we specified any 2d+1 encoding in our
data structures. So at the moment we are more general, and as you might
remember from our early discussions, it is very important to me to keep
this generality until we have a discussion in which we agree to trade it
off for performance.
Ideally we would have an example, a compile-time problem, and a
discussion in which we agree that adding an assumption like 2d+1
improves algorithmic complexity and that this improvement is required
for real-world code.
This information should then become part of graphite-poly.h.
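
(Just to be sure we mean the same thing, my understanding of the 2d+1
encoding is roughly the following; please correct me if this sketch is
wrong. For a statement at loop depth d = 2 with iteration vector (i, j),
the scattering function is the 2d+1 = 5 dimensional vector

	(b0, i, b1, j, b2)

where b0, b1 and b2 are constant "beta" dimensions that encode the
textual order of the statement at each loop level. In Graphite the
scattering polyhedron is currently not restricted to this shape.)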


> 
>  > (...)
> 
> >> Konrad is working on Step 2.1 (enhancing strip-mine to update the 
> >> dependence info). Then, if you agree, he will help restructure the API 
> >> such that dependence computation and dependence checking are ALWAYS 
> >> called explicitly, rather than implicitly in the transformation 
> >> primitives.
> > 
> > Sure. I would like us to even be able to call them after any single
> > transformation.
> > 
> > A loop like the following would be nice for the optimizations.
> > 
> > while (select_and_apply_transform (scop)) {
> > 	check_dependencies();
> > }
> 
> Yes. Skipping some checks requires stronger technology to guarantee that 
> transformation steps will be legal or will eventually be legal/checked, 
> or even corrected/completed via enabling transformations (see 
> Vasilache's PACT'07 paper).
> 
> As a concrete, more original example: together with Razya and Konrad, we 
> are thinking about enabling parloops and interchange with an automatic 
> privatization "corrective step". We would ignore memory-based 
> dependences when looking for interchange combinations that improve both 
> parloops and autovect (using an extended cost model), then check which 
> ones were eventually violated in the selected transformation sequence, 
> and eventually privatize only those scalars/arrays that need to be 
> privatized to effectively remove those memory-based dependences.

That sounds interesting. We also thought about memory optimizations, so
I would like the poly_drs not to be read-only, but read/write. This
means that we can analyze the data accesses in the polyhedral model,
change them in the polyhedral representation, and Graphite will take
care of creating the required GIMPLE code.

This should allow a lot of nice optimizations, like reducing the memory
footprint if only a part of an array is written to, or exposing more
parallelism by trading memory for fewer dependencies.
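
As a made-up sketch of what "changing a data access in GPOLY" could look
like: a write to A[i][j] in a loop nest (i, j) could be described by the
access relation

	{ (i, j) -> A(s0, s1) : s0 = i and s1 = j }

and a memory optimization could rewrite it, e.g. to

	{ (i, j) -> A(s0, s1) : s0 = j and s1 = i }

to store the array transposed for better locality. Graphite would then
regenerate the GIMPLE accesses from the modified relation. The notation
here is invented, it is only meant to illustrate the direction.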


> > I know this will be expensive and we want to disable this in the
> > production compiler. But I want us to be able to use this loop for
> > debugging.
> 
> Sure.
> 
> > I totally agree that it is useful to avoid polyhedral computations, as
> > they tend to be expensive. However, I still want to be able to do CHECK
> > and ANALYSIS at every single step, as it will show us the assumptions
> > we made to optimize compile time.
> > If we make any assumptions (which might be necessary), I would like
> > them not to be hidden by skipping the dependency ANALYSIS; I want our
> > (nightly) tests to fail.
>  >
> > This will allow an analysis/discussion of the assumptions we made. Maybe
> > we can avoid them; if not, we know we have to document them. Every
> > assumption has to be written down at a well known place so that future
> > developers will not be surprised.
> 
> OK. But running the ANALYSIS itself after transformation makes little 
> sense to me, even in debugging mode. Can you explain why?

Sure. The idea I have of Graphite is that we have a set of
transformations that all analyze GPOLY and try to improve it.
However, "improve" does not go in only one direction. For example, there
might be passes trading dependencies for memory usage: they improve
memory usage but add dependencies, or they remove dependencies but add
memory usage.

I would like to have a clever scheduler (or some database) deciding
which passes should be applied. E.g. first we trade some memory for
fewer dependencies, then we try to expose more parallelism, maybe do
some interchange to enable fusion and fission, and finally parallelize
everything.

To allow this free ordering we have to ensure that every pass is able to
analyze GPOLY to see whether it can apply, e.g., interchange. So, e.g.,
after strip mining it might still be useful to do some more
transformations to enable vectorization. If the vectorization pass
cannot analyze GPOLY after strip mining has been applied, this will not
work.

So what I want is:

1. We have one representation, GPOLY, on which we do our optimizations.
2. Every optimization pass works only on GPOLY.
2.1 Every pass has to be able to read GPOLY and to generate GPOLY.
2.2 Every pass has to decide only by looking at and analyzing GPOLY
whether it can be applied.
3. Optimizing a program means applying a lot of PASSES on GPOLY to
generate a better scattering and/or better data accesses.
4. Graphite will read GPOLY and, following the description in GPOLY,
generate the GIMPLE code.

It might be possible that this will not work or is too slow. If there is
a limitation in GPOLY, the solution I see is to extend GPOLY officially
(e.g. to handle arbitrary conditions, to allow changing the data
accesses, ...). A sketch of how I imagine such a driver loop follows
below.
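
This is a sketch only -- the pass type, the helper functions and the
flag are invented here, nothing of this exists in Graphite today:

/* Hypothetical driver: every pass reads and writes only GPOLY.  */
static void
apply_poly_passes (scop_p scop)
{
  poly_pass_p pass;

  while ((pass = select_next_pass (scop)))
    {
      /* 2.2: decide only by analyzing GPOLY.  */
      if (pass->applicable_p (scop))
        /* 3: rewrite the scattering and/or the data accesses.  */
        pass->apply (scop);

      /* Debugging mode only: recheck dependences after every step.  */
      if (flag_graphite_check_deps)
        check_dependencies (scop);
    }

  /* 4: read GPOLY and regenerate the GIMPLE code.  */
  regenerate_gimple (scop);
}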

> >>> Another thing I do not understand is that you are often referring to
> >>> "large constants". I thought complexity problems mainly appear because
> >>> of a high number of dimensions, but not because of high values in an
> >>> (in)equality. Can you explain this to me?
> >> As mentioned several times, big constants tend to break the "structure" 
> >> of ILP problems or other polyhedral operations, leading to combinatorial 
> >> explosions. This is well known even in the case of simple algorithms 
> >> like Fourier Motzkin. They also increase the risk of memory overflow.
> > 
> > OK. I definitely believe you, but as this seems to become one of the
> > important points for Graphite performance, I would like to understand
> > this well. Maybe we can also write a Wiki page to explain which
> > constructs should be avoided. We already learned that NNC_Polyhedron is
> > bad, but what else is dangerous?
> > 
> > Here I am still not getting the problem. I looked again into
> > Fourier-Motzkin and at first glance I could not find anything in its
> > complexity that depends on the values in the (in)equalities. Any
> > pointer or example that might help me understand this problem would be
> > really helpful.
> 
> FM blows up when it generates too many inequalities. To avoid this, 
> people generally favor eliminations that generate 0 or 1 new inequality 
> first.

Sure. I understand that the number of dimensions and/or (in)equalities
is a limitation. What I do not understand is why the values in an
(in)equality are problematic, i.e. why

1 > i might be easier to handle than 99999999 > i

Or did I get you wrong and you did not say anything about this?
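
To explain what I mean: as far as I understand, when FM eliminates a
variable x that has m lower bounds and n upper bounds, it produces
m * n new inequalities. E.g. eliminating x from

	x <= a,  x <= b,  c <= x

gives the 1 * 2 = 2 inequalities c <= a and c <= b, completely
independent of whether the constants involved are 1 or 99999999. So the
blow-up I can see is in the number of inequalities, not in the size of
the values. Please correct me if I am missing something here.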

> >> Eventually, the bigger the constant the higher the loss of information 
> >> when relaxing to rational polyhedra.
> > 
> > OK. However, if we did not add the additional restrictions, that
> > information would never have been available in the first place. Or are
> > we losing other information by adding these restrictions?
> 
> Sorry, I missed that part. Can you re-explain?

The mail I was referring to was about putting restrictions on the
parameters in the CLooG context. As we know that a parameter "k" is of
type unsigned integer, we know that the constraint 0 <= k <= MAX_INT is
always true. By adding this constraint to the context we allow CLooG to
remove some conditions, as the generated code can assume that k stays
within these limits.

If I got your comments right, you warned that adding large constants
like MAX_INT might be bad for compile-time performance. That is
something I did not understand, and as adding the constraint to the
context improves the code generated by CLooG, I would like to keep
adding these constraints as long as compile time allows it.
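
For reference, on the PPL side the context constraint I have in mind
would look roughly like the following. This is only a sketch, with the
function name and the concrete bound invented for illustration; it is
not how Graphite actually builds its context:

#include <ppl.hh>
using namespace Parma_Polyhedra_Library;

/* Sketch: build a one-parameter context with 0 <= k <= MAX_INT
   (the bound written out here for a 32 bit signed int).  */
C_Polyhedron
build_context ()
{
  Variable k (0);
  C_Polyhedron context (1);            /* 1-dimensional universe.  */
  context.add_constraint (k >= 0);
  context.add_constraint (k <= 2147483647);
  return context;
}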

> >> All of this is very informal/intuitive of course. There are some/few 
> >> theoretical results about this, but Roberto and other experts are better 
> >> placed than myself to answer it!
> > 
> > @Roberto:
> > Can you give us some hints on what is really expensive and what we
> > should avoid when constructing polyhedra in the PPL?
> > 
> > Tobi
> 



