[PPL-devel] Thread Safety (continued)

Wed Oct 26 01:30:34 CEST 2016

Hi Enea,

I have installed the development version of PPL and configured it with
thread safety enabled. It seems to work just as you said it would, but I am
not getting the speedups I expected. To demonstrate the issue, I have
included a sample program below. The program creates a user-specified
number of threads, and each thread intersects two NNC_Polyhedron objects a
user-specified number of times. For timing comparisons, the test program
also has a code path that does not call PPL but computes logarithms
instead.

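#include <cmath>       // log
#include <cstddef>     // size_t
#include <cstdlib>     // atoi
#include <functional>  // std::function, std::bind
#include <ppl.hh>
#include "Thread_Pool_defs.hh"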

using namespace Parma_Polyhedra_Library;
namespace Parma_Polyhedra_Library {using IO_Operators::operator<<;}
using namespace std;

// Runs RepCount iterations of either the PPL workload or a plain
// floating-point workload (logarithms) used as a baseline.
void TestIntersections(int RepCount, bool TestPPL) {
  double x = 10.0;
  double b = 0.0;
  for (int i = 0; i != RepCount; i++) {
    if (TestPPL == false) {
      // Baseline: no PPL calls at all.
      x += i;
      b += log(x);
    } else {
      Variable x0(0);Variable x1(1);Variable x2(2);Variable x3(3);Variable x4(4);
      Variable x5(5);Variable x6(6);Variable x7(7);Variable x8(8);Variable x9(9);
      // First polyhedron.
      Constraint_System cs1;
      cs1.insert(x8-x9==0);cs1.insert(x2-x9>=0);cs1.insert(x3-x9>=0);
      cs1.insert(x4-x9>=0);cs1.insert(x5-x9>=0);cs1.insert(x1-x9>=0);
      cs1.insert(x6-x9>=0);cs1.insert(x7-x9>=0);cs1.insert(x0-x9>=0);
      NNC_Polyhedron ph1(cs1);
      // Second polyhedron.
      Constraint_System cs2;
      cs2.insert(x7-x9==0);cs2.insert(x2+x3-x8-x9>=0);cs2.insert(x1+x2-x8-x9>=0);
      cs2.insert(x3+x4-x8-x9>=0);cs2.insert(x0-x8>=0);cs2.insert(x5+x6-x8-x9>=0);
      cs2.insert(x6-x8>=0);cs2.insert(x0+x1-x8-x9>=0);cs2.insert(x4+x5-x8-x9>=0);
      NNC_Polyhedron ph2(cs2);
      // Intersection: start from the constraints of ph1 and add those of ph2.
      NNC_Polyhedron ph3(cs1);
      ph3.add_constraints(ph2.minimized_constraints());
      ph3.minimized_constraints();
      ph3.affine_dimension();
    }
  }
}

int main(int argc, char* argv[]) {
  // Expected arguments: <thread count> <repetitions per thread> <0|1: use PPL>
  if (argc < 4) {
    return 1;
  }
  int TotalProcessCount = atoi(argv[1]);
  int RepCount = atoi(argv[2]);
  bool TestPPL = atoi(argv[3]) != 0;
  typedef std::function<void()> work_type;
  Thread_Pool<work_type> thread_pool(TotalProcessCount);
  // Submit one identical job per thread.
  for (int i = 0; i != TotalProcessCount; i++) {
    work_type work = std::bind(TestIntersections, RepCount, TestPPL);
    thread_pool.submit(make_threadable(work));
  }
  // Wait for all submitted jobs to complete.
  thread_pool.finalize();
  return 0;
}
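
One way to rule out the thread pool itself as a factor would be to run the
same job on plain std::thread objects and measure wall-clock time inside the
program. The following is only a sketch using the standard library; the
names run_timed and log_workload are hypothetical and are not part of PPL,
and log_workload merely mirrors the non-PPL branch above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of PPL): runs `work` once on each of
// `thread_count` threads and returns the elapsed wall-clock time in seconds.
double run_timed(int thread_count, const std::function<void()>& work) {
  assert(thread_count > 0);
  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  threads.reserve(thread_count);
  for (int t = 0; t != thread_count; ++t) {
    threads.emplace_back(work);  // each thread gets its own copy of the job
  }
  for (auto& th : threads) {
    th.join();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Stand-in for the logarithm branch of TestIntersections; returns the
// accumulated sum so the loop has an observable result.
double log_workload(int rep_count) {
  double x = 10.0;
  double b = 0.0;
  for (int i = 0; i != rep_count; ++i) {
    x += i;
    b += std::log(x);
  }
  return b;
}
```

A call such as run_timed(40, [] { log_workload(50000000); }) would time the
baseline; the PPL branch could be passed in the same way.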

This is how I compiled:
g++ -std=c++11 -pthread file_name.cpp -l:libtcmalloc_minimal.so.4.2.6 \
    -lppl -lgmpxx -lgmp

I tested this on a new machine with 44 cores and hyperthreading
(thread::hardware_concurrency() = 88), run with RepCount = 10,000 and
TestPPL = true. Here are the timings:
#thread,real time (from time)
1,0m0.925s
5,0m1.820s
10,0m3.041s
20,0m3.758s
40,0m6.775s

By way of comparison, here are the timings for RepCount = 50,000,000 and
TestPPL = false:
#thread,real time (from time)
1,0m1.767s
5,0m1.854s
10,0m2.012s
20,0m2.139s
40,0m2.206s
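
Since every thread performs the same number of iterations, the ideal
wall-clock time is flat as threads are added, so the ratio T(1)/T(N) can be
read as a weak-scaling efficiency. A small sketch that computes this ratio
from the two tables above (the names Timing, weak_scaling_efficiency and
print_efficiency are hypothetical, chosen for illustration):

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of a timing table: thread count and wall-clock seconds.
struct Timing {
  int threads;
  double seconds;
};

// Weak-scaling efficiency: 1.0 means N threads took as long as one thread.
double weak_scaling_efficiency(double t1, double tn) {
  assert(t1 > 0.0 && tn > 0.0);
  return t1 / tn;
}

// Prints "label,threads,efficiency" for each row, relative to the first row.
void print_efficiency(const char* label, const std::vector<Timing>& rows) {
  assert(!rows.empty());
  for (const Timing& r : rows) {
    std::printf("%s,%d,%.3f\n", label, r.threads,
                weak_scaling_efficiency(rows.front().seconds, r.seconds));
  }
}

// Timings copied from the tables above.
const std::vector<Timing> ppl_timings = {
    {1, 0.925}, {5, 1.820}, {10, 3.041}, {20, 3.758}, {40, 6.775}};
const std::vector<Timing> log_timings = {
    {1, 1.767}, {5, 1.854}, {10, 2.012}, {20, 2.139}, {40, 2.206}};
```

At 40 threads the ratio is roughly 0.14 for the PPL branch and roughly 0.80
for the logarithm branch.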

Assuming sufficient hardware, I would expect it to take the same amount of
time for 1 thread as 40 threads, though I know that that is not quite
realistic. Am I doing something incorrectly in the PPL code branch that is
causing it to slow down so much as the number of threads increases? I am
not very experienced with parallel C++ programming, so please forgive me if
I am doing something foolish. Thanks so much for all of the help.

Best,
John

On Sat, Oct 8, 2016 at 4:15 AM, Enea Zaffanella <zaffanella at cs.unipr.it>
wrote:

> Hello John.
>
> On 10/07/2016 06:54 PM, John Paulson wrote:
>
> Hi Enea and Roberto,
>
> Thank you very much for all of the work you have done on threading with
> PPL. I have a couple of questions about the current status of the thread
> safety. A project I am currently working on needs to have multiple threads
> working simultaneously on distinct PPL objects. Is that possible with the
> new thread safe version?
>
>
> Yes, it is possible.
>
> If the PPL objects are distinct, so that there is no concurrent access to
> the *same* object, then things should not be difficult (e.g., no need at
> all for synchronization). You can have a look at the ppl_lcdd and ppl_lpsol
> demos or to the recently added tests in tests/Polyhedron/threadsafe*.cc
>
>
> I made an implementation of my current project where I fork processes
> instead of using multiple threads. However, if I fork and have two
> processes manipulating distinct PPL objects, I do not get a speedup by a
> factor of two like I would expect. Is there any way to get that type of
> speedup?
>
>
> If you use many processes, then the PPL should be working fine "as is"
> (i.e., with thread-safety off).
> As for the missing speedup, I doubt it has something to do with the
> library:
> there should be something else going on, but I have no information to
> guess what it could be.
>
> Cheers,
> Enea.
>
>
>
> Best,
> John C. Paulson
>
>