Note that our main goal, and the regime where we hope to achieve large performance gains, is systems with more than a hundred objects. We expect this because of results obtained from an earlier version of a 2D Impulse-type dynamic simulation system. In fact, our system mimics the results we obtained on the 2D system for small particle systems. However, due to memory limitations, we have not yet been able to complete timing runs on systems of that size. We expect to solve the system's greedy memory usage problem soon, and will then be able to test our simulator on these large systems. Stay tuned. For now we report our results on some smaller systems, where we don't expect large performance gains.
Also note that although we wanted to run simulations with large numbers of objects, the memory constraints of the CM5 prevented us from running with more than a few dozen objects. More impressive speedups would be apparent on larger simulations, where the more parallelized portions of the code play a larger role. We would also like to run longer simulations. However, because of memory restrictions that arise as the simulation proceeds, we have truncated many of the simulations to only a few seconds of simulation time.
Finally, we are also slowed down by the rendering requirement. If we could simply advance from collision to collision, we would be able to simulate the various systems much more quickly. However, we must artificially halt the system and take a picture at evenly spaced intervals even if nothing "interesting" has happened. We do this so that we can produce realistic movies with evenly spaced frames. If only some final result were desired from the simulation, we could relax this frame rate and achieve improved performance.
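The cost of the frame requirement can be illustrated with a small sketch. This is not the simulator's code: the function `stop_times`, its parameters, and the event labels are hypothetical, and the collision times are made-up inputs. It only shows how evenly spaced frames add halting points on top of the collision events.

```python
import heapq
import math

def stop_times(collision_times, end_time, frame_interval=None):
    """Return the ordered (time, kind) points at which a simulator must halt.

    collision_times: hypothetical, precomputed collision event times.
    frame_interval:  spacing of rendered frames, or None to skip rendering.
    """
    events = [(t, "collision") for t in collision_times if t <= end_time]
    if frame_interval is not None:
        # One forced stop per frame, even if nothing "interesting" happens.
        n_frames = math.floor(end_time / frame_interval + 1e-9)
        events += [(k * frame_interval, "frame") for k in range(1, n_frames + 1)]
    heapq.heapify(events)  # a min-heap keyed on event time
    stops = []
    while events:
        stops.append(heapq.heappop(events))
    return stops

# Two collisions in one second of simulated time, rendered at 4 frames/sec.
with_frames = stop_times([0.3, 0.9], end_time=1.0, frame_interval=0.25)
without_frames = stop_times([0.3, 0.9], end_time=1.0)
print(len(with_frames), len(without_frames))
```

With these inputs the rendered run halts six times and the unrendered run only twice; at realistic movie frame rates the forced stops would outnumber the collisions by a wider margin.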
Overall, the results are quite poor. This is mostly due to the fundamental difficulty of parallelizing such a complicated system. Clearly, there is a bottleneck in the collision step and heap updating. However, we are looking for big parallel speedup wins on large systems with many objects. In a previous parallel version of a 2D simulator with many of the same data structures, we failed to observe real performance gains until a much larger number of objects was simulated. We expect similar results in the full 3D dynamic simulator. In fact, we mimic the 2D speedup graph almost exactly. As shown in the graph below, there is no significant performance gain until systems have over 100 objects. This is promising for future work on the parallel simulator.
Processors | Simulation Time (sec) | Processing Time (sec) | Slowdown from Realtime | Speedup |
1 | 1.66 | 27.7 | 16.7 | 1.00 |
2 | 1.66 | 38.5 | 23.1 | 0.72 |
4 | 1.66 | 20.3 | 12.1 | 1.36 |
8 | 1.66 | 17.9 | 10.7 | 1.55 |
Processors | Simulation Time (sec) | Processing Time (sec) | Slowdown from Realtime | Speedup |
1 | 1.66 | 28.9 | 17.3 | 1.00 |
2 | 1.66 | 16.5 | 9.9 | 1.75 |
4 | 1.66 | 19.9 | 11.9 | 1.45 |
8 | 1.66 | 16.1 | 9.6 | 1.80 |
Processors | Simulation Time (sec) | Processing Time (sec) | Slowdown from Realtime | Speedup |
1 | 0.55 | 12.53 | 22.6 | 1.00 |
2 | 0.55 | 10.5 | 19.0 | 1.2 |
4 | 0.55 | 9.1 | 16.5 | 1.4 |
8 | 0.55 | 9.93 | 17.8 | 1.3 |
16 | 0.55 | 10.6 | 19.1 | 1.2 |
Processors | Simulation Time (sec) | Processing Time (sec) | Slowdown from Realtime | Speedup |
1 | 0.55 | 17.7 | 31.2 | 1.00 |
2 | 0.55 | 50.3 | 90.6 | 0.35 |
4 | 0.55 | 12.93 | 23.2 | 1.37 |
8 | 0.55 | 10.56 | 19.0 | 1.68 |
16 | 0.55 | 10.73 | 19.3 | 1.64 |
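For reference, the derived columns in these tables can be recomputed from the raw timings. The sketch below assumes the usual definitions (speedup is single-processor time over p-processor time; slowdown is processing time over simulated time) and uses the processing times from the first table. The Karp-Flatt estimate of the serial fraction is an added illustration, not something measured in the original runs, and all function names are hypothetical.

```python
def speedup(t1, tp):
    """Single-processor time divided by p-processor time."""
    return t1 / tp

def slowdown_from_realtime(processing_time, simulated_time):
    """Wall-clock seconds spent per second of simulated time."""
    return processing_time / simulated_time

def karp_flatt(s, p):
    """Experimentally determined serial fraction for speedup s on p processors."""
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)

# Processing times (sec) from the first table, keyed by processor count.
times = {1: 27.7, 2: 38.5, 4: 20.3, 8: 17.9}
simulated = 1.66  # simulation time (sec) from the same table

for p, tp in times.items():
    s = speedup(times[1], tp)
    line = f"p={p}: speedup={s:.2f}, slowdown={slowdown_from_realtime(tp, simulated):.1f}"
    if p > 1:
        line += f", serial fraction={karp_flatt(s, p):.2f}"
    print(line)
```

Recomputed slowdown values can differ from the listed column in the last digit, since the table entries are rounded. On 8 processors the estimated serial fraction is roughly 0.6, which is consistent with the bottleneck in the collision step and heap updating described above.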