-
Orphan1 (source):
main program: the "time loop" is sequential, not parallel
the loops in smooth_x/smooth_y can be parallelised
large overhead: fork/join of the threads in every call
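The slide's source listing is not reproduced here. A minimal sketch of the pattern it describes might look as follows. Only smooth_x, smooth_y and the time index t come from the notes; the array names u and w, the sizes n and nsteps, and the three-point averaging stencil are assumptions made for illustration.

```fortran
! Sketch only: u, w, n, nsteps and the stencil are hypothetical.
module smooth_mod
  implicit none
  integer, parameter :: n = 64
contains
  subroutine smooth_x(u, w)
    real, intent(in)  :: u(n, n)
    real, intent(out) :: w(n, n)
    integer :: i, j
    !$omp parallel do private(i)      ! threads are forked here ...
    do j = 1, n
      do i = 1, n
        ! average in x; indices are clamped at the boundaries
        w(i, j) = (u(max(i-1, 1), j) + u(i, j) + u(min(i+1, n), j)) / 3.0
      end do
    end do
    !$omp end parallel do             ! ... and joined again here
  end subroutine smooth_x

  subroutine smooth_y(w, u)
    real, intent(in)  :: w(n, n)
    real, intent(out) :: u(n, n)
    integer :: i, j
    !$omp parallel do private(i)      ! second fork/join of the time step
    do j = 1, n
      do i = 1, n
        ! average in y; indices are clamped at the boundaries
        u(i, j) = (w(i, max(j-1, 1)) + w(i, j) + w(i, min(j+1, n))) / 3.0
      end do
    end do
    !$omp end parallel do
  end subroutine smooth_y
end module smooth_mod

program orphan1
  use smooth_mod
  implicit none
  integer, parameter :: nsteps = 100
  real :: u(n, n), w(n, n)
  integer :: t
  u = 0.0
  u(n/2, n/2) = 1.0
  do t = 1, nsteps                    ! sequential time loop
    call smooth_x(u, w)
    call smooth_y(w, u)
  end do
  print *, 'sum(u) =', sum(u)
end program orphan1
```

With this layout the program pays for 2*nsteps parallel regions, one per subroutine call.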
-
Orphan2 (source):
create threads once, with a PARALLEL block around the time loop
all threads execute this loop
index variable t is private by default
DO directives in the subroutines distribute the loop iterations
these DO directives are not lexically inside a PARALLEL directive ("orphaned")
the already created threads are used
synchronisation of threads (implicit barrier) at END DO
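The listing for this slide is also missing. The following is a sketch of how the orphaned variant could look, under the same assumptions as before: u, w, n, nsteps and the stencil are hypothetical, and only smooth_x, smooth_y and t are taken from the notes.

```fortran
! Sketch only: u, w, n, nsteps and the stencil are hypothetical.
module smooth_mod
  implicit none
  integer, parameter :: n = 64
contains
  subroutine smooth_x(u, w)
    real, intent(in)  :: u(n, n)
    real, intent(out) :: w(n, n)
    integer :: i, j                   ! locals: each thread has its own copy
    !$omp do                          ! orphaned: no PARALLEL in this routine
    do j = 1, n
      do i = 1, n
        w(i, j) = (u(max(i-1, 1), j) + u(i, j) + u(min(i+1, n), j)) / 3.0
      end do
    end do
    !$omp end do                      ! implicit barrier: w is complete
  end subroutine smooth_x

  subroutine smooth_y(w, u)
    real, intent(in)  :: w(n, n)
    real, intent(out) :: u(n, n)
    integer :: i, j
    !$omp do                          ! binds to the PARALLEL region of the caller
    do j = 1, n
      do i = 1, n
        u(i, j) = (w(i, max(j-1, 1)) + w(i, j) + w(i, min(j+1, n))) / 3.0
      end do
    end do
    !$omp end do                      ! implicit barrier: u is complete
  end subroutine smooth_y
end module smooth_mod

program orphan2
  use smooth_mod
  implicit none
  integer, parameter :: nsteps = 100
  real :: u(n, n), w(n, n)
  integer :: t
  u = 0.0
  u(n/2, n/2) = 1.0
  !$omp parallel                      ! threads are created once
  do t = 1, nsteps                    ! every thread runs the time loop; t is private
    call smooth_x(u, w)
    call smooth_y(w, u)
  end do
  !$omp end parallel
  print *, 'sum(u) =', sum(u)
end program orphan2
```

If the subroutines are called outside any PARALLEL region, the orphaned DO directives bind to a team of one thread and the loops simply run sequentially.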
-
Orphan3 (source):
END DO NOWAIT: no barrier at the end of the DO loop
can be used to reduce thread waiting times
Warning: erroneous in our example (even if used for only one loop)!
explicit synchronisation: BARRIER
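As a hedged illustration, the sketch below uses hypothetical arrays a, b and c, which are not part of the smoothing example. NOWAIT is safe after the first loop because the second loop does not touch a. A barrier is required before the third loop because it reads elements written by other threads. Presumably the same kind of dependency between smooth_x and smooth_y is what makes NOWAIT erroneous in the example from the slides.

```fortran
! Sketch only: a, b, c and n are hypothetical.
program nowait_demo
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i
  !$omp parallel
  !$omp do
  do i = 1, n
    a(i) = real(i)
  end do
  !$omp end do nowait                 ! safe: the next loop does not use a
  !$omp do
  do i = 1, n
    b(i) = 2.0 * real(i)
  end do
  !$omp end do nowait                 ! no implicit barrier here either
  !$omp barrier                       ! explicit: a and b are now complete
  !$omp do
  do i = 1, n
    c(i) = a(n + 1 - i) + b(i)        ! reads a(...) written by other threads
  end do
  !$omp end do
  !$omp end parallel
  print *, 'c(1) =', c(1), ' c(n) =', c(n)
end program nowait_demo
```

Removing the BARRIER would let a fast thread start the third loop while a(n + 1 - i) is still unwritten, which is the kind of error the warning refers to.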
-
Orphan4 (source):
commands executed by only one thread:
all other threads wait at the end of the block
simultaneous prints by several threads are ok, but the output may be scrambled
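The directive shown on the original slide is not preserved. One construct that matches the description is SINGLE, so the sketch below assumes it: END SINGLE carries an implicit barrier, whereas MASTER would not. The variable me and the value of nsteps are hypothetical.

```fortran
! Sketch only: assumes the slide refers to SINGLE; me and nsteps are hypothetical.
program single_demo
  !$ use omp_lib                      ! compiled only when OpenMP is enabled
  implicit none
  integer, parameter :: nsteps = 3
  integer :: t, me
  !$omp parallel private(me)
  me = 0
  !$ me = omp_get_thread_num()
  do t = 1, nsteps                    ! t is private by default
    !$omp single
    print *, 'step', t, ': printed once, by thread', me
    !$omp end single                  ! the other threads wait here
    print *, 'step', t, ': hello from thread', me   ! lines may interleave
  end do
  !$omp end parallel
end program single_demo
```

The first message appears once per time step. The second appears once per thread, in no guaranteed order.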