SyncBarriers.jl

SyncBarriers - Module

SyncBarriers.jl provides several barrier implementations for shared-memory synchronization and reductions in concurrent Julia programs. It respects the cooperative multitasking nature of Julia's task system while letting programmers express and exploit the structure of the parallelism in their programs.

See the documentation for more information.

Note: Inserting barriers appropriately for a correct and efficient parallel program is rather hard. For casual programming, it is recommended to use higher-level data-parallel approaches.

A toy example

julia> using SyncBarriers

julia> xs = zeros(Bool, 20);

julia> xs[end÷2] = true;

julia> barrier = Barrier(length(xs) - 2);

julia> @sync for i in 2:length(xs)-1
           b = barrier[i-1]
           Threads.@spawn begin
               if i == 2
                   println()
                   join(stdout, (" █"[x + 1] for x in xs))
                   println()
               end
               for _ in 1:8
                   cycle!(b)               # wait for print
                   l, c, r = xs[i-1:i+1]   # (loading)
                   cycle!(b)               # wait for load
                   xs[i] = l ⊻ (c | r)     # (storing)
                   cycle!(b)               # wait for store
                   if i == 2
                       join(stdout, (" █"[x + 1] for x in xs))
                       println()
                   end
               end
           end
       end

         █
        ███
       ██  █
      ██ ████
     ██  █   █
    ██ ████ ███
   ██  █    █  █
  ██ ████  ██████
 ██  █   ███     █

See the benchmarks for examples with actual performance considerations.
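
To check the output above, the same update rule can be run serially. The sketch below uses only Base Julia; `rule_step` and `run_serial` are hypothetical helper names and are not part of SyncBarriers.jl. Because each generation is written into a fresh copy, no barrier is needed, which makes it a convenient reference for the parallel version.

```julia
# Serial reference for the toy example: same rule, same initial state.
function rule_step(xs::AbstractVector{Bool})
    ys = copy(xs)                          # the two end cells are never updated
    for i in 2:length(xs)-1
        l, c, r = xs[i-1], xs[i], xs[i+1]  # load from the old generation
        ys[i] = l ⊻ (c | r)                # store into the new generation
    end
    return ys
end

function run_serial(ncells = 20, nsteps = 8)
    xs = zeros(Bool, ncells)
    xs[end÷2] = true
    for _ in 1:nsteps
        xs = rule_step(xs)
    end
    return xs
end

final = run_serial()
println(join(" █"[x + 1] for x in final))  # should match the last printed row
```

The two `cycle!` calls around the load and the store in the parallel version play the role that the `copy` plays here: they keep every task from reading a cell that a neighbour has already overwritten.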


Barrier factories

Barrier factories create a barrier with a given property without specifying the actual implementation. They use simple heuristics to determine an appropriate implementation.
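
As a rough illustration, the factory calls used in the examples on this page can be lined up by the property they request. This is a sketch that assumes only the call forms shown elsewhere on this page; the variable names are made up, and which concrete barrier type each factory returns is left to the package's heuristics.

```julia
using SyncBarriers

ntasks = 4

# One factory per property; the implementation is chosen by the package.
plain          = Barrier(ntasks)                           # cycle!
fuzzy          = fuzzy_barrier(ntasks)                     # arrive! / depart!
reducing       = reduce_barrier(+, Float64, ntasks)        # reduce!
fuzzy_reducing = fuzzy_reduce_barrier(+, Float64, ntasks)  # reduce_arrive! / depart!

totals = zeros(ntasks)
@sync for i in 1:ntasks
    Threads.@spawn begin
        cycle!(plain[i])                               # plain synchronization
        totals[i] = reduce!(reducing[i], Float64(i))   # every task gets 1 + 2 + 3 + 4
    end
end
```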

Barrier constructors

SyncBarriers.DisseminationBarrier - Type
DisseminationBarrier(ntasks::Integer)

Create a dissemination barrier for ntasks tasks. It provides the best performance, especially for large ntasks (⪆ 32).

Supported method: cycle!
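
One reason commonly given for dissemination barriers scaling well is that they finish in about log₂(ntasks) rounds. The sketch below shows the textbook signalling pattern in Base Julia only; it is not taken from the SyncBarriers.jl source, and `nrounds`, `partner`, and `everyone_informed` are hypothetical names.

```julia
# Textbook dissemination pattern with 1-based task ids (an illustration only).
# Number of rounds: ceil(log2(ntasks)), computed with integer arithmetic.
nrounds(ntasks) = ntasks <= 1 ? 0 : ndigits(ntasks - 1, base = 2)

# The task that task `i` signals in round `k` (k = 0, 1, ...).
partner(i, k, ntasks) = mod1(i + 2^k, ntasks)

# Track which arrivals each task has heard about, directly or transitively.
function everyone_informed(ntasks)
    known = [BitSet([i]) for i in 1:ntasks]
    for k in 0:nrounds(ntasks)-1
        prev = map(copy, known)   # all signals of one round are sent together
        for i in 1:ntasks
            union!(known[partner(i, k, ntasks)], prev[i])
        end
    end
    return all(s -> length(s) == ntasks, known)
end

nrounds(32), nrounds(33)   # (5, 6)
everyone_informed(33)      # true
```

After round k each task has heard, directly or indirectly, from the 2^(k+1) - 1 tasks preceding it, so no single task has to collect every arrival.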

SyncBarriers.StaticTreeBarrier - Type
StaticTreeBarrier{NArrive,NDepart}(ntasks::Integer)
StaticTreeBarrier{NArrive,NDepart,T}(op, ntasks::Integer)

Create a static tree barrier for ntasks tasks, with the branching factors for arrival (NArrive::Integer) and departure (NDepart::Integer) specified by the type parameters.

It supports the reduce barrier method (reduce!) if the associative operation op and its domain T are given. Otherwise, it only supports the plain barrier method (cycle!).

It provides the best performance for large ntasks (⪆ 32) when reduction is needed.

Supported methods: cycle!, reduce!
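
A minimal sketch of the reducing form, assuming the constructor signature listed above. The branching factors 4 and 2 are an arbitrary choice for illustration, not a recommendation from the package.

```julia
using SyncBarriers

ntasks = 8
# Assumed from the signature above: arrival branching factor 4, departure
# branching factor 2, reducing Float64 values with +.
barrier = StaticTreeBarrier{4,2,Float64}(+, ntasks)

sums = zeros(ntasks)
@sync for i in 1:ntasks
    Threads.@spawn begin
        # Each task contributes its index and receives 1 + 2 + ... + 8.
        sums[i] = reduce!(barrier[i], Float64(i))
    end
end
```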

SyncBarriers.TreeBarrier - Type
TreeBarrier{NBranches}(ntasks::Integer)
TreeBarrier{NBranches,T}(op, ntasks::Integer)

Create the tree barrier for ntasks tasks with the branching factor specified by the type parameter NBranches::Integer.

It supports fuzzy reduce barrier methods if the associative operation op and its domain T are given. Otherwise, it only supports fuzzy barrier methods.

Supported methods: cycle!, arrive!, depart!, reduce!, reduce_arrive!
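
A small sketch of the non-reducing form used as a fuzzy barrier, assuming the `TreeBarrier{NBranches}(ntasks)` signature above with a binary tree. The arrays and the arithmetic are made up for illustration.

```julia
using SyncBarriers

ntasks = 4
barrier = TreeBarrier{2}(ntasks)   # binary tree, no reduction

xs = zeros(Int, ntasks)
ys = zeros(Int, ntasks)
@sync for i in 1:ntasks
    Threads.@spawn begin
        xs[i] = 10i
        arrive!(barrier[i])                # announce that xs[i] is written
        ys[i] = i                          # local work that needs no other task
        depart!(barrier[i])                # now every xs entry is written
        ys[i] += xs[mod1(i + 1, ntasks)]   # safe to read a neighbour's value
    end
end
```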

SyncBarriers.FlatTreeBarrier - Type
FlatTreeBarrier{NBranches}(ntasks::Integer)
FlatTreeBarrier{NBranches,T}(op, ntasks::Integer)

Create the tree barrier for ntasks tasks with the branching factor specified by the type parameter NBranches::Integer. The departure is done serially (hence "flat").

It supports fuzzy reduce barrier methods if the associative operation op and its domain T are given. Otherwise, it only supports fuzzy barrier methods.

Supported methods: cycle!, arrive!, depart!, reduce!, reduce_arrive!
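
A sketch of the reducing form driven through the fuzzy reduce methods, assuming the `FlatTreeBarrier{NBranches,T}(op, ntasks)` signature above. The branching factor 2 and the `means` array are illustrative choices.

```julia
using SyncBarriers

ntasks = 4
barrier = FlatTreeBarrier{2,Float64}(+, ntasks)

means = zeros(ntasks)
@sync for i in 1:ntasks
    Threads.@spawn begin
        reduce_arrive!(barrier[i], Float64(i))   # contribute i without waiting
        s = depart!(barrier[i])                  # wait, then read the sum
        means[i] = s / ntasks
    end
end
```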


Synchronizing operations

SyncBarriers.cycle! - Function
cycle!(barrier[i])

Using a barrier::Barrier, signal that the i::Integer-th task has reached a certain phase of the program and wait for other tasks to reach the same phase.

Examples

julia> using SyncBarriers

julia> xs = [1:3;];

julia> barrier = Barrier(3);

julia> @sync for i in 1:3
           Threads.@spawn begin
               x = i^2
               xs[i] = x
               cycle!(barrier[i])
               xs[mod1(i + 1, 3)] -= x
           end
       end

julia> xs
3-element Vector{Int64}:
 -8
  3
  5
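
To make the blocking behaviour concrete, here is a deliberately simple centralized barrier written with Base Julia only. It is a teaching sketch and not how SyncBarriers.jl implements any of its barriers; `ToyBarrier` and `toy_cycle!` are hypothetical names.

```julia
# A counter plus a phase number, protected by a condition variable.
mutable struct ToyBarrier
    ntasks::Int
    arrived::Int
    phase::Int
    cond::Threads.Condition
end
ToyBarrier(ntasks) = ToyBarrier(ntasks, 0, 0, Threads.Condition())

function toy_cycle!(b::ToyBarrier)
    lock(b.cond)
    try
        phase = b.phase
        b.arrived += 1
        if b.arrived == b.ntasks       # the last arrival releases everyone
            b.arrived = 0
            b.phase += 1
            notify(b.cond)
        else
            while b.phase == phase     # re-check after every wake-up
                wait(b.cond)
            end
        end
    finally
        unlock(b.cond)
    end
    return nothing
end

xs = [1:3;]
b = ToyBarrier(3)
@sync for i in 1:3
    Threads.@spawn begin
        x = i^2
        xs[i] = x
        toy_cycle!(b)                  # all three stores are done after this
        xs[mod1(i + 1, 3)] -= x
    end
end
xs   # [-8, 3, 5], the same values as in the `cycle!` example
```

Every task contends for one lock here, which is the kind of bottleneck that tree and dissemination barriers are designed to avoid.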

SyncBarriers.arrive! - Function
arrive!(barrier[i])

Signal that the i::Integer-th task has reached a certain phase but postpone the synchronization for the departure.

A call to cycle! is equivalent to arrive! followed by depart!. However, a task that calls arrive! can do other local computation before calling depart!, which waits for the other tasks to call arrive!.

Note that not all Barrier subtypes support arrive!.

See also fuzzy_barrier and depart!.

Examples

julia> using SyncBarriers

julia> xs = [1:3;];

julia> ys = similar(xs);

julia> barrier = fuzzy_barrier(3);

julia> @sync for i in 1:3
           Threads.@spawn begin
               x = i^2
               xs[i] = x
               arrive!(barrier[i])  # does not `wait`
               ys[i] = x - 1  # do some work while waiting for other tasks
               depart!(barrier[i])  # ensure all tasks have reached `arrive!`
               xs[mod1(i + 1, 3)] -= x
           end
       end

julia> xs
3-element Vector{Int64}:
 -8
  3
  5

julia> ys
3-element Vector{Int64}:
 0
 3
 8

SyncBarriers.reduce! - Function
reduce!(barrier[i], xᵢ::T) -> acc::T

Given a reduce barrier named barrier (created, e.g., by reduce_barrier(⊗, T, n)), compute acc = x₁ ⊗ x₂ ⊗ ⋯ ⊗ xₙ, where xᵢ is the value passed by the i-th task. Every task receives acc.
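
The operation has to be associative because a tree-shaped reduction is free to group the operands differently from a left-to-right fold. The Base-only sketch below illustrates this; `tree_reduce` is a hypothetical helper and does not reflect how the package arranges its tree.

```julia
# Combine neighbours pairwise, level by level (input must be non-empty).
function tree_reduce(op, xs)
    level = collect(xs)
    while length(level) > 1
        level = [i < length(level) ? op(level[i], level[i+1]) : level[i]
                 for i in 1:2:length(level)]
    end
    return level[1]
end

xs = [1.0, 4.0, 9.0, 16.0]
tree_reduce(+, xs) == foldl(+, xs)   # true: both give 30.0
tree_reduce(-, xs) == foldl(-, xs)   # false: (1-4)-(9-16) = 4.0, not -28.0
```

Floating-point + is only approximately associative, so for general inputs a different grouping may change the last bits of a sum.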

Examples

julia> using SyncBarriers

julia> xs = Float64[1:4;];

julia> barrier = reduce_barrier(+, Float64, length(xs));

julia> @sync for i in eachindex(xs)
           Threads.@spawn begin
               x = i^2
               s = reduce!(barrier[i], x)
               m = s / length(xs)
               xs[i] = x - m
           end
       end

julia> xs
4-element Vector{Float64}:
 -6.5
 -3.5
  1.5
  8.5

SyncBarriers.reduce_arrive! - Function
reduce_arrive!(barrier[i], xᵢ::T)

Given a fuzzy reduce barrier named barrier (created, e.g., by fuzzy_reduce_barrier(op, T, ntasks)), initiate the reduction across tasks. The result of the reduction is returned by depart!(barrier[i]), which waits until all tasks have called reduce_arrive!.

Examples

julia> using SyncBarriers

julia> xs = Float64[1:4;];

julia> ys = similar(xs);

julia> barrier = fuzzy_reduce_barrier(+, Float64, length(xs));

julia> @sync for i in eachindex(xs)
           Threads.@spawn begin
               x = i^2
               reduce_arrive!(barrier[i], x)
               ys[i] = x - 1
               s = depart!(barrier[i])
               m = s / length(xs)
               xs[i] = x - m
           end
       end

julia> xs
4-element Vector{Float64}:
 -6.5
 -3.5
  1.5
  8.5

julia> ys
4-element Vector{Float64}:
  0.0
  3.0
  8.0
 15.0