Bulk synchronous parallel Dev

The BSP is a model to design parallel algorithms for large-scale graph processing. A BSP algorithm “thinks like an asset”, where the code has a local view of the graph, restricted to the asset and its relations.

Introduction

With a BSP algorithm, you can do the following:

change the vertex and its outgoings (parent->child relations)
changing another asset means sending a message to that asset so it can process the message and change its data

A BSP algorithm runs inside the BSP main loop. Each iteration is performing a superstep, which consists of three components:

compute - where concurrent computation occurs on the assets, via multiple processors. An asset is accessed by a single thread, hence the BSP framework ensures that computation on an asset is actually thread-safe;
communication - where messages (specific to the algorithm) are sent to neighbor assets in order to pass information or activate them;
barrier synchronization - processors wait for each other. Additionally, assets are saved in a single Atomic.

BSP superstep
A superstep of BSP

In the current implementation, communication is only internal and does not involve the network. In that sense, all three components of a superstep happen sequentially. This will change in future implementations when the BSP will be enhanced with a distributed setting, where several machines may be involved. However, this change will be transparent to the algorithms: they will not need to be refactored.

Finally, during the barrier synchronization, the BSP also has a mechanism to detect the concurrent modification of assets (e.g. by other parts of the system) and allows the algorithm to take action. This part is covered in section Creating a BSP algorithm.

Example

This section illustrates the BSP concepts around a small example. Suppose we have the following BSP algorithm in pseudo code, which adds an increasing number to the asset names:

Messages for the algorithm: integer
 
Compute Method (i.e. what happens in 'compute' of a superstep):
{
   num := value from received message (0 if none)
   append ‘num’ to asset name
   send ‘num+1’ to all children (outgoings)
   halt (i.e. don't be active for the next supersteps).
}

Finally, suppose the algorithm applies to the following asset structure, starting from asset ‘A1’:

BSP superstep

Superstep 0

The BSP starts at superstep 0, with computation on asset ‘A1’. This is divided in:

compute: Only one thread is started, and runs the computation on A1 as follow: as the asset has no messages in its queue, the number added to its name is simply ‘0’. Then the asset sends the next value, ‘1’, to its children: A11, A12, A13. A1 is then ‘halted’;
communication: The message ‘1’ is actually sent to A11, A12, A13;
barrier synchronization: The framework waits for the reception of all messages, then saves the asset A1.

The BSP increases the superstep, and automatically activates assets that received a message. As some assets are active, the BSP starts the next iteration.

BSP superstep

Superstep 1

The following assets have some messages in their queues, hence they become active during this superstep: A11, A12, A13. The superstep is divided in:

compute: Three threads are started, one for each asset (A11, A12, A13). Their computation is as follow: retrieve the number ‘1’ from the received message and add it to the asset name. For asset A11, the next value ‘2’ is sent to its child A111. For A12 and A13, no messages are sent as those assets have no children. Assets A11, A12, A13 are then ‘halted’;
communication: The message ‘2’ is actually sent to A111;
barrier synchronization: The framework waits for the reception of all messages, then saves the assets A11, A12 and A13.

The BSP increases the superstep, and automatically activates the assets that received a message. As some assets are active, the BSP starts the next iteration.

BSP superstep

Superstep 2

Only asset A111 has a message in its queue, hence it is the only asset that is active for this superstep. The superstep is divided in:

compute: Only one thread is started, and runs the computation on A111 as follow: retrieve the number ‘2’ from the message received, and add it to the asset name. No messages are sent as A111 has no children. A111 is then ‘halted’;
communication: Nothing to do;
barrier synchronization: The framework saves the asset A111.

In this superstep, no message has been sent, which means no asset has been automatically activated. As all assets are ‘halted’, the execution stops.

BSP superstep

BSP and Breadth-First Search

As shown in the example above, the BSP computational model is walking the graph of assets in a Breadth-First Search (wiki:BFS) manner, meaning that the execution is first performed on all neighbors before continuing to the next level.
This would probably be different if an algorithm was implemented without the BSP. It would walk the graph recursively in a Depth-First Search manner (wiki:DFS), where a neighbor is explored as far as possible before backtracking and continue on the next neighbor.

Page tree

Bulk synchronous parallel Dev

Introduction

Example

Superstep 0

Superstep 1

Superstep 2

BSP and Breadth-First Search