This article explains how to check statistics and stop a running algorithm. Both actions are available in the censhare Admin client.

Introduction

The BSP actions are available in the censhare Admin client. If you execute an action, a new window opens that displays three sections: global statistics, single runs and the currently running algorithms. An XML representation of the statistics is also written to the logs.

Admin action BSP Statistics

Global statistics

The first part of the BSP statistics is global statistics for all past executions since the start of the server. Such statistics help to monitor the health of the system and find potential bottlenecks that need to be fixed.

The following statistics are presented:

  • fill-exec-queue: fill rate (in %) of the execution queue (for executing the compute method). If the fill rate reaches 200% or more too often, the queue size should be increased;
  • nb-assets-single-superstep: number of assets the BSP worked on in a single superstep. If this number starts to reach 7,000 to 8,000 assets, some partitioning needs to be implemented, as the system is not able to handle 10,000 assets in a single superstep;
  • load-assets-single-superstep, save-assets-single-superstep, reload-assets-single-superstep: time (in ms) the BSP spent in a single superstep to respectively load, save and reload the assets. If this time starts to become too high, some refactoring is required (probably in the AssetManagement);
  • compute-single-superstep: time (in ms) the BSP spent to execute all compute methods in a single superstep. A high number is due to either a queue that is not properly sized (then check the fill-exec-queue), a large number of assets (check nb-assets-single-superstep) or a compute method that takes too long (check the statistics of ‘single runs’ below);
  • time-other-single-superstep: time for other parts of a single superstep (e.g. checking outdated assets, pre-post supersteps, etc.). This number should be quite low;
  • total-exec-time: total execution time (in ms) of the algorithms. An algorithm can run for a long time if the asset structure is very complex. Check the statistics of ‘single runs’ below if this number is high;
  • total-exec-time-per-asset: total execution time divided by the number of assets that were visited. This number should be quite low. A high number probably means an asset goes into ‘compute’ too many times during a single execution, but other reasons are also possible. You should check the statistics of ‘single runs’ below;
  • nb-assets-loaded, nb-assets-saved: give an idea of how busy the BSP is during an execution. They are best read together with other statistics: high execution times combined with a low number of loaded or saved assets hint at a potential problem.
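
The rules of thumb above can be turned into a small checker. The following is a minimal sketch, assuming the statistics have already been extracted into a plain dictionary keyed by the names listed above. The dictionary layout and the function names are illustrative assumptions, not part of the censhare API; only the statistic names and the 200% and 7,000-asset thresholds come from the list.

```python
# Hypothetical helpers: the input dictionary and the function names are
# assumptions for illustration. Only the statistic names and the two
# thresholds (200 % fill rate, 7,000 assets) come from the list above.

def check_global_stats(stats):
    """Return a list of warnings for one set of global statistics."""
    warnings = []

    # Fill rate of the execution queue, in percent.
    if stats.get("fill-exec-queue", 0) >= 200:
        warnings.append("fill-exec-queue: consider increasing the queue size")

    # Number of assets handled in a single superstep.
    if stats.get("nb-assets-single-superstep", 0) >= 7000:
        warnings.append("nb-assets-single-superstep: consider partitioning")

    return warnings


def exec_time_per_asset(total_exec_time_ms, nb_assets_visited):
    """Total execution time divided by the number of visited assets."""
    if nb_assets_visited == 0:
        return 0.0
    return total_exec_time_ms / nb_assets_visited


sample = {"fill-exec-queue": 215, "nb-assets-single-superstep": 1200}
for line in check_global_stats(sample):
    print(line)
print(exec_time_per_asset(12000, 400))
```

With the sample values, only the queue warning is reported, because 1,200 assets per superstep is well below the threshold.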

Single runs

The next part of the BSP statistics shows the executions that consumed the most resources since the start of the server. These are usually the executions with a high total execution time, but also those with the longest compute time for a single asset.

The table allows identifying algorithms that may need some refactoring. The most interesting statistics in this table are:

  • max single compute time: the maximum time it took for a single call to compute on a single asset. A high number (e.g. hundreds of ms) means the compute method is too complex and would benefit from some refactoring.
  • nb asset load/save/visit: number of times the BSP loaded and saved an asset, and how many times the compute method was run (an asset was ‘visited’ by the method). Those numbers could give a hint why the execution time was high for this running instance of the algorithm.
  • max nb saves for 1 asset: this number should be very low (ideally 1 to 3). A high number means the algorithm should be redesigned in order to limit the saves. This can be done by gathering all necessary information in the vertex’s data, and then in a later superstep save this data into the asset, hence saving it only once during the execution;
  • max nb visits for 1 asset: this number should be low (ideally below 10). A high number means the algorithm is visiting the asset too many times; this probably means the assets are exchanging too many messages with each other, or the associated vertex is not halted.
  • max nb assets in 1 superstep: while the BSP can handle high numbers, it might be good to check if the algorithm could be refactored to avoid working on too many assets at the same time. Of course, this is not always possible, for instance with very big asset structures where some assets have thousands of children.
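
The advice on limiting saves (gather data in the vertex first, save once in a later superstep) can be illustrated with a toy model. This is a sketch only: the `Asset` class, its `save` method and the loop over supersteps are hypothetical stand-ins used to count saves, not censhare code.

```python
# Toy model, not censhare code: Asset, save() and the superstep loop are
# stand-ins used only to count how often an asset is saved.

class Asset:
    def __init__(self):
        self.values = []
        self.save_count = 0

    def save(self, values):
        self.values = list(values)
        self.save_count += 1


def run_saving_every_superstep(messages_per_superstep):
    asset = Asset()
    collected = []
    for messages in messages_per_superstep:
        collected.extend(messages)
        asset.save(collected)          # one save per superstep
    return asset


def run_saving_once(messages_per_superstep):
    asset = Asset()
    vertex_data = []                   # in-memory data attached to the vertex
    for messages in messages_per_superstep:
        vertex_data.extend(messages)   # only accumulate, no save yet
    asset.save(vertex_data)            # single save at the end
    return asset


supersteps = [["a"], ["b", "c"], ["d"]]
eager = run_saving_every_superstep(supersteps)
lazy = run_saving_once(supersteps)
print(eager.save_count, lazy.save_count)
```

Both variants end with the same data in the asset, but the second one would show up in the statistics with a ‘max nb saves for 1 asset’ of 1 instead of one save per superstep.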

Stop an algorithm

The BSP is able to detect potentially infinite executions of a BSP algorithm, or at least executions that take much longer than usual (it is not possible to determine in general whether an algorithm will stop, see the Wikipedia article Halting Problem).

Stopping a BSP algorithm

The table at the bottom of the ‘BSP Statistics’ window shows the algorithms that were running when the ‘BSP Statistics’ action was triggered. It provides:

  • an algorithm ID, used in a selector at the bottom in order to choose the algorithm to stop;
  • the algorithm’s class;
  • the current execution time (both in milliseconds and in % of the longest known run);
  • the number of assets loaded so far by the algorithm. You may want to trigger the ‘BSP Statistics’ several times to check how an execution is behaving. If the number of assets keeps increasing, it could mean the algorithm is stuck in a loop that creates assets all the time;
  • the number of supersteps, which can help to see if this number is very high compared to the longest run and if it keeps increasing dramatically.
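
Comparing several readings of this table can be sketched as follows. This is a minimal example under stated assumptions: the snapshot dictionaries, their field names (`assets_loaded`, `exec_time_ms`) and the heuristic itself are hypothetical and only mirror the columns described above; they are not an export format of the Admin client.

```python
# Hypothetical snapshot records: the field names are assumptions chosen to
# mirror the columns of the table described above.

def percent_of_longest_run(current_ms, longest_known_ms):
    """Current execution time as a percentage of the longest known run."""
    if longest_known_ms <= 0:
        return None  # no reference run available
    return 100.0 * current_ms / longest_known_ms


def looks_suspicious(snapshots, longest_known_ms):
    """Heuristic: the number of loaded assets grows in every snapshot and
    the run already exceeds the longest known run."""
    if len(snapshots) < 2:
        return False
    loaded = [s["assets_loaded"] for s in snapshots]
    always_growing = all(b > a for a, b in zip(loaded, loaded[1:]))
    pct = percent_of_longest_run(snapshots[-1]["exec_time_ms"], longest_known_ms)
    return always_growing and pct is not None and pct > 100.0


readings = [
    {"assets_loaded": 500, "exec_time_ms": 40000},
    {"assets_loaded": 900, "exec_time_ms": 80000},
    {"assets_loaded": 1400, "exec_time_ms": 130000},
]
print(percent_of_longest_run(130000, 100000))
print(looks_suspicious(readings, 100000))
```

A result of `True` is only a hint that the run deserves a closer look; whether to stop the algorithm remains a manual decision.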

If you decide to stop an algorithm, select it (via its ID) in the selector, provide a reason for stopping the algorithm and click ‘OK’. The algorithm will fail by throwing a CancellationException with the reason you gave as a message.