libmuscle.manager.profile_database module

class libmuscle.manager.profile_database.ProfileDatabase(db_file: str | Path)[source]

Bases: object

Accesses a profiling database.

This class accesses a MUSCLE3 profiling database and provides basic analysis functionality.

close() None[source]

Close the connection to the database.

Each thread that used this object should call this once before it exits, so that its connection to the database can be closed.

It is usually better to use this class as a context manager, e.g.

with ProfileDatabase('performance.sqlite') as db:
    # use db here
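
If a context manager is not convenient, the connection can be closed explicitly. A minimal sketch, assuming a database file named performance.sqlite:

from libmuscle.manager.profile_database import ProfileDatabase

db = ProfileDatabase('performance.sqlite')
try:
    # use db here, for example
    names, compute, transfer, wait = db.instance_stats()
finally:
    db.close()
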
instance_stats() Tuple[List[str], List[float], List[float], List[float]][source]

Calculate per-instance statistics.

This calculates the total time spent computing, the total time spent communicating, and the total time spent waiting for a message to arrive, for each instance, in seconds.

It returns a tuple of four lists, containing instance names, run times, communication times, and wait times. Note that the run times do not include data transfer or waiting for messages, so these three are exclusive and add up to the total time the instance was active.

Note that sending messages in MUSCLE3 is partially done in the background. Transfer times include encoding and queueing of any sent messages as well as downloading received messages, but they do not include the sending side of the transfer, as that is done by a background thread in parallel with other work that is recorded (usually waiting, sometimes computing or sending another message).

Nevertheless, this should give you an idea of which instances use the most resources. Keep in mind, though, that different instances may use different numbers of cores, so a model that doesn’t spend much wallclock time may still use many core hours. Also, waiting does not necessarily leave a core idle, as MUSCLE3 can detect when models will not compute at the same time and have them share cores.

See the profiling documentation page for an example.

Returns:

A list of instance names, a corresponding list of compute times, a corresponding list of transfer times, and a corresponding list of wait times.
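
As a rough sketch of how the return value might be used (the database file name is an assumption), the four parallel lists can be zipped together to print a per-instance summary:

from libmuscle.manager.profile_database import ProfileDatabase

with ProfileDatabase('performance.sqlite') as db:
    names, compute, transfer, wait = db.instance_stats()
    # The lists are parallel: entry i of each list describes instance names[i]
    for name, t_run, t_comm, t_wait in zip(names, compute, transfer, wait):
        print(f'{name}: compute {t_run:.1f} s,'
              f' transfer {t_comm:.1f} s, wait {t_wait:.1f} s')
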

resource_stats() Dict[str, Dict[str, float]][source]

Calculate per-core statistics.

This function calculates the amount of time each core has spent running each component assigned to it. It returns a dictionary indexed by (node, core) tuples, which contains for each core a nested dictionary mapping each component to the number of seconds that component used the core. This includes time spent calculating and time spent receiving data, but not time spent waiting for input.

Returns:

A dictionary containing for each (node, core) tuple a dictionary containing for each component the total time it ran on that core.
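
A minimal sketch of iterating the result (the database file name is an assumption; the keys are used as returned, so this does not depend on their exact form):

from libmuscle.manager.profile_database import ProfileDatabase

with ProfileDatabase('performance.sqlite') as db:
    stats = db.resource_stats()
    for core, per_component in stats.items():
        for component, seconds in per_component.items():
            print(f'{core}: {component} used {seconds:.1f} s')
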

time_taken(*, etype: str, instance: str | None = None, port: str | None = None, slot: int | None = None, time: str | None = 'start', etype2: str | None = None, port2: str | None = None, slot2: int | None = None, time2: str | None = 'stop', aggregate: str = 'mean') float[source]

Calculate time of and between events.

This function returns the mean or total time spent on or between selected points in time recorded in the database, in nanoseconds. Note that due to operating system limitations, actual precision for individual measurements is limited to about a microsecond.

For profiling purposes, an event is an operation performed by one of the instances in the simulation. It has a type, a start time, and a stop time. For example, when an instance sends a message, this is recorded as an event of type SEND, with associated timestamps. For some events, other information may also be recorded, such as the port and slot a message was sent or received on, message size, and so on.

This function takes two points in time, each of which is the beginning or the end of a certain kind of event, and calculates the time between those two points. For example, to calculate how long it takes instance micro to send a message on port final_state, you can do

db.time_taken(
        etype='SEND', instance='micro', port='final_state')

This selects events of type SEND, as well as an instance and a port, and since we didn’t specify anything else, we get the time taken from the beginning to the end of each selected event. The micro model is likely to have sent many messages, and this function will automatically calculate the mean duration. So this tells us on average how long it takes micro to send a message on final_state.

Averaging will be done over all attributes that are not specified, so for example if final_state is a vector port, then the average will be taken over all sends on all slots, unless a specific slot is specified by a slot argument.
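
For example, a variant of the call above that restricts the measurement to a single slot (assuming, hypothetically, that final_state is a vector port with a slot 0):

db.time_taken(
        etype='SEND', instance='micro', port='final_state', slot=0)
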

It is also possible to calculate time between different events. For example, if we know that micro receives on initial_state, does some calculations, and then sends on final_state, and we want to know how long the calculations take, then we can use

db.time_taken(
        instance='micro', port='initial_state',
        etype='RECEIVE', time='stop',
        port2='final_state', etype2='SEND',
        time2='start')

This gives the time between the end of a receive on initial_state and the start of a subsequent send on final_state. The arguments with a 2 at the end of their name refer to the end of the period we’re measuring; etype2, port2 and slot2 default to the value of the corresponding argument without the 2, while time defaults to 'start' and time2 to 'stop'. So, the first call is actually equivalent to

db.time_taken(
    etype='SEND', instance='micro', port='final_state',
    slot=None, time='start', etype2='SEND',
    port2='final_state', slot2=None, time2='stop')

which says that we measure the time from the start of each send by micro on final_state to the end of each send on final_state, aggregating over all slots if applicable.

Speaking of aggregation, there is a final argument aggregate which defaults to mean, but can be set to sum to calculate the sum instead. For example:

db.time_taken(
        etype='RECEIVE_WAIT', instance='macro',
        port='state_in', aggregate='sum')

gives the total time that macro has spent waiting for a message to arrive on its state_in port.

If you are taking points in time from different events (e.g. different instances, ports, slots or types) then there must be the same number of events in the database for the start and end event. So starting at the end of REGISTER and stopping at the beginning of a SEND on an O_I port will likely not work, because the instance only registers once and probably sends more than once.

Parameters:
  • etype – Type of event to get the starting point from. Possible values: ‘REGISTER’, ‘CONNECT’, ‘SHUTDOWN_WAIT’, ‘DISCONNECT_WAIT’, ‘SHUTDOWN’, ‘DEREGISTER’, ‘SEND’, ‘RECEIVE’, ‘RECEIVE_WAIT’, ‘RECEIVE_TRANSFER’, ‘RECEIVE_DECODE’. See the documentation for a description of each.

  • instance – Name of the instance to get the event from. You can use % as a wildcard matching anything. For example, ‘macro[%’ will match all instances of the macro component, if it has many.

  • port – Selected port, for send and receive events.

  • slot – Selected slot, for send and receive events.

  • time – Either ‘start’ or ‘stop’, to select the beginning or the end of the specified event. Defaults to ‘start’.

  • etype2 – Type of event to get the stopping point from. See etype. Defaults to the value of etype.

  • port2 – Selected port. See port. Defaults to the value of port.

  • slot2 – Selected slot. See slot. Defaults to the value of slot.

  • time2 – Either ‘start’ or ‘stop’, to select the beginning or the end of the specified event. Defaults to ‘stop’.

  • aggregate – Either ‘mean’ (default) or ‘sum’, to calculate that statistic.

Returns:

The mean or total time taken in nanoseconds.
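
To illustrate how several of these parameters combine, here is a minimal sketch (the database file name and the assumption that macro is a multi-instance component are both hypothetical) that also converts the result from nanoseconds to seconds:

from libmuscle.manager.profile_database import ProfileDatabase

with ProfileDatabase('performance.sqlite') as db:
    # Total time all instances of the macro component spent waiting
    # for a message on state_in, converted from ns to s.
    total_wait_s = db.time_taken(
            etype='RECEIVE_WAIT', instance='macro[%',
            port='state_in', aggregate='sum') / 1e9
    print(f'Total wait on state_in: {total_wait_s:.3f} s')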