libmuscle.planner.resources module

Module for describing compute resources

There’s a huge comment here because there’s a big mess here that took me forever to figure out, so now I’m going to document it for the future.

Identifying hardware resources

Today’s computers all contain multi-core CPUs, often with symmetric multithreading (SMT), also known as hyperthreading. This means that we have hardware threads (hwthreads) and also cores, and then there’s caches and memory as well but we’re not going into NUMA here.

Cores and hwthreads are identified by number, but they have multiple different numbers that are referred to by different names in different contexts, making everything very confusing. So here are some definitions to disambiguate things. Note that this is still a rather simplified representation, but it’s enough for what we’re doing here in MUSCLE3.

Hardware threads

A hardware thread (hwthread) is, at the hardware level, an instruction decoder. It points to wherever in the code we are currently executing, and it can read the next couple of instructions and figure out how to execute them. It can’t actually execute anything however, because it doesn’t have the hardware that does that.

Intel refers to hwthreads as “logical processors” and so does Linux, hwloc calls them “processing units” or PUs and so does OpenMPI unless it uses the term hwthread just to confuse things a bit more.

Cores

A core contains at least one hwthread, and at least one functional unit, which is a hardware component that actually does calculations and other data processing. Within a core, the hwthread(s) read instructions and pass them to the functional units to be executed. If a core has more than one hwthread, then the CPU supports SMT.

Intel refers to cores as “physical processors”, hwloc calls them cores and so do most other sources. We’ll use cores here.

Since a hwthread cannot do anything on its own, it’s always part of a core.

CPUs

The term CPU is used in many ways by various bits of documentation, sometimes referring to a hwthread or a core, but here we’ll take it to mean a collection of cores in a plastic box. Similar terms are package (referring to that plastic box with very many metal pins) and socket (the thing the package mounts into), or processor, which was originally used to refer to all of the above when CPUs still had only one core with only one hwthread, and has now become ambiguous.

Weird things can happen here, I’ve seen CPUs that as far as I can tell are a single package, but nevertheless claim to have two sockets. I suspect that that’s two physical chips in a single plastic box, but I don’t know for sure.

Here, we’re concerned with hwthreads and cores and how to identify them and assign instances to them.

Linux

On modern operating systems, hardware access is mediated by the operating system, and we’re mainly concerned with Linux here because that is what all the clusters are running (see the note on macOS below). Information about the CPU(s) can be obtained on Linux from the /proc/cpuinfo file, or equivalently but more modernly, from the files in /sys/devices/system/cpu/cpu<x>/topology/.

Linux collects information about processors because it needs to run processes (programs, software threads) on them on behalf of the user. Processes are assigned to hwthreads, so that is what Linux considers a processor. /proc/cpuinfo lists all these processors, and they each have their own directory /sys/devices/system/cpu/cpu<x>.

On Linux, processors have an id, which is that number <x> in the directory, and is listed under “processor” in /proc/cpuinfo. Since this number identifies a hwthread and is assigned by Linux rather than being baked into the hardware, I’m calling it a “logical hwthread id”, this being a logical id of a hwthread, not an id of a logical hwthread. It’s also the id of a logical processor in Intel-speak.

Hwthreads actually have a second number associated with them, which does come from the hardware. In /proc/cpuinfo, that’s listed under “apicid”; it doesn’t seem to be available from sysfs. Hwloc call this the “physical PU (its name for a hwthread) id”, and OpenMPI’s mpirun manpage also refers to it as a “physical processor location”.

There’s great potential for confusion here: the “physical PU id” and “physical processor location” both identify a hardware-specified number (a physical id or a physical location) for a hwthread. This is something completely different than what Intel calls a “physical processor”, which they use to refer to a core.

MUSCLE3 uses logical hwthread ids everywhere, it does not use physical ids.

Linux knows about how hwthreads are grouped into bigger things of course. Cores are identified in Linux using the “core id”, which is listed in /proc/cpuinfo and in /sys/devices/system/cpu/cpu<x>/topology/core_id. So for each hwthread, identified by its logical id, we can look up which core it is a part of. The core id is a logical id, assigned by Linux, not by the hardware. While logical hwthread ids seem to always be consecutive at least on the hardware I’ve seen so far, core ids may have gaps.

MUSCLE3 does not use core ids, although it uses groups of hwthread ids that contain all the hwthreads for a given core.

Resource binding

Running processes need something to run on, a hwthread. The assignment of process to hwthread is done by the operating system’s scheduler: when a process is ready to run, the scheduler will try to find it a free hwthread to run on.

The scheduler can be constrained in which hwthreads it considers for a given process, which is known as binding the process. This may have performance benefits, because moving a process from one hwthread to another takes time. In MUSCLE3, when running on a cluster, each process is assigned its own specific set of hwthreads to run on, and we try to bind the instance to the assigned hwthreads.

Taskset

How this is done depends on how the instance is started. For non-MPI instances, we use a Linux utility named ‘taskset’ that starts another program with a giving binding. The binding is expressed as an affinity mask, a string of bits that say whether a given processor (hwthread) can be used by the process or not. Each position in the string of bits corresponds to the hwthread with that logical id.

OpenMPI

OpenMPI can bind cores in various ways, we use a rankfile and the –use-hwthread-cpus option to specify the logical hwthread ids we want to bind each MPI process (rank) to. Note that OpenMPI by default binds to cores, and can also bind to various other things including sockets.

MPICH

MPICH doesn’t support binding, as far as I can see.

Intel MPI

Intel MPI uses logical hwthread ids-based masks, specified in an environment variable, to go with a machinefile that lists the nodes to put each process on.

Slurm srun

Slurm’s srun has a CPU_BIND environment variable that likewise contains logical hwthread ids-based masks, and a hostfile that lists the nodes to put each process on.

Here are some disambiguation tables to help with the confusion:

``` MUSCLE3 hwthread logical hwthread id physical hwthread id

Linux processor processor apicid

(/proc/cpuinfo only)

cgroups always uses these

taskset always uses these

hwloc PU PU L#<x> PU P#<x>

OpenMPI hwthread used in rankfile if used in rankfile if
--use-hwthread-cpus

rmaps_rank_file_physical

is specified MCA param set

Intel logical logical processor

processor number

srun used by –bind-to

psutil logical returned by Process.cpu_affinity()

core counted by psutil.cpu_count(logical=True)

```

``` MUSCLE3 core core id

Linux core core id

Hwloc core core L#<x>

OpenMPI core used in rankfile if

–use-hwthread-cpus not specified

psutil physical counted by psutil.cpu_count(logical=False)

core

```

class libmuscle.planner.resources.Core(cid: int, hwthreads: Set[int])[source]

Bases: object

Describes a CPU core or designates a core or one or more hwthreads.

A core is a group of functional units with one or more instruction decoders. If the core supports symmetric multithreading (SMT, aka hyperthreading) then there will be more than one instruction decoder or hardware thread in the core.

Note that the term “logical CPU” refers to an instruction decoder/hwthread. If the processor does not support SMT, then each core has a single decoder and so a logical CPU is also a core.

This class can be used in different ways with slighly different interpretations. When describing hardware resources, it describes a core and all of its hwthreads. In this case, cid is the core id, and hwthreads contains the hwthread ids of all hwthreads on this core. If no SMT is supported, then there will be only one hwthread id.

When designating a whole core (e.g. for use by a process), cid is set to the id of the core, and hwthreads contains all of the hwthreads on that core. When designating a hwthread on a particular core, cid is set to the id of the core and hwthreads contains the designated (single) hwthread.

MUSCLE3 never assigns swthreads to subsets of hwthreads on a core, it assigns them to either a single hwthread or a single whole core. So if more than one hwthread is given, then we can assume that those are all the hwthreads on that core.

Objects of this class automatically deepcopy when copied. This means that you can make a copy using copy.copy() and modify that copy anywhere without changing the original.

Parameters:
  • cid – ID of this core, to be used to refer to it

  • hwthreads – Ids of hwthreads (logical CPUs) belonging to this core

isdisjoint(other: Core) bool[source]

Returns whether we share resources with other.

class libmuscle.planner.resources.CoreSet(cores: Iterable[Core])[source]

Bases: object

A set of cores on a single node.

This exists to make it a bit easier to operate on sets of cores, merging and subtracting them.

Objects of this class automatically deepcopy when copied. This means that you can make a copy using copy.copy() and modify that copy anywhere without changing the original.

get_first_cores(num_cores: int) CoreSet[source]

Returns the first num_cores cores in this set.

Parameters:

select. (The number of cores to)

isdisjoint(other: CoreSet) bool[source]

Returns whether we share resources with other.

class libmuscle.planner.resources.OnNodeResources(node_name: str, cpu_cores: CoreSet)[source]

Bases: object

Resources on a single node, currently only CPU cores.

This represents a set of resources on a single node, either all of the resources available or some subset of interest.

Objects of this class automatically deepcopy when copied. This means that you can make a copy using copy.copy() and modify that copy anywhere without changing the original.

hwthreads() Iterable[int][source]

Return the hwthreads in this node.

isdisjoint(other: OnNodeResources) bool[source]

Returns whether we share resources with other.

total_cores() int[source]

Return the number of CPU cores in this node.

class libmuscle.planner.resources.Resources(nodes: Iterable[OnNodeResources] | None = None)[source]

Bases: object

Designates a (sub)set of resources.

Whether these resources are free or allocated in general or by something specific depends on the context, this just says which resources we’re talking about.

Objects of this class automatically deepcopy when copied. This means that you can make a copy using copy.copy() and modify that copy anywhere without changing the original.

nodes[source]

A collection of nodes to include in this resource set

add_node(node_res: OnNodeResources) None[source]

Add a node’s resources.

This absorbs node_res into this Resources object, so if you change node_res after adding it, the changes will be reflected in this Resources.

Parameters:

node_res – Resources on a node not yet included in this Resources.

Raises:

RuntimeError – if we already have a node with this node name.

cores() Iterable[Tuple[str, int]][source]

Return this resources as a list of node, core.

hwthreads() Iterable[Tuple[str, int]][source]

Return this resources as a list of node, hwthread.

isdisjoint(other: Resources) bool[source]

Return whether we share resources with other.

merge_node(node_res: OnNodeResources) None[source]

Merges a node’s resources

This always copies the object.

Parameters:

node_res – Resources on a node that may already be included in this Resources.

nodes() Iterable[str][source]

Return the names of the nodes on which we designate resources.

total_cores() int[source]

Return the total number of cores (not hwthreads) designated.

static union(resources: Iterable[Resources]) Resources[source]

Combines the resources into one.

Parameters:

resources – A collection of resources to merge.

Returns:

A Resources object referring to all the resources in the input.