Add support for topology-aware scheduling #737
Merged
12 commits
fc48709 (wtripp180901): Added topology aware scheduled for compute VMs
2d80af3 (wtripp180901): compute nodes must now be marked as available for topo-aware scheduling
3c9d509 (wtripp180901): Now allows AZ to be specified for non-BM instances
f1d8fb0 (wtripp180901): refactor + added to CI groups
8d63108 (wtripp180901): Merge branch 'main' into feat/topology-aware-scheduling
96c13de (wtripp180901): typo
28457a7 (wtripp180901): added readme
b00188e (wtripp180901): docs updates + review suggestions + refactor template override
80ce744 (wtripp180901): add top level topology override + gate plugin on group being enabled
68d55f4 (wtripp180901): typos + renames + added reconfigure warning to docs
025827e (wtripp180901): set changed false + comments
cb9b273 (wtripp180901): Merge branch 'main' into feat/topology-aware-scheduling
@@ -96,3 +96,5 @@ roles/*
!roles/nhc/**
!roles/eessi/
!roles/eessi/**
!roles/topology/
!roles/topology/**
@@ -0,0 +1,33 @@
topology
========

Templates out the /etc/slurm/topology.conf file based on an OpenStack project, for use by
Slurm's [topology/tree plugin](https://slurm.schedmd.com/topology.html). Models the
cluster as a tree with a hierarchy of:

Top-level inter-rack Switch -> Availability Zones -> Hypervisors -> VMs

Warning: This role does not currently trigger a restart of Slurm, so an
`ansible/site.yml` run will not reconfigure an already running cluster. You will need
to run the `ansible/adhoc/restart-slurm.yml` playbook for changes to topology.conf to be
recognised.

Role Variables
--------------

- `topology_nodes`: Required list of strs. List of inventory hostnames of nodes to include in the topology tree. Must be set to include all compute nodes in the Slurm cluster. Default `[]`.
- `topology_conf_template`: Optional str. Path to Jinja2 template of topology.conf file. Default
  `templates/topology.conf.j2`.
- `topology_above_rack_topology`: Optional multiline str. Used to define topology above racks/AZs if
  you wish to partition racks further under different logical switches. New switches above should be
  defined as [SwitchName lines](https://slurm.schedmd.com/topology.html#hierarchical) referencing
  the rack Availability Zones under that switch in their `Switches` fields. These switches must themselves
  be under a top-level switch, e.g.
  ```
  topology_above_rack_topology: |
    SwitchName=rack-group-1 Switches=rack-az-1,rack-az-2
    SwitchName=rack-group-2 Switches=rack-az-3,rack-az-4
    SwitchName=top-level Switches=rack-group-1,rack-group-2
  ```
  Defaults to an empty string, which causes all AZs to be put under a
  single top-level switch.
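
The constraints on `topology_above_rack_topology` described in the README could be checked before deploying. The following is a rough sketch under stated assumptions, not part of the role: `check_above_rack_topology` is a hypothetical helper, it only understands plain comma-separated `Switches=` lists (no Slurm hostlist ranges such as `rack-az-[1-4]`), and it interprets "under a top-level switch" as meaning exactly one switch that no other switch references.

```python
def check_above_rack_topology(text, az_names):
    """Hypothetical check: every AZ is referenced and there is one root switch."""
    children = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is assumed to look like: SwitchName=<name> Switches=<a>,<b>,...
        fields = dict(part.split("=", 1) for part in line.split())
        children[fields["SwitchName"]] = fields["Switches"].split(",")

    # Everything that appears in some Switches= list.
    referenced = {c for kids in children.values() for c in kids}
    missing_azs = sorted(set(az_names) - referenced)
    # Switches that nobody references are candidate top-level switches.
    roots = sorted(set(children) - referenced)
    return missing_azs, roots


above = """\
SwitchName=rack-group-1 Switches=rack-az-1,rack-az-2
SwitchName=rack-group-2 Switches=rack-az-3,rack-az-4
SwitchName=top-level Switches=rack-group-1,rack-group-2
"""
azs = ["rack-az-1", "rack-az-2", "rack-az-3", "rack-az-4"]
missing, roots = check_above_rack_topology(above, azs)
print(missing)  # []
print(roots)    # ['top-level']
```

An empty `missing` list and a single entry in `roots` would suggest the override is consistent with the README's rules; anything else is worth reviewing by hand.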
@@ -0,0 +1,8 @@
# Nodes to be included in topology tree, must include all Slurm compute nodes
topology_nodes: []

# Override to use custom topology.conf template
topology_conf_template: templates/topology.conf.j2

topology_above_rack_topology: ""
@@ -0,0 +1,98 @@
#!/usr/bin/python

# Copyright: (c) 2025, StackHPC
# Apache 2 License

from ansible.module_utils.basic import AnsibleModule
import openstack

DOCUMENTATION = """
---
module: map_hosts
short_description: Creates map of OpenStack VM network topology
description:
    - Creates map representing the network topology tree of an OpenStack project with a hierarchy
      of: Availability Zone -> Hypervisors -> VMs/Baremetal instances
options:
    compute_vms:
        description:
            - List of VM names within the target OpenStack project to include in the tree
        required: true
        type: list
author:
    - Steve Brasier, William Tripp, StackHPC
"""

RETURN = """
topology:
    description:
        Map representing tree of project topology. Top level keys are AZ names, their values
        are maps of shortened unique identifiers of host UUIDs to lists of VM names
    returned: success
    type: dict[str, dict[str,list[str]]]
    sample:
        "nova-az":
            "afe9":
                - "mycluster-compute-0"
                - "mycluster-compute-1"
            "00f9":
                - "mycluster-compute-vm-on-other-hypervisor"
"""

EXAMPLES = """
- name: Get topology map
  map_hosts:
    compute_vms:
      - mycluster-compute-0
      - mycluster-compute-1
"""

def min_prefix(uuids, start=4):
    """ Take a list of uuids and return the smallest length >= start which keeps them unique """
    for length in range(start, len(uuids[0])):
        prefixes = set(uuid[:length] for uuid in uuids)
        if len(prefixes) == len(uuids):
            return length

def run_module():
    module_args = dict(
        compute_vms=dict(type='list', elements='str', required=True)
    )
    module = AnsibleModule(argument_spec=module_args, supports_check_mode=True)

    conn = openstack.connection.from_config()

    servers = [s for s in conn.compute.servers() if s["name"] in module.params["compute_vms"]]

    topo = {}
    all_host_ids = []
    for s in servers:
        az = s['availability_zone']
        host_id = s['host_id']
        if host_id != '':  # empty string if e.g. server is shelved
            all_host_ids.append(host_id)
            if az not in topo:
                topo[az] = {}
            if host_id not in topo[az]:
                topo[az][host_id] = []
            topo[az][host_id].append(s['name'])

    uuid_len = min_prefix(list(set(all_host_ids)))

    for az in topo:
        topo[az] = dict((k[:uuid_len], v) for (k, v) in topo[az].items())

    result = {
        "changed": False,
        "topology": topo,
    }

    module.exit_json(**result)


def main():
    run_module()


if __name__ == "__main__":
    main()
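
As a hedged illustration of how `min_prefix` shortens hypervisor host IDs, the sketch below reproduces the function from the diff and adds a wrapper, `safe_min_prefix`, which is hypothetical and not part of the PR. It assumes two edge cases might matter in practice: an empty ID list (for example if every matched server is shelved), where `uuids[0]` would raise `IndexError`, and IDs that differ only in their final character, where the loop falls through and returns `None`. The host IDs used here are made-up short strings, not real OpenStack values.

```python
def min_prefix(uuids, start=4):
    """Smallest length >= start which keeps the prefixes unique (as in the diff)."""
    for length in range(start, len(uuids[0])):
        prefixes = set(uuid[:length] for uuid in uuids)
        if len(prefixes) == len(uuids):
            return length


def safe_min_prefix(uuids, start=4):
    """Hypothetical wrapper: always returns an int, even for the edge cases."""
    if not uuids:
        return start  # nothing to disambiguate
    length = min_prefix(uuids, start)
    # Fall back to the full ID length if no shorter prefix is unique.
    return length if length is not None else max(len(u) for u in uuids)


host_ids = ["afe91c", "afe97b", "00f9d2"]
print(min_prefix(host_ids))                    # 5 ("afe91", "afe97", "00f9d")
print(safe_min_prefix([]))                     # 4
print(safe_min_prefix(["abcdeX", "abcdeY"]))   # 6
```

Whether the module should guard against these cases is a design choice for the maintainers; the wrapper only shows one possible behaviour.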
@@ -0,0 +1,16 @@
- name: Map instances to hosts
  become: false
  map_hosts:
    compute_vms: "{{ topology_nodes }}"
  register: _topology
  delegate_to: localhost
  run_once: true

- name: Template topology.conf
  become: true
  ansible.builtin.template:
    src: "{{ topology_conf_template }}"
    dest: /etc/slurm/topology.conf
    owner: root
    group: root
    mode: 0644
@@ -0,0 +1,13 @@
# topology.conf
# Switch Configuration
{% for az in _topology.topology.keys() %}
{% for instance_host in _topology.topology[az].keys() %}
SwitchName={{ instance_host }} Nodes={{ _topology.topology[az][instance_host] | join(",") }}
{% endfor %}
SwitchName={{ az }} Switches={{ _topology.topology[az].keys() | join(",") }}
{% endfor %}
{% if topology_above_rack_topology == '' %}
SwitchName=master Switches={{ _topology.topology.keys() | join(",") }}
{% else %}
{{ topology_above_rack_topology }}
{% endif %}
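
To make the template's output concrete, here is a minimal pure-Python sketch that builds the same kind of lines for the sample map from the module's RETURN documentation. `render_topology_conf` is a hypothetical helper written for illustration only; it is not part of the role, and the exact whitespace produced by the real Jinja2 template may differ depending on the trim settings in use.

```python
def render_topology_conf(topology, above_rack_topology=""):
    """Hypothetical stand-in for topology.conf.j2, using plain Python."""
    lines = ["# topology.conf", "# Switch Configuration"]
    for az, hosts in topology.items():
        # One leaf switch per hypervisor, listing the VMs running on it.
        for host, vms in hosts.items():
            lines.append(f"SwitchName={host} Nodes={','.join(vms)}")
        # One switch per availability zone, joining its hypervisors.
        lines.append(f"SwitchName={az} Switches={','.join(hosts)}")
    if above_rack_topology == "":
        # Default: a single top-level switch over every AZ.
        lines.append(f"SwitchName=master Switches={','.join(topology)}")
    else:
        # Override: emit the user-supplied switch lines verbatim.
        lines.append(above_rack_topology.rstrip("\n"))
    return "\n".join(lines) + "\n"


sample = {
    "nova-az": {
        "afe9": ["mycluster-compute-0", "mycluster-compute-1"],
        "00f9": ["mycluster-compute-vm-on-other-hypervisor"],
    }
}
print(render_topology_conf(sample))
```

With the default empty override, the last line printed is `SwitchName=master Switches=nova-az`, matching the single top-level switch described in the README.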
@@ -0,0 +1 @@
topology_nodes: "{{ groups['topology'] }}"