Slurm: total_nodes probably shouldn't account for down/unavailable nodes #914
Total nodes shouldn't account for nodes marked as down in Slurm, as this can be misleading in certain widgets such as System Status. I've also broken out unavailable nodes as a new property, in case directly modifying `total_nodes` isn't considered ideal.
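A minimal sketch of the idea, assuming the `sinfo -aho %F/%D/%C` call quoted in this thread. Per the `sinfo` man page, `%F` reports nodes as allocated/idle/other/total, `%D` the node count, and `%C` CPUs as allocated/idle/other/total; the "other" bucket is assumed here to hold the down/unavailable nodes. The `NodeCounts` struct and `node_counts` helper are hypothetical names, not ood_core's actual API.

```ruby
# Hypothetical container; ood_core's real ClusterInfo may look different.
NodeCounts = Struct.new(:active_nodes, :total_nodes, :unavailable_nodes, keyword_init: true)

# Parse one line of `sinfo -aho %F/%D/%C` output.
# Fields after split('/'):
#   0..3 -> nodes allocated/idle/other/total (%F)
#   4    -> node count (%D)
#   5..8 -> CPUs allocated/idle/other/total (%C)
def node_counts(sinfo_line)
  fields = sinfo_line.strip.split('/').map(&:to_i)
  allocated, _idle, other, total = fields[0, 4]
  NodeCounts.new(
    active_nodes: allocated,
    # Assumption: "other" covers down/drained nodes, so leave them out of the total
    total_nodes: total - other,
    # Exposed separately rather than silently dropped
    unavailable_nodes: other
  )
end

counts = node_counts("12/3/2/17/17/300/100/64/464\n")
puts counts.total_nodes       # => 15
puts counts.unavailable_nodes # => 2
```

Keeping `unavailable_nodes` as its own field lets a widget show both numbers instead of hiding the down nodes entirely.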
Thanks for the contribution. On the surface it makes sense; I just want a moment to test it out and see what's what. I don't think indexing with `[]` into a regex output value is super safe, but I'll have to check it out (besides, that's kind of what we were doing before). Some casts
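One way to address the concern about indexing into parsed output is to check the shape before indexing and convert strictly. This is a sketch assuming the nine-field `%F/%D/%C` layout; `parse_sinfo_fields` and `EXPECTED_FIELDS` are hypothetical names. It uses `Kernel#Integer` with `exception: false` (Ruby 2.6+), which returns `nil` on bad input instead of silently producing `0` the way `to_i` does.

```ruby
# %F (4 fields) + %D (1 field) + %C (4 fields)
EXPECTED_FIELDS = 9

# Hypothetical helper: returns an Array of Integers, or nil if the
# sinfo output does not have the expected shape.
def parse_sinfo_fields(output)
  fields = output.to_s.strip.split('/')
  return nil unless fields.length == EXPECTED_FIELDS

  # Base 10 so a leading zero is never read as octal
  ints = fields.map { |f| Integer(f, 10, exception: false) }
  ints.include?(nil) ? nil : ints
end

p parse_sinfo_fields("12/3/2/17/17/300/100/64/464") # => [12, 3, 2, 17, 17, 300, 100, 64, 464]
p parse_sinfo_fields("sinfo: error: ...")           # => nil
```

A caller can then fall back to zeros, or skip the widget, when the result is `nil`, rather than raising `NoMethodError` on a missing index.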
Added some
```ruby
node_cpu_info = call("sinfo", "-aho %F/%D/%C").strip.split('/')
gres_length = call("sinfo", "-o %G").lines.map(&:strip).map(&:length).max + 2
gres_lines = call("sinfo", "-ahNO ,nodehost,gres:#{gres_length},gresused:#{gres_length},statelong")
               .lines.uniq.reject { |line| line.match?(/maint|drain/i) }.map(&:split)
```
Apparently `g` is not a valid flag for Ruby regular expressions.
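Ruby regexp literals accept options such as `i`, `m` and `x`, but there is no `g`, so a literal like `/maint/ig` is rejected when the file is parsed. A global flag isn't needed here anyway: `reject` calls `match?` once per line, so every node's row is tested on its own no matter how many nodes share a state. The sketch below runs the filter on made-up rows shaped like the `-ahNO ...,statelong` output, with `down` added to the pattern as this PR describes; the hostnames and GRES strings are illustrative only.

```ruby
# Illustrative rows only; real sinfo output will differ.
lines = [
  "node01 gpu:2 gpu:1 mixed\n",
  "node02 gpu:2 gpu:0 drained\n",
  "node03 gpu:2 gpu:0 down\n",
  "node04 gpu:2 gpu:0 idle\n",
  "node05 gpu:2 gpu:0 maint\n",
]

# No /g: match? is evaluated separately for each line, so each node
# is kept or dropped independently. /i makes the match case-insensitive.
usable = lines.uniq.reject { |line| line.match?(/maint|drain|down/i) }.map(&:split)

usable.each { |fields| puts fields.first }
# prints node01, then node04
```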
Should be sorted now; committed a fix.
OK, thanks for this addition. I'm working to fix it up a bit and add a test case for it in a subsequent PR.
johrstrom left a comment
Thanks! This addition will be in 4.1, coming out shortly.
Nodes that are marked as unavailable/down in Slurm, along with their processors, shouldn't be counted in the `total_nodes` and `total_processors` figures, as this leads to misleading numbers in certain areas such as System Status, which presents more resources as available than there really are.

I've also added "down" (since down nodes are also inaccessible to the user) and changed the string match on `gres_lines` done in #916 to `/ig` to account for multiple nodes being in one of those states.
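The same adjustment can be sketched for processors, assuming the `%C` fields (allocated/idle/other/total) sit at indices 5 to 8 of the split `%F/%D/%C` output and that CPUs on down/unavailable nodes land in "other". `processor_counts` and the `unavailable_processors` key are hypothetical, not part of ood_core.

```ruby
# Hypothetical helper: CPU counts from one line of `sinfo -aho %F/%D/%C`.
# Indices 5..8 hold the %C fields: allocated/idle/other/total.
def processor_counts(sinfo_line)
  fields = sinfo_line.strip.split('/').map(&:to_i)
  allocated, _idle, other, total = fields[5, 4]
  {
    active_processors: allocated,
    # Assumption: CPUs on down/drained nodes are reported as "other"
    total_processors: total - other,
    unavailable_processors: other
  }
end

info = processor_counts("12/3/2/17/17/300/100/64/464")
puts info[:total_processors]       # => 400
puts info[:unavailable_processors] # => 64
```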