Commit 2b993a8

yangw-dev authored and pobin6 committed
[Utilization Monitor] input to disable utilization monitor (pytorch#140857)
# Overview
Currently monitor.py produces error-only results. This PR introduces a disable-monitor option to all *-test.yml workflows. We would also like to explore how the monitor code affects benchmark results.

# Next steps
- fix monitor.py
- enable non-benchmark tests with the monitor
- investigate benchmark test behavior with the monitor running as a background job

Pull Request resolved: pytorch#140857
Approved by: https://github.com/huydhn
1 parent b54eefe commit 2b993a8
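
Because the new input defaults to true in every *-test.yml, monitoring stays off unless a caller opts in. A minimal sketch of how a calling workflow might do that is shown here; the job name is hypothetical, and the other inputs that _linux-test.yml requires are left out, so this fragment is illustrative rather than a complete workflow.

```yaml
# Hypothetical caller fragment; the job name is illustrative only.
jobs:
  example-linux-test:
    uses: ./.github/workflows/_linux-test.yml
    with:
      # Inputs required by _linux-test.yml are omitted here for brevity.
      disable-monitor: false  # opt in to utilization monitoring for this job
```

Callers that leave the input unset keep the default (true), so the "Start monitoring script" step is skipped for them.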

6 files changed, +60 -7 lines changed
.github/actions/linux-test/action.yml

Lines changed: 10 additions & 2 deletions
@@ -47,7 +47,14 @@ inputs:
   GITHUB_TOKEN:
     description: GitHub token
     required: true
-
+  disable-monitor:
+    description: |
+      [Experimental] Disable utilization monitoring for tests.
+      Currently, by default we disable the monitor job and only look for specific tests,
+      since we are investigating the behaviour of the monitor script with different tests.
+    required: false
+    type: boolean
+    default: true
 #env:
 # GIT_DEFAULT_BRANCH: ${{ inputs.default_branch }}

@@ -115,6 +122,7 @@ runs:

     - name: Start monitoring script
       id: monitor-script
+      if: ${{ !inputs.disable-monitor }}
       shell: bash
       continue-on-error: true
       run: |
@@ -289,7 +297,7 @@ runs:
         cat test/**/*_toprint.log || true

     - name: Stop monitoring script
-      if: always() && steps.monitor-script.outputs.monitor-script-pid
+      if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
       shell: bash
       continue-on-error: true
       env:
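
One caveat that may be worth checking for this composite action: as far as I know, inputs declared in an action.yml reach the steps as strings, and the action metadata syntax does not document a `type` key. If so, `!inputs.disable-monitor` would evaluate to false for both 'true' and 'false', since any non-empty string is truthy. A hedged sketch of a string comparison that avoids this is below. It is an assumption about how the step could be written, not part of this commit, and it applies to the composite action only; the reusable workflows declare a real boolean input, where this comparison would not behave the same way.

```yaml
    - name: Start monitoring script
      id: monitor-script
      # Sketch only (not from the commit): compare against the string form
      # instead of negating the input, which is a string in a composite action.
      if: ${{ inputs.disable-monitor != 'true' }}
      # The remaining keys (shell, continue-on-error, run) stay as in the original step.
```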

.github/workflows/_linux-test.yml

Lines changed: 10 additions & 1 deletion
@@ -47,6 +47,14 @@ on:
         required: false
         type: string
         default: ""
+      disable-monitor:
+        description: |
+          [Experimental] Disable utilization monitoring for tests.
+          Currently, by default we disable the monitor job and only look for specific tests,
+          since we are investigating the behaviour of the monitor script with different tests.
+        required: false
+        type: boolean
+        default: true
     secrets:
       HUGGING_FACE_HUB_TOKEN:
         required: false
@@ -145,6 +153,7 @@ jobs:

       - name: Start monitoring script
         id: monitor-script
+        if: ${{ !inputs.disable-monitor }}
         shell: bash
         continue-on-error: true
         run: |
@@ -328,7 +337,7 @@ jobs:
           cat test/**/*_toprint.log || true

       - name: Stop monitoring script
-        if: always() && steps.monitor-script.outputs.monitor-script-pid
+        if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
         shell: bash
         continue-on-error: true
         env:

.github/workflows/_mac-test.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,14 @@ on:
3030
default: 270
3131
description: |
3232
Set the maximum (in minutes) how long the workflow should take to finish
33+
disable-monitor:
34+
description: |
35+
[Experimental] Disable utilization monitoring for tests.
36+
Currently, by default we disable the monitor job and only look for specific tests,
37+
since we are investigating the behaviour of the monitor script with different tests.
38+
required: false
39+
type: boolean
40+
default: true
3341

3442
jobs:
3543
test:
@@ -101,6 +109,7 @@ jobs:
101109

102110
- name: Start monitoring script
103111
id: monitor-script
112+
if: ${{ !inputs.disable-monitor }}
104113
continue-on-error: true
105114
run: |
106115
${CONDA_RUN} python3 -m tools.stats.monitor > usage_log.txt 2>&1 &
@@ -200,7 +209,7 @@ jobs:
200209
cat test/**/*_toprint.log || true
201210
202211
- name: Stop monitoring script
203-
if: always() && ${{ steps.monitor-script.outputs.monitor-script-pid }}
212+
if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
204213
continue-on-error: true
205214
env:
206215
MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }}

.github/workflows/_rocm-test.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,14 @@ on:
3838
default: ""
3939
description: |
4040
List of tests to include (empty string implies default list)
41+
disable-monitor:
42+
description: |
43+
[Experimental] Disable utilization monitoring for tests.
44+
Currently, by default we disable the monitor job and only look for specific tests,
45+
since we are investigating the behaviour of the monitor script with different tests.
46+
required: false
47+
type: boolean
48+
default: true
4149

4250
env:
4351
GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
@@ -91,6 +99,7 @@ jobs:
9199

92100
- name: Start monitoring script
93101
id: monitor-script
102+
if: ${{ !inputs.disable-monitor }}
94103
shell: bash
95104
continue-on-error: true
96105
run: |
@@ -247,7 +256,7 @@ jobs:
247256
cat test/**/*_toprint.log || true
248257
249258
- name: Stop monitoring script
250-
if: always() && steps.monitor-script.outputs.monitor-script-pid
259+
if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
251260
shell: bash
252261
continue-on-error: true
253262
env:

.github/workflows/_win-test.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,14 @@ on:
2828
default: 240
2929
description: |
3030
Set the maximum (in minutes) how long the workflow should take to finish
31+
disable-monitor:
32+
description: |
33+
[Experimental] Disable utilization monitoring for tests.
34+
Currently, by default we disable the monitor job and only look for specific tests,
35+
since we are investigating the behaviour of the monitor script with different tests.
36+
required: false
37+
type: boolean
38+
default: true
3139

3240
env:
3341
GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
@@ -101,6 +109,7 @@ jobs:
101109
- name: Start monitoring script
102110
id: monitor-script
103111
shell: bash
112+
if: ${{ !inputs.disable-monitor }}
104113
continue-on-error: true
105114
run: |
106115
# Windows conda doesn't have python3 binary, only python, but it's python3
@@ -213,7 +222,7 @@ jobs:
213222
cat test/**/*_toprint.log || true
214223
215224
- name: Stop monitoring script
216-
if: always() && steps.monitor-script.outputs.monitor-script-pid
225+
if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
217226
shell: bash
218227
continue-on-error: true
219228
env:

.github/workflows/_xpu-test.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,14 @@ on:
3838
default: ""
3939
description: |
4040
List of tests to include (empty string implies default list)
41+
disable-monitor:
42+
description: |
43+
[Experimental] Disable utilization monitoring for tests.
44+
Currently, by default we disable the monitor job and only look for specific tests,
45+
since we are investigating the behaviour of the monitor script with different tests.
46+
required: false
47+
type: boolean
48+
default: true
4149

4250
env:
4351
GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
@@ -83,6 +91,7 @@ jobs:
8391

8492
- name: Start monitoring script
8593
id: monitor-script
94+
if: ${{ !inputs.disable-monitor }}
8695
shell: bash
8796
continue-on-error: true
8897
run: |
@@ -242,7 +251,7 @@ jobs:
242251
cat test/**/*_toprint.log || true
243252
244253
- name: Stop monitoring script
245-
if: always() && steps.monitor-script.outputs.monitor-script-pid
254+
if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
246255
shell: bash
247256
continue-on-error: true
248257
env:
