Update wmcore_transferor global and local quotas dynamically#914
haozturk wants to merge 3 commits into dmwm:master
Hi @amaltaro @todor-ivanov, can you please tell us how MSTransferor behaves when Rucio rejects its rule creations because the account has no quota left? Will it keep retrying? We need to come up with a strategy for the limits of this account as soon as possible.
@haozturk, feel free to join (or anyone else from the DM team) the regular Monday WM meeting with stakeholders when DM needs prompt feedback from WM, as the team is always committed to a lot of activities, and offline communications can be less effective :)
client = Client()
account = "wmcore_transferor"
rse_expression = "rse_type=DISK&cms_type=real&tier<3&tier>0"
Hi @amaltaro, the global limit can be retrieved with either [1] or [2]. A global limit is a limit that applies across multiple RSEs. Rucio allows multiple global limits, so if you use [2], you will get an iterable object instead of a single value. We'll introduce only one global limit, which will be defined over this RSE expression.
I can't think of any reason why this RSE expression might change in the future, or why we would have multiple global limits with different RSE expressions. Still, since the system allows it, making your system rely on a configuration that can change isn't a good idea in my opinion. Probably the best option is not to touch MSTransferor and to let it fail and retry. It will fail with a distinct error message [3], and an operator looking into the logs will know what's going on (in case they miss the alerts).
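To illustrate the point about [2] returning several limits, here is a minimal sketch of selecting the one limit that matches a given RSE expression. The method name `get_global_account_limits` comes from link [2]; everything else is an assumption made for illustration: the return shape (a mapping from RSE expression to either a number or a dict with a `limit` key), the `FakeAccountClient` stand-in, the helper `pick_global_limit`, and the 30 PB value. None of this is taken from the PR.

```python
from typing import Any, Mapping, Optional

RSE_EXPRESSION = "rse_type=DISK&cms_type=real&tier<3&tier>0"


def pick_global_limit(limits: Mapping[str, Any], rse_expression: str) -> Optional[int]:
    """Return the limit registered for rse_expression, or None if absent."""
    entry = limits.get(rse_expression)
    if entry is None:
        return None
    # Assumed shape: either a bare number or a dict carrying a "limit" key.
    if isinstance(entry, Mapping):
        return entry.get("limit")
    return entry


class FakeAccountClient:
    """Hypothetical stand-in for a Rucio client, so the sketch runs offline."""

    def __init__(self, limits: Mapping[str, Any]) -> None:
        self._limits = limits

    def get_global_account_limits(self, account: str) -> Mapping[str, Any]:
        return self._limits


# Illustrative value only: 30 PB expressed in bytes.
client = FakeAccountClient({RSE_EXPRESSION: {"limit": 30 * 10**15}})
limits = client.get_global_account_limits("wmcore_transferor")
print(pick_global_limit(limits, RSE_EXPRESSION))
```

Looking the limit up by its exact RSE expression is what makes the caller depend on that configuration string, which is the coupling the comment above warns about.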
[1] https://rucio.github.io/documentation/html/client_api/accountclient.html#rucio.client.accountclient.AccountClient.get_global_account_limit
[2] https://rucio.github.io/documentation/html/client_api/accountclient.html#rucio.client.accountclient.AccountClient.get_global_account_limits
[3]
$ rucio rule add -a haozturk --rses T1_RU_JINR_Disk --comment "test" --copies 1 -d cms:/GluGluHToTauTau_HTXSFilter_STXS1p1_Bin110to113_M125_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL16MiniAODAPVv2-106X_mcRun2_asymptotic_preVFP_v11_ext1-v2/MINIAODSIM
2025-05-06 10:19:51,129 ERROR There is not enough quota left to fulfil the operation.
Details: There is insufficient quota on any of the target RSE's to fulfill the operation.
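Since the rejection carries a distinct message, an operator could filter logs for it. The snippet below is a rough sketch of that idea; it uses the two lines from [3] as sample input. The helper name `find_quota_rejections` is hypothetical, and how MSTransferor actually formats or stores its logs is not known from this thread.

```python
from typing import Iterable, List

# Message text copied from the CLI output in [3].
QUOTA_ERROR = "There is not enough quota left to fulfil the operation."


def find_quota_rejections(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the quota error message."""
    return [line.rstrip("\n") for line in log_lines if QUOTA_ERROR in line]


sample_log = [
    "2025-05-06 10:19:51,129 ERROR There is not enough quota left to fulfil the operation.",
    "Details: There is insufficient quota on any of the target RSE's to fulfill the operation.",
]
hits = find_quota_rejections(sample_log)
print(len(hits))
```

Only the first sample line matches, because the "Details" line uses different wording ("insufficient quota", "fulfill").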
Currently, the wmcore_transferor quotas are set statically, as shown in [1]. The account's usage reached a record high of 44.4 PB [2], which made many sites run out of space. In this issue [3], we discussed a strategy to introduce global and local limits for this account to keep its usage under control. This PR implements what was discussed in that issue, plus my own additions. To explain the changes:
The output of a dry run can be found at [4].
Before Eric reviews it, I'd like to get Ops' and WM's feedback. That's why I'll mark the PR as draft for now. Once we agree that the PR reflects what we need, I'll open the PR for Eric's review. Let me tag @Panos512 @hassan11196 @drkovalskyi @amaltaro @eachristgr and @juanpablosalas
[1]
[2] https://monit-grafana.cern.ch/d/viS0q0ZSz/rucio-account-monitoring?orgId=11&from=now-90d&to=now&var-RSE=All&var-RSEType=DISK&var-AccountName=wmcore_transferor&var-AccountType=All
[3] https://its.cern.ch/jira/browse/CMSTRANSF-479
[4]