xapi-cli-server: stop using SR records for cross pool migrations #6905
xapi-cli-server: stop using SR records for cross pool migrations #6905psafont wants to merge 2 commits intoxapi-project:26.1-lcmfrom
Conversation
There was a problem hiding this comment.
Maybe we need a new dedicated operation to do the lookups on the coordinator's database directly where it has fast,local access to the DB?
In extreme cases there could be a lot of shared SRs, e.g. in one use-case there would be 1 PBD for an NFS SR per VM, per host! So you could have hundreds of these/host.
| (* The preferred SR is determined to be as the SR that the destine host has a PDB attached to it, | ||
| and among the choices of that the shared is preferred first(as it is recommended to have shared storage | ||
| in pool to host VMs), and then the one with the maximum available space *) | ||
| (* The preferred SR is determined to be as the SR that the destine |
The VDI map should be able to bypass the selection of the destination SR, when all the vdis are correctly mapped. Interestingly it does the SR selection just in case, and only if it's succesful it will overlay that onto the vdi_map |
…ross-pool migration SR records have variant fields that add new variants when storage features are added, as is the case with current_operations. This means that fetching records from remote pools cause deserialization exceptions when their records contain newer, unknown variants. xe uses SR records to minimize the amount of remote calls needed for selecting the SR with the highest amount of free space during cross-pool migrations. Change xe to fetch references of only non-ISO SRs, and filter PBD records with them. This means that now PBD records are fetched, but these don't contain any variants, and have been unchanged since early 2008. This has the side effect of reducing the amount of calls from 2 per host-attached PBDS to 1 per non-iso, host-attached PBD with the additional cost of a single call to fetch the non-iso SRs Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
SR records have variant fields that add new variants when storage features are added, as is the case with current_operations. This means that fetching records from remote pool cause deserialization exceptions when their records contain newer, unknown variants when using OCaml-based clients. Use expression-based queries to separate shared from local SRs without records, and add 2 queries per SR to gather the free space. Now the amount of calls have changed from 1 per host-attached PBD to 2, but the amount of calls can be limited to shared SRs, if there are any. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
2effd6a to
13d23f3
Compare
|
I just migrated using xe from a 25.33.1 host to another one with 26.1.0, to an SR with VDI.revert and I didn't see any errors, so I'm puzzled as to what caused the failure in late 2024 that made us revert the change about allowed SR operations. Maybe it's because the VM is halted? but I don't see how it should affect if it's not a live migration I'll try a live migration, and adding all operations to the allowed ops to try to force an error. |
|
I've tested with allowing to advertise all the operations in the receiving SR, without issues. destination: advertised features until now: Additional features exposed:
I still need to properly test a live migration, but I'm surprised it works without this change, I'm now left wondering in what triggered the error in CA-402263 |
|
I understand |
SR records have variant fields that add new variants when storage features are added, as is the case with current_operations.
This means that fetching records from remote pool cause deserialization exceptions when their records contain newer, unknown variants when using OCaml-based clients.
xe uses SR records to minimize remote calls to select the SR with the largest amount of free space during cross-pool migrations.
Change xe to use expressions for filtering SRs and minimize the amount of calls done.
This means that now PBD records are fetched, but these don't contain any variants, and have been unchanged since early 2008.
Now the amount of calls have changed from 1 per host-attached PBD to 2, but the amount of calls can be limited to shared SRs, if there are any.
This is a prerequisite to reintroduce this change: 723a498#diff-22f9f03f35e06c26ffd8f5ee9a8573167dae627ac8927d260d798c47eacbc9ceR29, which is needed to be able to add new SR operations, like VDI_revert.
I'm opening it as a draft PR to gather comments while I test that it fixes the migration of a vm to a host with SRs with new, unknown capabilities.