@agrare agrare commented Nov 20, 2025

Allow any service resource retirement task to be executed with embedded workflows

Unlike MiqProvision, retirement was only ever implemented in automate: https://github.com/ManageIQ/manageiq-content/tree/master/content/automate/ManageIQ/Infrastructure/VM/Retirement/StateMachines/Methods.class/__methods__

This implements a simple VmRetireTask::StateMachine that can be called from workflows

TODO:

  • Implement for VM and OrchestrationStacks
  • Check for retired and retiring
  • Implement remove_from_provider and delete_from_vmdb options

Related:

Depends on:

[----] I, [2025-12-01T15:29:19.618373#161257:af14]  INFO -- workflows: Running state: [Pass:PreRetire] with input [{"dialog":{}}]...
[----] I, [2025-12-01T15:29:19.621245#161257:af14]  INFO -- workflows: Running state: [Pass:PreRetire] with input [{"dialog":{}}]...Complete - next state [Retire] output: [{"dialog":{}}]
[----] I, [2025-12-01T15:29:19.624677#161257:af14]  INFO -- workflows: Running state: [Task:Retire] with input [{"dialog":{}}]...
[----] I, [2025-12-01T15:29:19.634189#161257:af14]  INFO -- evm: MIQ(VmRetireTask#execute_queue) Queuing VM Retire: [VM Retire for: ag-prov-test0005]...
[----] I, [2025-12-01T15:29:31.698861#161257:af14]  INFO -- evm: MIQ(VmRetireTask#execute) Executing VM Retire request: [VM Retire for: ag-prov-test0005]
[----] I, [2025-12-01T15:29:31.728394#161257:af14]  INFO -- evm: MIQ(VmRetireTask#execute) VM Retire initiated
[----] I, [2025-12-01T15:29:31.755819#161257:af14]  INFO -- evm: Starting Phase <run_retire>
[----] I, [2025-12-01T15:29:31.760660#161257:af14]  INFO -- evm: Starting Phase <check_vm_power_state>
[----] I, [2025-12-01T15:29:31.770397#161257:af14]  INFO -- evm: MIQ(VmRetireTask#check_vm_power_state) Powering Off VM <ag-prov-test0005> in provider <openstack>
[----] I, [2025-12-01T15:29:31.795148#161257:af14]  INFO -- evm: Starting Phase <poll_vm_stopped>
[----] I, [2025-12-01T15:29:31.798512#161257:af14]  INFO -- evm: MIQ(VmRetireTask#poll_vm_stopped) VM:<ag-prov-test0005> on Provider:<openstack> has Power State:<on>
[----] I, [2025-12-01T15:29:31.798640#161257:af14]  INFO -- evm: MIQ(VmRetireTask#poll_vm_stopped) VM:<ag-prov-test0005> on Provider:<openstack> has not stopped
[----] I, [2025-12-01T15:29:31.918167#161257:af14]  INFO -- evm: MIQ(MiqQueue#m_callback) Message id: [23707], Invoking Callback with args: [:stop_queue, "ok", "Message delivered successfully", "#<MiqAeEngine::MiqAeWorkspaceRuntime:0x00007fef38758cb0 @readonly=false, @nodes=[#<MiqAeEngine::MiqAeObject:0x00007fef19f15a08 @workspace=#<MiqAeEngine::MiqAeWorkspaceRuntime:0x00007fef38758cb0 ...>, @namespace=\"ManageIQ/System\", @klass=\"Process\", @instance=\"Event\", @attributes={\"event_stream_id\"=>\"3030\", \"event_type\"=>\"request_vm_poweroff\", \"ext_management_system_id\"=>\"4\", \"miq_event_id\"=>\"3030\", \"object_name\"=>\"Event\", \"vm_id\"=>\"846\", \"vmdb_object_type\"=>\"vm\", \"event_stream\"=>#<MiqAeServiceMiq..."]
[----] I, [2025-12-01T15:29:31.922817#161257:af14]  INFO -- evm: MIQ(MiqQueue.put) Message id: [23709], Zone: [default], Role: [ems_operations], Server: [], MiqTask id: [], Handler id: [], Ident: [generic], Target id: [], Instance id: [846], Task id: [], Command: [ManageIQ::Providers::Openstack::CloudManager::Vm.raw_stop], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: []
[----] I, [2025-12-01T15:29:31.929901#161257:af14]  INFO -- evm: MIQ(MiqQueue#deliver) Message id: [23709], Delivering...
[----] I, [2025-12-01T15:29:31.932433#161257:af14]  INFO -- evm: MIQ(ManageIQ::Providers::Openstack::CloudManager#with_provider_connection) Connecting through ManageIQ::Providers::Openstack::CloudManager: [openstack]
[----] I, [2025-12-01T15:29:32.671297#161257:af14]  INFO -- evm: MIQ(MiqQueue#delivered) Message id: [23709], State: [ok], Delivered in [0.741377379] seconds
[----] I, [2025-12-01T15:29:42.705175#161257:af14]  INFO -- evm: MIQ(MiqQueue#deliver) Message id: [23708], Delivering...
[----] I, [2025-12-01T15:29:42.711322#161257:af14]  INFO -- evm: MIQ(VmRetireTask#poll_vm_stopped) VM:<ag-prov-test0005> on Provider:<openstack> has Power State:<off>
[----] I, [2025-12-01T15:29:42.711522#161257:af14]  INFO -- evm: MIQ(VmRetireTask#poll_vm_stopped) VM:<ag-prov-test0005> on Provider:<openstack> has stopped, retiring...
[----] I, [2025-12-01T15:29:42.711695#161257:af14]  INFO -- evm: Starting Phase <start_retirement>
[----] I, [2025-12-01T15:29:42.719566#161257:af14]  INFO -- evm: Starting Retirement for [id:<846>, name:<ag-prov-test0005>]
[----] I, [2025-12-01T15:29:42.729772#161257:af14]  INFO -- evm: Starting Phase <finish_retirement>
[----] I, [2025-12-01T15:29:42.733802#161257:af14]  INFO -- evm: Finishing Retirement for [ag-prov-test0005]
[----] I, [2025-12-01T15:29:42.739808#161257:af14]  INFO -- evm: Calling audit event for: Vm: [id:<846>, name:<ag-prov-test0005>] with Retires On value: [12/01/25 20:29 UTC], has been retired 
[----] I, [2025-12-01T15:29:42.801794#161257:af14]  INFO -- evm: Called audit event for: Vm: [id:<846>, name:<ag-prov-test0005>] with Retires On value: [12/01/25 20:29 UTC], has been retired 
[----] I, [2025-12-01T15:29:42.801918#161257:af14]  INFO -- evm: Starting Phase <finish>
[----] I, [2025-12-01T15:29:42.814459#161257:af14]  INFO -- evm: Child tasks finished but current task still processing. Setting state to: [retired]...
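
The phase sequence in the log above can be pictured as a small state machine in which each phase returns the next phase, and a polling phase returns itself until its condition is met. The following is only a sketch under assumptions: `FakeVm`, `RetireSketch`, and the phase bodies are hypothetical and are not the `VmRetireTask::StateMachine` code in this PR; only the phase names are taken from the log.

```ruby
# Hypothetical stand-in for a VM; only tracks a power state and a retired flag.
FakeVm = Struct.new(:name, :power_state, :retired)

# Minimal phase runner. Phase names mirror the log above; the bodies are
# illustrative assumptions, not the code under review.
class RetireSketch
  attr_reader :phase, :history

  def initialize(vm)
    @vm      = vm
    @phase   = :run_retire
    @history = []
  end

  # Run one phase and record it. A phase returns the name of the next phase,
  # or its own name when it needs to be polled again (a "requeue").
  def step
    @history << @phase
    @phase = send(@phase)
  end

  # Step until :done, giving up after max_steps so a stuck poll cannot loop forever.
  def run(max_steps: 20)
    max_steps.times do
      return true if @phase == :done

      step
    end
    @phase == :done
  end

  private

  def run_retire
    :check_vm_power_state
  end

  def check_vm_power_state
    @vm.power_state == "on" ? :poll_vm_stopped : :start_retirement
  end

  def poll_vm_stopped
    return :start_retirement if @vm.power_state == "off"

    # Assumption: simulate the provider finishing the stop between polls.
    @vm.power_state = "off"
    :poll_vm_stopped
  end

  def start_retirement
    :finish_retirement
  end

  def finish_retirement
    @vm.retired = true
    :finish
  end

  def finish
    :done
  end
end

vm   = FakeVm.new("ag-prov-test0005", "on", false)
task = RetireSketch.new(vm)
task.run
puts task.history.join(" -> ")
```

As in the log, `poll_vm_stopped` appears twice in the history: once while the VM is still on and once after it has stopped.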

@agrare agrare requested review from Fryguy and kbrock as code owners November 20, 2025 21:12
@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch from 1cf83c4 to 86bf45c Compare November 24, 2025 14:31
@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch 3 times, most recently from 356d879 to a80dfbf Compare December 2, 2025 19:43
fail!("#{self.class.model_being_retired} already in the process of being retired")
end

Notification.create!(:type => :vm_retiring, :subject => source)
Member Author

NOTE: I moved these common states up to the base MiqRetireTask class so they also apply to Orchestration stacks.

It is odd, but automate's Orchestration Retirement also uses the :vm_retiring notification type when starting retirement:
https://github.com/ManageIQ/manageiq-content/blob/master/content/automate/ManageIQ/Cloud/Orchestration/Retirement/StateMachines/Methods.class/__methods__/start_retirement.rb#L48
but :orchestration_stack_retired when finishing:
https://github.com/ManageIQ/manageiq-content/blob/master/content/automate/ManageIQ/Cloud/Orchestration/Retirement/StateMachines/Methods.class/__methods__/finish_retirement.rb#L19

That is clearly a mistake; I will fix it separately.

@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch 2 times, most recently from dff20ee to 404adc3 Compare December 5, 2025 14:40
def remove_from_provider
case options[:removal_type]
when "remove_from_disk"
if vm.miq_provision || vm.is_tagged_with?("retire_full", :ns => "/managed/lifecycle")
Member Author

@agrare agrare Dec 5, 2025

@Fryguy here's another question: do we want to keep the "did we provision it or is it tagged" check, or just have a "RemoveFromProvider": true option, e.g. https://github.com/ManageIQ/manageiq-content/pull/775/files#diff-098855c3aa217fca0cc56c9749bc1b52ef8aa0440ae9f5a6d039f285d59839fcR14

Member

The tag-based thing is interesting - I think we should keep it, but maybe it makes more sense as part of the workflow itself? We definitely have to make sure we document it well.
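
One way to picture combining the two approaches discussed here is an explicit option that wins when given, with the "provisioned by us or tagged" check as the fallback. This is a sketch only: `remove_from_provider?`, its keyword arguments, and the flat tag string are hypothetical and are not part of this PR or of the linked workflow.

```ruby
# Hypothetical helper: decide whether a retiring VM should also be removed
# from its provider. The explicit option mirrors the "RemoveFromProvider"
# idea from the question; the fallback mirrors the existing
# "provisioned by us or tagged retire_full" check.
def remove_from_provider?(provisioned_by_us:, tags:, explicit: nil)
  # An explicit true/false from the workflow author always wins.
  return explicit unless explicit.nil?

  # Assumption: tags are represented as flat "/namespace/name" strings.
  provisioned_by_us || tags.include?("/managed/lifecycle/retire_full")
end

puts remove_from_provider?(provisioned_by_us: false, tags: ["/managed/lifecycle/retire_full"])
puts remove_from_provider?(provisioned_by_us: true, tags: [], explicit: false)
```

Keeping the fallback inside one predicate would make the documented behaviour a single rule: explicit option first, then provisioning origin, then the tag.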

@agrare agrare changed the title from "[WIP] Run Service retirement subtasks with workflows" to "Run Service retirement subtasks with workflows" Dec 5, 2025
Comment on lines +10 to +12
return fail!("#{self.class.model_being_retired} already retired")
elsif source.retiring?
return fail!("#{self.class.model_being_retired} already in the process of being retired")
Member

These should be i18n if they are user facing.

Suggested change
- return fail!("#{self.class.model_being_retired} already retired")
- elsif source.retiring?
- return fail!("#{self.class.model_being_retired} already in the process of being retired")
+ return fail!(_("%{model_being_retired} already retired") % {:model_being_retired => self.class.model_being_retired})
+ elsif source.retiring?
+ return fail!(_("%{model_being_retired} already in the process of being retired") % {:model_being_retired => self.class.model_being_retired})

Member Author

Since there is no user context here, would that have to be N_()?

Member

Oh interesting - I guess we can't do it then (even if we did N_() - there's nothing to do the actual translation and interpolation later)

Member Author

Yeah, this is setting the message attribute of the miq_request_task record. It is shown in the Services / Requests list, but I don't know if the UI translates the message there:

[Image]

If N_() would add these strings for translation later then it sounds worth doing, but I don't know if the UI is actually translating them.
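
The concern in this thread is that N_() only marks a string for extraction; something still has to translate and interpolate it later, when a user locale is known. A minimal sketch of that deferred pattern follows. Everything in it is an assumption for illustration: the local `N_` stub stands in for gettext's marker (which returns the string unchanged), and `CATALOG`, `render_message`, and the template/params storage shape are hypothetical, not how miq_request_task stores its message.

```ruby
# Stand-in for gettext's N_(): marks a string for extraction and returns it unchanged.
def N_(msgid)
  msgid
end

# Hypothetical catalog standing in for a real translation lookup.
CATALOG = {
  "es" => {"%{model} already retired" => "%{model} ya retirado"}
}.freeze

# Translate the stored template for a locale (falling back to the template
# itself when no translation exists), then interpolate the parameters.
def render_message(template, params, locale)
  translated = CATALOG.fetch(locale, {}).fetch(template, template)
  translated % params
end

# At task time no user locale is known, so keep template and params separate.
stored = {:template => N_("%{model} already retired"), :params => {:model => "Vm"}}

# At display time the viewer's locale is known.
puts render_message(stored[:template], stored[:params], "es")
puts render_message(stored[:template], stored[:params], "en")
```

The catch raised above still applies: this only works if the message is stored uninterpolated, which a single message string attribute does not allow on its own.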

if vm.ext_management_system
_log.info("#{log_prefix} not yet removed from provider...")
vm.queue_refresh
requeue_phase
Member

I think we need some caps here so this doesn't run forever. Not sure if time-based, count-based or both. Perhaps we have sensible defaults here for both, and then expose options to retire_execute allowing the author to override.

Member Author

There is a TimeoutSeconds ASL state property that we could use. An internal state retry counter would have to be method-specific, but we could do that too.

Member

@Fryguy Fryguy Dec 8, 2025

ok yeah - TimeoutSeconds makes sense, but that's a top-level thing (it's around the entire retire_execute). Similarly there's the whole Retries thing built-in to ASL.

Even so, I agree that we can document TimeoutSeconds as the way to prevent it from running forever (maybe even make it required?)

Member Author

@agrare agrare Dec 9, 2025

There seems to be some disagreement between the ASL spec and AWS Step Functions on the default value for TimeoutSeconds:

ASL:

If not provided, the default value of "TimeoutSeconds" is 60.

Step Functions:

The default value is 99999999.

60 is way too short for provision_execute or retire_execute. At a minimum we should add our own default value for those builtin states, maybe 3600? I think automate retries once a minute 100 times or something.

Member

I agree 60 is way too short - we can pick something reasonable, but high, as a default like 3600 globally, perhaps.

I also agree on builtins having their own separate default value - 3600 for this is a good choice (even if it matches the global value, I think having it explicitly defined as a separate default is better, in case we change the global later)
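
The precedence discussed in this thread (explicit TimeoutSeconds, then a per-builtin default, then a global default) and a poll capped by both time and count could look roughly like the following. This is a sketch under assumptions: the constants, `effective_timeout`, and `poll_until` are hypothetical names that do not exist in this PR, and the 3600 values are simply the numbers floated above.

```ruby
# Assumed defaults taken from the numbers floated in this thread.
GLOBAL_DEFAULT_TIMEOUT   = 3600
BUILTIN_DEFAULT_TIMEOUTS = {
  "provision_execute" => 3600,
  "retire_execute"    => 3600
}.freeze

# Resolve the timeout for a Task state: an explicit TimeoutSeconds wins,
# then the builtin's own default, then the global default.
def effective_timeout(state, builtin: nil)
  state.fetch("TimeoutSeconds") do
    BUILTIN_DEFAULT_TIMEOUTS.fetch(builtin, GLOBAL_DEFAULT_TIMEOUT)
  end
end

# Poll until the block returns true, capped by both elapsed time and attempts.
# The clock and sleeper are injectable so the caps can be exercised without waiting.
def poll_until(timeout:, max_attempts:, interval: 60,
               clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
               sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  max_attempts.times do |attempt|
    return :ok if yield(attempt)
    return :timed_out if clock.call >= deadline

    sleeper.call(interval)
  end
  :attempts_exhausted
end

puts effective_timeout({"TimeoutSeconds" => 120}, builtin: "retire_execute")
puts effective_timeout({}, builtin: "retire_execute")
```

Defining the builtin defaults in their own table, even when they equal the global value, keeps them stable if the global default changes later.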

end

def create_retired_notification!
notification_type = "#{self.class.model_being_retired.name.underscore}_retired"
Member

Is it possible to get here trying to retire something that doesn't have a notification type, e.g. if you try to retire a generic service? Maybe we have an "other_retired" or something?

Member Author

These are the subclasses of MiqRetireTask

vmdb(dev)> MiqRetireTask.subclasses.map(&:name)
=> ["VmRetireTask", "ServiceRetireTask", "OrchestrationStackRetireTask"]

Each of these has a retiring and a retired notification type (except orchestration_stack_retiring, which is missing and is being fixed separately)

vmdb(dev)> NotificationType.where('name LIKE ? OR name LIKE ?', "%_retired", "%_retiring").pluck(:name)
=> ["service_retired", "service_retiring", "vm_retired", "vm_retiring", "orchestration_stack_retired"]
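
To illustrate the reviewer's question about a missing notification type, here is a sketch of deriving the name and checking it against the known types before use. It is an assumption-laden example: `underscore` is a simplified stand-in for ActiveSupport's `String#underscore`, `notification_type_for` is a hypothetical helper, and the list is copied from the console output above rather than queried from NotificationType.

```ruby
# Notification types taken from the console output above.
KNOWN_NOTIFICATION_TYPES = %w[
  service_retired service_retiring vm_retired vm_retiring orchestration_stack_retired
].freeze

# Simplified stand-in for ActiveSupport's String#underscore; handles plain
# CamelCase names only.
def underscore(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Returns the notification type name for a model and event, or nil when no
# matching type is known, so the caller can skip or substitute a fallback.
def notification_type_for(model_name, event)
  candidate = "#{underscore(model_name)}_#{event}"
  candidate if KNOWN_NOTIFICATION_TYPES.include?(candidate)
end

p notification_type_for("Vm", "retired")
p notification_type_for("OrchestrationStack", "retiring")
```

With the list above, the second call returns nil, which matches the missing orchestration_stack_retiring type noted in this thread.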

@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch from 404adc3 to 3f9576c Compare December 8, 2025 15:38
@agrare agrare requested a review from bdunne as a code owner December 8, 2025 15:38
@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch from 3f9576c to e6b790f Compare December 8, 2025 16:28
@Fryguy Fryguy self-assigned this Dec 8, 2025
@agrare agrare force-pushed the run_service_retire_subtasks_with_workflows branch from e6b790f to 3a40a31 Compare December 10, 2025 17:17