fix: tfpluginsdk nil panics when LateInit off#600
fix: tfpluginsdk nil panics when LateInit off#600jwefers wants to merge 1 commit intocrossplane:mainfrom
Conversation
📝 WalkthroughWalkthroughAdds a defensive pre-processing step in the Terraform instance diff computation that merges the observed instance state with existing desired parameters before creating the resource configuration. Introduces a new helper method to convert instance state to a JSON map suitable for diff operations. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/controller/external_tfpluginsdk.go (1)
841-852:fromInstanceStateToJSONMapForDiffis a near-verbatim duplicate offromInstanceStateToJSONMap.The only difference from the existing
fromInstanceStateToJSONMap(Lines 828-839) is that this variant drops thecty.Valuereturn. You could eliminate this new method entirely by delegating to the existing one and discarding the second return value — both in the helper and at the call site.♻️ Proposed refactor: delegate to existing method
Remove
fromInstanceStateToJSONMapForDiffentirely and update the call site ingetResourceDataDiff:-func (n *terraformPluginSDKExternal) fromInstanceStateToJSONMapForDiff(newState *tf.InstanceState) (map[string]interface{}, error) { - impliedType := n.config.TerraformResource.CoreConfigSchema().ImpliedType() - attrsAsCtyValue, err := newState.AttrsAsObjectValue(impliedType) - if err != nil { - return nil, errors.Wrap(err, "could not convert attrs to cty value") - } - stateValueMap, err := schema.StateValueToJSONMap(attrsAsCtyValue, impliedType) - if err != nil { - return nil, errors.Wrap(err, "could not convert instance state value to JSON") - } - return stateValueMap, nil -}And at the call site (Line 445):
- stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s) + stateValueMap, _, err := n.fromInstanceStateToJSONMap(s)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/external_tfpluginsdk.go` around lines 841 - 852, The function fromInstanceStateToJSONMapForDiff is a near-duplicate of fromInstanceStateToJSONMap; remove fromInstanceStateToJSONMapForDiff and update the caller(s) (e.g., getResourceDataDiff) to call fromInstanceStateToJSONMap and ignore the extra cty.Value return (e.g., _, err := fromInstanceStateToJSONMap(...)); preserve existing error handling and return behavior from the original method when propagating errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/external_tfpluginsdk.go`:
- Around line 444-455: The conversion error from
fromInstanceStateToJSONMapForDiff is being dropped because err is scoped inside
the inner if and never logged or returned, causing a silent fallback to
n.params; change the logic so the err is captured in the outer scope, and if
fromInstanceStateToJSONMapForDiff returns an error emit a warning/debug log
(using the controller's logger already in this file, e.g., r.logger or similar)
that includes the error and context (resourceExists, n.Name or similar), then
proceed with the graceful fallback to params = n.params (or alternatively return
the error from the surrounding function/Observe if you prefer surfacing
failures); ensure you still merge stateValueMap into params when no error
occurs.
- Around line 435-456: The framework implementation reuses a single sparse
resourceTerraformConfigValue set in Connect(), which causes nulls from
spec.forProvider to override prior observed state when building the proposed
state; fix by merging the observed instance state into the config before
diffing: in Observe() (or immediately before getDiffPlanResponse()) read the
current instance state, convert it to a JSON map the same way
external_tfpluginsdk.go does (e.g., fromInstanceStateToJSONMapForDiff
equivalent), and overlay the static resourceTerraformConfigValue entries onto
that merged map so previously-computed/non-specified attributes keep their
observed values; alternatively, change proposedState() to prefer the observed
state value when config value is nil for non-computed attributes (replace the
"newV = configV" branch to guard against nil configV by using stateV if configV
is nil), referencing resourceTerraformConfigValue, Observe(), proposedState(),
and getDiffPlanResponse() to locate where to apply the change.
---
Nitpick comments:
In `@pkg/controller/external_tfpluginsdk.go`:
- Around line 841-852: The function fromInstanceStateToJSONMapForDiff is a
near-duplicate of fromInstanceStateToJSONMap; remove
fromInstanceStateToJSONMapForDiff and update the caller(s) (e.g.,
getResourceDataDiff) to call fromInstanceStateToJSONMap and ignore the extra
cty.Value return (e.g., _, err := fromInstanceStateToJSONMap(...)); preserve
existing error handling and return behavior from the original method when
propagating errors.
| func (n *terraformPluginSDKExternal) getResourceDataDiff(tr resource.Terraformed, ctx context.Context, s *tf.InstanceState, resourceExists bool) (*tf.InstanceDiff, error) { //nolint:gocyclo | ||
| resourceConfig := tf.NewResourceConfigRaw(n.params) | ||
| // before we calculate the diff and thus what we send over into the terraform provider | ||
| // code, we construct the desired resource fully specified, by taking the observed | ||
| // state from the upstream resource and then apply the desired spec.forProvider of the crossplane | ||
| // resource over it. This mirrors what terraform/opentofu do, because some terraform providers | ||
| // do not like nil values when processing the desired resource. This is especially important | ||
| // when managementPolicies have LateInitialize excluded, which means spec.forProvider can | ||
| // be very sparse. | ||
| params := n.params | ||
| if resourceExists && s != nil && len(n.params) > 0 { | ||
| stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s) | ||
| if err == nil && len(stateValueMap) > 0 { | ||
| params = make(map[string]any, len(stateValueMap)+len(n.params)) | ||
| for k, v := range stateValueMap { | ||
| params[k] = v | ||
| } | ||
| for k, v := range n.params { | ||
| params[k] = v | ||
| } | ||
| } | ||
| } | ||
| resourceConfig := tf.NewResourceConfigRaw(params) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find external_tfpluginfw.go and check its structure
fd -e go -g '*tfpluginfw*'Repository: crossplane/upjet
Length of output: 184
🏁 Script executed:
#!/bin/bash
# Search for getResourceDataDiff method in Framework implementation
rg "getResourceDataDiff" --type goRepository: crossplane/upjet
Length of output: 432
🏁 Script executed:
#!/bin/bash
# Get the full content of external_tfpluginfw.go to understand the diff computation
fd -e go -g '*tfpluginfw*' -x cat {} -nRepository: crossplane/upjet
Length of output: 50372
🏁 Script executed:
#!/bin/bash
# Look for the Observe method in external_tfpluginfw.go to see how it computes diffs
rg -A 50 "func \(n \*terraformPluginFrameworkExternalClient\) Observe" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 2386
🏁 Script executed:
#!/bin/bash
# Search for any state-related or diff-related logic in the Framework implementation
rg "getDiffPlanResponse|filteredDiffExists|proposed|stateValueMap" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 2401
🏁 Script executed:
#!/bin/bash
# Look for the proposedState function to understand how it computes proposed state
rg -B 5 -A 20 "func proposedState" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check how getDiffPlanResponse is called in Observe method and if state merging happens
rg -B 10 -A 30 "planResponse, hasDiff, err := n.getDiffPlanResponse" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 1677
🏁 Script executed:
#!/antml:thinking
Also need to understand:
1. Is there state merging before getDiffPlanResponse is called?
2. Look at the full Observe method to see the complete flow
</thinking>
#!/bin/bash
# Get full Observe method to see complete flow
rg -A 150 "func \(n \*terraformPluginFrameworkExternalClient\) Observe\(ctx context.Context, mg xpresource.Managed\)" pkg/controller/external_tfpluginfw.go | head -200Repository: crossplane/upjet
Length of output: 289
🏁 Script executed:
#!/bin/bash
# Look for proposedState function definition
rg -B 2 -A 15 "^func proposedState" pkg/controller/Repository: crossplane/upjet
Length of output: 1342
🏁 Script executed:
#!/bin/bash
# Check if Framework implementation merges state like SDK does before calling getDiffPlanResponse
rg -B 20 "planResponse, hasDiff, err := n.getDiffPlanResponse" pkg/controller/external_tfpluginfw.go | grep -A 20 "stateValueMap\|params\|merge"Repository: crossplane/upjet
Length of output: 583
🏁 Script executed:
#!/bin/bash
# Let's see the full context around where getDiffPlanResponse is called
sed -n '400,500p' pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 4016
🏁 Script executed:
#!/bin/bash
# Check if Framework's Connect method merges state like SDK version does
rg -A 5 "resourceTerraformConfigValue.*=" pkg/controller/external_tfpluginfw.go | head -20Repository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Compare: look at SDK version's Connect to see when params are set
rg -B 5 -A 10 "n.params = " pkg/controller/external_tfpluginsdk.go | head -30Repository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Look at the full getDiffPlanResponse in SDK to see the state merging
rg -B 5 -A 80 "func \(n \*terraformPluginSDKExternal\) getResourceDataDiff" pkg/controller/external_tfpluginsdk.goRepository: crossplane/upjet
Length of output: 3494
🏁 Script executed:
#!/bin/bash
# Look at the full proposedNew implementation to understand the merging logic
rg -A 100 "^func proposedNew" pkg/controller/proposed_state.go | head -150Repository: crossplane/upjet
Length of output: 5555
🏁 Script executed:
#!/antml:thinking>
I need to see the full proposedNew implementation to understand if it truly merges state with config in a way that prevents the partial-value panic.
</thinking>Repository: crossplane/upjet
Length of output: 204
🏁 Script executed:
#!/an/bin/bash
# Check how n.resourceTerraformConfigValue is set in Framework's Connect method
rg -B 5 -A 15 "resourceTerraformConfigValue:" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 998
🏁 Script executed:
#!/bin/bash
# Check if resourceTerraformConfigValue is ever updated after Connect
rg "resourceTerraformConfigValue\s*=" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Look at how params is constructed in Framework's Connect vs SDK's Connect
rg -A 5 "params.*:=" pkg/controller/external_tfpluginfw.go | head -40Repository: crossplane/upjet
Length of output: 405
🏁 Script executed:
#!/bin/bash
# See the full picture of what happens with params in Framework Connect
sed -n '160,260p' pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 4770
🏁 Script executed:
#!/bin/bash
# Check if resourceConfigTFValue/resourceTerraformConfigValue is static or gets updated
rg "SetReconstructedFrameworkTFState\|SetFrameworkTFState" pkg/controller/external_tfpluginfw.go | head -20Repository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check what happens in getDiffPlanResponse - does it merge state with config like SDK does?
sed -n '350,410p' pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 2864
🏁 Script executed:
#!/bin/bash
# Compare: In SDK's getResourceDataDiff, state is merged. Does Framework do similar merging?
# Look at proposedState call to see what gets passed
rg -B 3 -A 3 "proposedState\(n.resourceSchema" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 489
🏁 Script executed:
#!/bin/bash
# Check Framework's Create method to see how it handles partial params
rg -A 80 "func \(n \*terraformPluginFrameworkExternalClient\) Create\(ctx context.Context" pkg/controller/external_tfpluginfw.go | head -100Repository: crossplane/upjet
Length of output: 3828
🏁 Script executed:
#!/antml:thinking>
I need to check if Create/Update methods in Framework use getDiffPlanResponse or if they have a different code path that might not do the state merging.
</thinking>Repository: crossplane/upjet
Length of output: 204
🏁 Script executed:
#!/bin/bash
# Check Framework's Update method
rg -A 60 "func \(n \*terraformPluginFrameworkExternalClient\) Update\(ctx context.Context" pkg/controller/external_tfpluginfw.go | head -80Repository: crossplane/upjet
Length of output: 2802
🏁 Script executed:
#!/bin/bash
# Look at proposedNewAttributes to understand how it merges prior and config
rg -A 50 "^func proposedNewAttributes" pkg/controller/proposed_state.goRepository: crossplane/upjet
Length of output: 1997
🏁 Script executed:
#!/bin/bash
# Verify: In SDK's Connect, are params initially from spec.forProvider (potentially sparse)?
# Look at how params are constructed in SDK Connect
rg -B 5 -A 20 "func \(c \*TerraformSDKConnector\) Connect\(ctx context.Context" pkg/controller/external_tfpluginsdk.go | head -50Repository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check if Framework's resourceTerraformConfigValue gets updated during Observe or stays static
rg "resourceTerraformConfigValue" pkg/controller/external_tfpluginfw.goRepository: crossplane/upjet
Length of output: 771
🏁 Script executed:
#!/bin/bash
# Check if Framework's proposedState provides same protection as SDK's state merge
# Look at the exact logic in proposedNewAttributes for non-computed attributes
sed -n '90,150p' pkg/controller/proposed_state.goRepository: crossplane/upjet
Length of output: 2368
🏁 Script executed:
#!/bin/bash
# Confirm: Does Framework ever update params/config after initial Connect?
# Search for any updates to params or resourceTerraformConfigValue after Connect
rg "params\s*=" pkg/controller/external_tfpluginfw.go | grep -v "resourceTerraformConfigValue:"Repository: crossplane/upjet
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Look at SDK's comment about why state merging is needed
sed -n '430,465p' pkg/controller/external_tfpluginsdk.go | head -40Repository: crossplane/upjet
Length of output: 1733
Framework implementation (external_tfpluginfw.go) has the same partial-value risk and needs similar protection.
Unlike the SDK implementation which re-merges state into params on each getResourceDataDiff() call, the Framework implementation sets resourceTerraformConfigValue once in Connect() from the sparse spec.forProvider, then reuses it statically throughout the resource lifecycle.
The proposedState() function does attempt to merge prior state with config, but for non-computed attributes it explicitly prefers config values even when null:
} else {
// For non-computed attributes, we always take the config value,
// even if it is null.
newV = configV
}When spec.forProvider is sparse (due to managementPolicies with LateInitialize excluded), this means optional fields not in the config remain null in the proposed state sent to the provider—the exact scenario the SDK fix addresses.
Consider opening a follow-up PR to apply a similar fix: either re-merge state into params during Observe(), or update resourceTerraformConfigValue dynamically with observed state before calling getDiffPlanResponse().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/external_tfpluginsdk.go` around lines 435 - 456, The framework
implementation reuses a single sparse resourceTerraformConfigValue set in
Connect(), which causes nulls from spec.forProvider to override prior observed
state when building the proposed state; fix by merging the observed instance
state into the config before diffing: in Observe() (or immediately before
getDiffPlanResponse()) read the current instance state, convert it to a JSON map
the same way external_tfpluginsdk.go does (e.g.,
fromInstanceStateToJSONMapForDiff equivalent), and overlay the static
resourceTerraformConfigValue entries onto that merged map so
previously-computed/non-specified attributes keep their observed values;
alternatively, change proposedState() to prefer the observed state value when
config value is nil for non-computed attributes (replace the "newV = configV"
branch to guard against nil configV by using stateV if configV is nil),
referencing resourceTerraformConfigValue, Observe(), proposedState(), and
getDiffPlanResponse() to locate where to apply the change.
| if resourceExists && s != nil && len(n.params) > 0 { | ||
| stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s) | ||
| if err == nil && len(stateValueMap) > 0 { | ||
| params = make(map[string]any, len(stateValueMap)+len(n.params)) | ||
| for k, v := range stateValueMap { | ||
| params[k] = v | ||
| } | ||
| for k, v := range n.params { | ||
| params[k] = v | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The conversion error is silently discarded, leaving a silent fallback to the pre-fix panic path.
When fromInstanceStateToJSONMapForDiff returns an error (e.g. a schema mismatch), the err variable is scoped to the inner if block and never logged or returned — the code silently falls back to params = n.params, which is precisely the partial-value path this PR is fixing. There is no signal to a provider developer (or Upjet maintainer) that the state merge was skipped.
At minimum, emit a debug/warning log so the fallback is observable. Depending on your resilience preference, you could also return the error (which would surface it in Observe). What was the intent — always degrade gracefully, or only on truly transient conversion failures?
🛡️ Proposed fix: log the error before falling back
if resourceExists && s != nil && len(n.params) > 0 {
stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s)
- if err == nil && len(stateValueMap) > 0 {
+ if err != nil {
+ n.logger.Debug("Failed to convert instance state to JSON map for diff; falling back to spec-only params",
+ "error", err)
+ } else if len(stateValueMap) > 0 {
params = make(map[string]any, len(stateValueMap)+len(n.params))
for k, v := range stateValueMap {
params[k] = v
}
for k, v := range n.params {
params[k] = v
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if resourceExists && s != nil && len(n.params) > 0 { | |
| stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s) | |
| if err == nil && len(stateValueMap) > 0 { | |
| params = make(map[string]any, len(stateValueMap)+len(n.params)) | |
| for k, v := range stateValueMap { | |
| params[k] = v | |
| } | |
| for k, v := range n.params { | |
| params[k] = v | |
| } | |
| } | |
| } | |
| if resourceExists && s != nil && len(n.params) > 0 { | |
| stateValueMap, err := n.fromInstanceStateToJSONMapForDiff(s) | |
| if err != nil { | |
| n.logger.Debug("Failed to convert instance state to JSON map for diff; falling back to spec-only params", | |
| "error", err) | |
| } else if len(stateValueMap) > 0 { | |
| params = make(map[string]any, len(stateValueMap)+len(n.params)) | |
| for k, v := range stateValueMap { | |
| params[k] = v | |
| } | |
| for k, v := range n.params { | |
| params[k] = v | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/external_tfpluginsdk.go` around lines 444 - 455, The
conversion error from fromInstanceStateToJSONMapForDiff is being dropped because
err is scoped inside the inner if and never logged or returned, causing a silent
fallback to n.params; change the logic so the err is captured in the outer
scope, and if fromInstanceStateToJSONMapForDiff returns an error emit a
warning/debug log (using the controller's logger already in this file, e.g.,
r.logger or similar) that includes the error and context (resourceExists, n.Name
or similar), then proceed with the graceful fallback to params = n.params (or
alternatively return the error from the surrounding function/Observe if you
prefer surfacing failures); ensure you still merge stateValueMap into params
when no error occurs.
Description of your changes
This solves an issue we encountered when converting a (not-yet-opensourced sadly) provider for auth0 to using tfpluginsdk direct invocation.
terraform-provider-auth0, in its code paths for resource Updates (schema.Resource.UpdateContext) expects fully filled schemas. If there is a nil anywhere, it panics. We've seen this for arrays in particular.
We also mostly turn off LateInitialize for our resources, as we only want partial control of resources via Crossplane.
In the old Terraform CLI invocation mode, all worked well, because Terraform constructs a value for the provider based on the last observed state and then merging the desired config into it.
Contrary, upjet tfpluginsdk code did not do this: It constructed the values that are passed to the tf providers UpdateContext func purely based on the current
spec.forProvider. Which, without LateInit, is likely partial. This led to those panics.This MR is vibecoded, full discosure, and emulates TF behavior by constructing the value for the TF provider based on the last observed state, then applying the desired spec over it.
I have NOT looked at the tfpluginfw code if that fix is needed there as well
I have:
make reviewableto ensure this PR is ready for review.backport release-x.ylabels to auto-backport this PR if necessary.How has this code been tested
Using our own provider-auth0 with full tfpluginsdk usage with this PR in prod with hundreds of resources, most having LateInit off and partial .spec.forProviders