Add alerts to catch Knative TestGrid pods not running #1066

@michelle192837

The pods were stuck in CrashLoopBackOff because of a permissions issue reading the config, e.g.:

jsonPayload: {
error: "observe config: can't read "gs://knative-own-testgrid/config": open: Get "https://storage.googleapis.com/knative-own-testgrid/config": compute: Received 403 `Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
For more information, refer to the Workload Identity documentation:
	https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to

`"
file: "cmd/summarizer/main.go:151"
func: "main.main"
level: "error"
msg: "Could not summarize"
}
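
Until a real alert exists, a manual check along these lines might catch the same condition. This is only a sketch: `not_running_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the testgrid repo.
# Reads `kubectl get pods --no-headers` output on stdin and prints
# "<pod> <status>" for every pod whose STATUS column is neither
# Running nor Completed (e.g. CrashLoopBackOff).
not_running_pods() {
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes kubectl is pointed at the TestGrid cluster):
#   kubectl get pods -n knative --no-headers | not_running_pods
```

An actual alert would more likely be built on cluster metrics than on polling `kubectl`; the snippet only illustrates the condition to watch for.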

I ran https://github.com/GoogleCloudPlatform/testgrid/blob/master/cluster/bind-service-accounts.sh to see whether any of the SAs needed to be re-bound, and the answer was 'yes' for the three SAs in the knative namespace:

./bind-service-accounts.sh
Service accounts:
./canary/api.yaml:    iam.gke.io/gcp-service-account: [email protected]
./canary/api.yaml:  namespace: testgrid-canary
./canary/api.yaml:      serviceAccountName: api
./canary/config_merger.yaml:    iam.gke.io/gcp-service-account: [email protected]
./canary/config_merger.yaml:  namespace: testgrid-canary
./canary/config_merger.yaml:      serviceAccountName: config-merger
./canary/monitoring.yaml:  namespace: testgrid-canary
./canary/summarizer.yaml:    iam.gke.io/gcp-service-account: [email protected]
./canary/summarizer.yaml:  namespace: testgrid-canary
./canary/summarizer.yaml:      serviceAccountName: summarizer
./canary/tabulator.yaml:    iam.gke.io/gcp-service-account: [email protected]
./canary/tabulator.yaml:  namespace: testgrid-canary
./canary/tabulator.yaml:      serviceAccountName: tabulator
./canary/updater.yaml:    iam.gke.io/gcp-service-account: [email protected]
./canary/updater.yaml:  namespace: testgrid-canary
./canary/updater.yaml:      serviceAccountName: updater
./prod/config_merger.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/config_merger.yaml:  namespace: testgrid
./prod/config_merger.yaml:      serviceAccountName: config-merger
./prod/knative/summarizer.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/knative/summarizer.yaml:  namespace: knative
./prod/knative/summarizer.yaml:      serviceAccountName: summarizer
./prod/knative/tabulator.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/knative/tabulator.yaml:  namespace: knative
./prod/knative/tabulator.yaml:      serviceAccountName: tabulator
./prod/knative/updater.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/knative/updater.yaml:  namespace: knative
./prod/knative/updater.yaml:      serviceAccountName: updater
./prod/monitoring.yaml:  namespace: testgrid
./prod/README.md:1. Bind the service account(s) for the component in the `testgrid-canary` namespace:
./prod/README.md:1. Bind the service account(s) for the component in the `testgrid` namespace:
./prod/summarizer.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/summarizer.yaml:  namespace: testgrid
./prod/summarizer.yaml:      serviceAccountName: summarizer
./prod/tabulator.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/tabulator.yaml:  namespace: testgrid
./prod/tabulator.yaml:      serviceAccountName: tabulator
./prod/updater.yaml:    iam.gke.io/gcp-service-account: [email protected]
./prod/updater.yaml:  namespace: testgrid
./prod/updater.yaml:      serviceAccountName: updater
./setup.sh:echo -n 'testgrid namespace: ' >&2
NOOP: testgrid-canary/config-merger has workloadIdentityUser access to [email protected]
NOOP: testgrid-canary/summarizer has workloadIdentityUser access to [email protected]
NOOP: testgrid-canary/tabulator has workloadIdentityUser access to [email protected]
NOOP: testgrid-canary/updater has workloadIdentityUser access to [email protected]
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer] roles/iam.workloadIdentityUser access to [email protected]? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding [email protected] --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]'
Updated IAM policy for serviceAccount [[email protected]].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u1cNwo=
version: 1
DONE: gave knative/summarizer workloadIdentityUser access to [email protected]
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator] roles/iam.workloadIdentityUser access to [email protected]? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding [email protected] --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]'
Updated IAM policy for serviceAccount [[email protected]].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u2Rpkc=
version: 1
DONE: gave knative/tabulator workloadIdentityUser access to [email protected]
serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater] in serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]
Grant serviceAccount:k8s-testgrid.svc.id.goog[knative/updater] roles/iam.workloadIdentityUser access to [email protected]? [y/N] y
+ /usr/bin/gcloud iam service-accounts --project knative-tests add-iam-policy-binding [email protected] --role roles/iam.workloadIdentityUser --member 'serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]'
Updated IAM policy for serviceAccount [[email protected]].
bindings:
- members:
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/summarizer]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/tabulator]
  - serviceAccount:k8s-testgrid.svc.id.goog[knative/updater]
  - serviceAccount:knative-tests.svc.id.goog[test-pods/testgrid-updater]
  role: roles/iam.workloadIdentityUser
etag: BwXq2u4Lseg=
version: 1
DONE: gave knative/updater workloadIdentityUser access to [email protected]
NOOP: testgrid/config-merger has workloadIdentityUser access to [email protected]
NOOP: testgrid/summarizer has workloadIdentityUser access to [email protected]
NOOP: testgrid/tabulator has workloadIdentityUser access to [email protected]
NOOP: testgrid/updater has workloadIdentityUser access to [email protected]
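
For reference, a rough way to check a binding by hand without re-running the whole script. This is a sketch under assumptions: `wi_member` and `has_member` are hypothetical helpers, `GSA_EMAIL` is a placeholder for the GCP service account address (redacted in the output above), and the match is a plain substring check that does not verify the role.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of bind-service-accounts.sh.

# Builds the Workload Identity member string for a Kubernetes service
# account, in the form seen in the output above:
#   serviceAccount:<project>.svc.id.goog[<namespace>/<ksa>]
wi_member() {
  local pool_project="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$pool_project" "$namespace" "$ksa"
}

# Reads an IAM policy on stdin and succeeds if the given member string
# appears anywhere in it (fixed-string match; the role is not checked).
has_member() {
  grep -qF -- "$1"
}

# Usage (GSA_EMAIL is a placeholder):
#   gcloud iam service-accounts get-iam-policy "$GSA_EMAIL" --project knative-tests \
#     | has_member "$(wi_member k8s-testgrid knative summarizer)" \
#     && echo bound || echo missing
```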
