
Commit acb94fe

implement X-Ray tracing for end-to-end observability (#3)
* testing out X-Ray tracing
* use dashes instead of underscores, run pre-commit and update README.md
* fix README.md instructions for demo; fix paths
* instrument cmr_query
* instrument tracing in evaluator; add payload as annotation
* fix bug
* put annotation differently
* enable active tracing on SNS
* use xray_recorder for tracing evaluator functions
* Update README.md
* capture trace using decorator
1 parent 3f7ff60 commit acb94fe

31 files changed (+626 / -113 lines changed)

README.md

Lines changed: 74 additions & 26 deletions
````diff
@@ -230,7 +230,13 @@ This guide provides a quick way to get started with our project. Please see our
 cd unity-initiator/terraform-unity/initiator/
 ```
 
-1. Copy a sample router configuration YAML file to use for deployment and update the AWS region and AWS account ID to match your AWS environment. We will be using the NISAR TLM test case for this demo so we also rename the SNS topic ARN for it accordingly:
+1. You will need an S3 bucket for terraform to stage the router Lambda zip file and router configuration YAML file during deployment. Create one or reuse an existing one and set an environment variable for it:
+
+```
+export CODE_BUCKET=<some S3 bucket name>
+```
+
+1. Copy a sample router configuration YAML file to use for deployment and update the AWS region and AWS account ID to match your AWS environment. We will be using the NISAR TLM test case for this demo so we also rename the SNS topic ARN for it accordingly. We then upload the router configuration file:
 
 ```
 cp ../../tests/resources/test_router.yaml .
````
````diff
@@ -239,18 +245,7 @@ This guide provides a quick way to get started with our project. Please see our
 sed -i "s/hilo-hawaii-1/${AWS_REGION}/g" test_router.yaml
 sed -i "s/123456789012:eval_nisar_ingest/${AWS_ACCOUNT_ID}:uod-dev-eval_nisar_ingest-evaluator_topic/g" test_router.yaml
 sed -i "s/123456789012:eval_airs_ingest/${AWS_ACCOUNT_ID}:uod-dev-eval_airs_ingest-evaluator_topic/g" test_router.yaml
-```
-
-1. You will need an S3 bucket for terraform to stage the router Lambda zip file during deployment. Create one or reuse an existing one and set an environment variable for it:
-
-```
-export CODE_BUCKET=<some S3 bucket name>
-```
-
-1. You will need an S3 bucket to store the router configuration YAML file. Create one or reuse an existing one (could be the same one in the previous step) and set an environment variable for it:
-
-```
-export CONFIG_BUCKET=<some S3 bucket name>
+aws s3 cp test_router.yaml s3://${CODE_BUCKET}/test_router.yaml
 ```
 
 1. Set a project name:
````
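The three `sed` commands only perform literal text substitution, so their effect on `test_router.yaml` can be reproduced in a few lines of Python. This is an illustrative sketch, not code from the repository: it applies the same replacements in the same order and builds the `s3://${CODE_BUCKET}/test_router.yaml` URI that the upload step creates and that is later passed as `router_config`. The sample ARN, region, account ID and bucket name are hypothetical.

```python
"""Mirror of the README's sed edits to test_router.yaml (illustrative sketch)."""

# (placeholder, replacement template) pairs copied from the sed commands;
# {region} and {account_id} stand in for ${AWS_REGION} and ${AWS_ACCOUNT_ID}.
SUBSTITUTIONS = [
    ("hilo-hawaii-1", "{region}"),
    ("123456789012:eval_nisar_ingest",
     "{account_id}:uod-dev-eval_nisar_ingest-evaluator_topic"),
    ("123456789012:eval_airs_ingest",
     "{account_id}:uod-dev-eval_airs_ingest-evaluator_topic"),
]


def localize_router_config(text: str, region: str, account_id: str) -> str:
    """Apply each substitution globally (like sed's /g), in the sed command order."""
    for placeholder, template in SUBSTITUTIONS:
        replacement = template.format(region=region, account_id=account_id)
        text = text.replace(placeholder, replacement)
    return text


def router_config_uri(code_bucket: str, key: str = "test_router.yaml") -> str:
    """S3 URI handed to terraform via --var router_config."""
    return f"s3://{code_bucket}/{key}"


if __name__ == "__main__":
    # Hypothetical input line; the real file's layout may differ.
    sample = "arn:aws:sns:hilo-hawaii-1:123456789012:eval_nisar_ingest"
    print(localize_router_config(sample, "us-west-2", "111122223333"))
    print(router_config_uri("my-code-bucket"))
```

None of the `sed` patterns contain regular-expression metacharacters, which is why plain `str.replace` behaves the same way here.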
````diff
@@ -271,25 +266,36 @@ This guide provides a quick way to get started with our project. Please see our
 terraform apply \
 --var project=${PROJECT} \
 --var code_bucket=${CODE_BUCKET} \
---var config_bucket=${CONFIG_BUCKET} \
---var router_config=test_router.yaml \
+--var router_config=s3://${CODE_BUCKET}/test_router.yaml \
 -auto-approve
 ```
 
 **Take note of the `initiator_topic_arn` that is output by terraform. It will be used when setting up any triggers.**
 
-#### Deploying an Example Evaluator (SNS topic->SQS queue->Lambda)
+#### Deploying Example Evaluators (SNS topic->SQS queue->Lambda)
 
-1. Change directory to the location of the sns_sqs_lambda evaluator terraform:
+In this demo we will deploy 2 evaluators:
 
+1. `eval_nisar_ingest` - evaluate ingestion of NISAR telemetry files deposited into the ISL bucket
+
+1. `eval_airs_ingest` - evaluate ingestion of AIRS RetStd files returned by a periodic CMR query
+
+##### Evaluator Deployment for NISAR TLM (via staged data to the ISL)
+1. Change directory to the location of the evaluators terraform:
 ```
-cp -rp sns_sqs_lambda sns_sqs_lambda-nisar_tlm
+cd ../evaluators
+```
+
+1. Make a copy of the `sns_sqs_lambda` directory for the NISAR TLM evaluator:
+
+```
+cp -rp sns-sqs-lambda sns-sqs-lambda-nisar-tlm
 ```
 
 1. Change directory into the NISAR TLM evaluator terraform:
 
 ```
-cd sns_sqs_lambda-nisar_tlm/
+cd sns-sqs-lambda-nisar-tlm/
 ```
 
 1. Set the name of the evaluator to our NISAR example:
````
````diff
@@ -301,7 +307,7 @@ This guide provides a quick way to get started with our project. Please see our
 1. Note the implementation of the evaluator code. It currently doesn't do any real evaluation but simply returns that evaluation was successful:
 
 ```
-cat data.tf
+cat lambda_handler.py
 ```
 
 1. Initialize terraform:
````
````diff
@@ -315,17 +321,59 @@ This guide provides a quick way to get started with our project. Please see our
 ```
 terraform apply \
 --var evaluator_name=${EVALUATOR_NAME} \
+--var code_bucket=${CODE_BUCKET} \
 -auto-approve
 ```
 
 **Take note of the `evaluator_topic_arn` that is output by terraform. It should match the topic ARN in the test_router.yaml file you used during the initiator deployment. If they match then the router Lambda is now able to submit payloads to this evaluator SNS topic.**
 
+##### Evaluator Deployment for AIRS RetStd (via scheduled CMR query)
+1. Change directory to the location of the evaluators terraform:
+```
+cd ..
+```
+
+1. Make a copy of the `sns_sqs_lambda` directory for the AIRS RetStd evaluator:
+```
+cp -rp sns-sqs-lambda sns-sqs-lambda-airs-retstd
+```
+
+1. Change directory into the AIRS RetStd evaluator terraform:
+```
+cd sns-sqs-lambda-airs-retstd/
+```
+
+1. Set the name of the evaluator to our AIRS example:
+```
+export EVALUATOR_NAME=eval_airs_ingest
+```
+
+1. Note the implementation of the evaluator code. It currently doesn't do any real evaluation but simply returns that evaluation was successful:
+```
+cat lambda_handler.py
+```
+
+1. Initialize terraform:
+```
+terraform init
+```
+
+1. Run terraform apply:
+```
+terraform apply \
+--var evaluator_name=${EVALUATOR_NAME} \
+--var code_bucket=${CODE_BUCKET} \
+-auto-approve
+```
+
+**Take note of the `evaluator_topic_arn` that is output by terraform. It should match the respective topic ARN in the test_router.yaml file you used during the initiator deployment. If they match then the router Lambda is now able to submit payloads to this evaluator SNS topic.**
+
 #### Deploying an S3 Event Notification Trigger
 
-1. Change directory to the location of the s3_bucket_notification trigger terraform:
+1. Change directory to the location of the s3-bucket-notification trigger terraform:
 
 ```
-cd ../../triggers/s3_bucket_notification/
+cd ../../triggers/s3-bucket-notification/
 ```
 
 1. You will need an S3 bucket to configure event notification on. Create one or reuse an existing one (could be the same one in the previous steps) and set an environment variable for it:
````
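The bold notes ask you to compare the `evaluator_topic_arn` printed by terraform with the topic ARNs in `test_router.yaml`. A minimal Python sketch of that check follows. Because the router configuration schema is not shown in this diff, it does not parse the YAML; it simply scans the text for anything shaped like an SNS topic ARN (`arn:<partition>:sns:<region>:<account-id>:<topic>`). The function names and the sample config line are hypothetical.

```python
import re
from typing import Set

# arn:<partition>:sns:<region>:<12-digit account id>:<topic name>
# Topic names use letters, digits, hyphens and underscores, with an
# optional ".fifo" suffix for FIFO topics.
SNS_TOPIC_ARN = re.compile(
    r"arn:aws[a-z-]*:sns:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]+(?:\.fifo)?"
)


def sns_topic_arns(config_text: str) -> Set[str]:
    """Collect every SNS topic ARN that appears anywhere in the config text."""
    return set(SNS_TOPIC_ARN.findall(config_text))


def evaluator_is_routed(evaluator_topic_arn: str, config_text: str) -> bool:
    """True if the ARN printed by terraform also appears in the router config."""
    return evaluator_topic_arn.strip() in sns_topic_arns(config_text)
```

Feed it the text of `test_router.yaml` and the ARN terraform printed; a `False` result means the router would publish to a different topic than the one just deployed.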
````diff
@@ -382,10 +430,10 @@ This guide provides a quick way to get started with our project. Please see our
 
 #### Deploying an EventBridge Scheduler Trigger
 
-1. Change directory to the location of the s3_bucket_notification trigger terraform:
+1. Change directory to the location of the scheduled-task trigger terraform:
 
 ```
-cd ../scheduled_task/
+cd ../scheduled-task/
 ```
 
 1. Note the implementation of the trigger lambda code. It currently hard codes a payload URL however in a real implementation, code would be written to query for new files from some REST API, database, etc. Here we simulate that and simply return a NISAR TLM file:
````
````diff
@@ -416,10 +464,10 @@ This guide provides a quick way to get started with our project. Please see our
 
 #### Deploying an EventBridge Scheduler Trigger for Periodic CMR Queries
 
-1. Change directory to the location of the s3_bucket_notification trigger terraform:
+1. Change directory to the location of the cmr-query trigger terraform:
 
 ```
-cd ../cmr_query/
+cd ../cmr-query/
 ```
 
 1. Note the implementation of the trigger lambda code. It will query CMR for granules for a particular collection within a timeframe, query its dynamodb table if they already exist, and if not, submit them as payload URLs to the initiator SNS topic and save them into the dynamodb table:
````
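The CMR query trigger described in that last step follows a check-then-submit pattern. As a rough sketch of the deduplication logic only (no CMR, SNS or DynamoDB calls), an in-memory set stands in for the DynamoDB table and a callback stands in for publishing to the initiator SNS topic; both placeholders and the function name are hypothetical.

```python
from typing import Callable, Iterable, List, MutableSet


def submit_new_granules(
    granule_urls: Iterable[str],
    already_submitted: MutableSet[str],
    publish: Callable[[str], None],
) -> List[str]:
    """Publish only granules not seen before, then record them.

    `already_submitted` is a stand-in for the DynamoDB table and `publish`
    for the SNS publish to the initiator topic (hypothetical placeholders).
    """
    submitted = []
    for url in granule_urls:
        if url in already_submitted:  # already recorded: skip it
            continue
        publish(url)                  # submit as a payload URL
        already_submitted.add(url)    # then record it
        submitted.append(url)
    return submitted
```

Publishing before recording means a failure between the two steps causes the granule to be resubmitted on the next scheduled run rather than silently skipped; recording first would trade that duplicate for a possible miss.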

scripts/build_cmr_query_lambda_package.sh

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,7 +3,7 @@ BASE_PATH=$(dirname "${BASH_SOURCE}")
 BASE_PATH=$(cd "${BASE_PATH}/.."; pwd)
 DIST_DIR=${BASE_PATH}/dist
 PKG_DIR=${DIST_DIR}/lambda_packages
-CMR_QUERY_DIR=${BASE_PATH}/terraform-unity/triggers/cmr_query
+CMR_QUERY_DIR=${BASE_PATH}/terraform-unity/triggers/cmr-query
 
 set -ex
 
```
```diff
@@ -15,6 +15,7 @@ VERSION=$(hatch run python -c 'from importlib.metadata import version; print(ver
 echo "{\"version\": \"$VERSION\"}" > ${DIST_DIR}/version.json
 mkdir -p $PKG_DIR
 pip install -t $PKG_DIR ${DIST_DIR}/unity_initiator-*.whl
+pip install -t $PKG_DIR aws_xray_sdk
 pip install -t $PKG_DIR python_cmr
 cp ${CMR_QUERY_DIR}/lambda_handler.py $PKG_DIR/
 cd $PKG_DIR
```
Lines changed: 21 additions & 0 deletions
```diff
@@ -0,0 +1,21 @@
+#!/bin/bash
+BASE_PATH=$(dirname "${BASH_SOURCE}")
+BASE_PATH=$(cd "${BASE_PATH}/.."; pwd)
+DIST_DIR=${BASE_PATH}/dist
+PKG_DIR=${DIST_DIR}/lambda_packages
+SCHED_TASK_DIR=${BASE_PATH}/terraform-unity/triggers/scheduled-task-instrumented
+
+set -ex
+
+rm -rf $DIST_DIR
+pip install hatch
+hatch clean
+hatch build
+VERSION=$(hatch run python -c 'from importlib.metadata import version; print(version("unity_initiator"))')
+echo "{\"version\": \"$VERSION\"}" > ${DIST_DIR}/version.json
+mkdir -p $PKG_DIR
+pip install -t $PKG_DIR ${DIST_DIR}/unity_initiator-*.whl
+pip install -t $PKG_DIR aws_xray_sdk
+cp ${SCHED_TASK_DIR}/lambda_handler.py $PKG_DIR/
+cd $PKG_DIR
+zip -rq ${DIST_DIR}/scheduled_task-${VERSION}-lambda.zip .
```
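This script installs the `unity_initiator` wheel and `aws_xray_sdk` into `lambda_packages/`, copies `lambda_handler.py` next to them, and zips from inside that directory, so all three should sit at the root of the archive where Lambda can import them. The Python sketch below is a hypothetical sanity check of that layout; the required top-level names are assumptions read off the `pip install -t` and `cp` lines.

```python
import zipfile
from typing import IO, Iterable, List, Union

# Assumed top-level entries, based on the build script's pip/cp steps.
REQUIRED = ("lambda_handler.py", "aws_xray_sdk", "unity_initiator")


def missing_from_names(names: Iterable[str], required: Iterable[str] = REQUIRED) -> List[str]:
    """Return required entries that are not at the root of the listed archive paths."""
    top_level = {name.split("/", 1)[0] for name in names}
    return [item for item in required if item not in top_level]


def missing_top_level(zip_file: Union[str, IO[bytes]], required: Iterable[str] = REQUIRED) -> List[str]:
    """Open a zip (path or binary file object) and check its root entries."""
    with zipfile.ZipFile(zip_file) as zf:
        return missing_from_names(zf.namelist(), required)
```

An empty list from `missing_top_level("dist/scheduled_task-<version>-lambda.zip")` suggests the packaging steps ran as intended; anything listed was not zipped at the root.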

src/unity_initiator/cloud/lambda_handler.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -4,13 +4,19 @@
 from tempfile import mkstemp
 
 import smart_open
+from aws_xray_sdk.core import patch_all, xray_recorder
 
 from ..router import Router
 from ..utils.logger import logger
 
+# initialize the AWS X-Ray SDK
+patch_all()
+
+
 ROUTER = None
 
 
+@xray_recorder.capture("lambda_handler_base")
 def lambda_handler_base(event, context):
     """Base lambda handler that instantiates a router, globally, and executes actions for a single payload."""
 
```
```diff
@@ -35,6 +41,7 @@ def lambda_handler_base(event, context):
 f.write(router_cfg)
 ROUTER = Router(router_file)
 os.unlink(router_file)
+xray_recorder.put_annotation("payload", event["payload"])
 return ROUTER.execute_actions(event["payload"])
 
 
```
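`put_annotation("payload", event["payload"])` attaches the payload to the trace as an annotation, which X-Ray indexes so traces can be filtered by it (metadata, by contrast, is not indexed). X-Ray documents annotation keys as alphanumeric with underscores allowed, and values as a string, number or Boolean; as far as we know, the SDK logs a warning and drops a non-conforming annotation instead of raising. The helper below is a hypothetical pre-check written only against those documented rules, useful if a payload might ever be something other than a plain string.

```python
import string

# Keys: letters, digits and underscore (per the X-Ray annotation rules).
_VALID_KEY_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def is_valid_annotation(key, value) -> bool:
    """Hypothetical check that an annotation would be indexable by X-Ray."""
    if not isinstance(key, str) or not key:  # rejecting empty keys is this sketch's choice
        return False
    if not set(key) <= _VALID_KEY_CHARS:
        return False
    # bool is a subclass of int, so True/False pass this number check as well
    return isinstance(value, (str, int, float))
```

A caller could, for example, fall back to `str(value)` or to trace metadata when this returns `False`.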
terraform-unity/evaluators/sns-sqs-lambda/README.md

Lines changed: 7 additions & 2 deletions
```diff
@@ -15,8 +15,9 @@
 
 | Name | Version |
 |------|---------|
-| <a name="provider_archive"></a> [archive](#provider\_archive) | 2.4.2 |
 | <a name="provider_aws"></a> [aws](#provider\_aws) | 5.51.1 |
+| <a name="provider_local"></a> [local](#provider\_local) | 2.5.1 |
+| <a name="provider_null"></a> [null](#provider\_null) | 3.2.2 |
 
 ## Modules
 
```
```diff
@@ -29,25 +30,29 @@ No modules.
 | [aws_cloudwatch_log_group.evaluator_lambda_log_group](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group) | resource |
 | [aws_iam_policy.evaluator_lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
 | [aws_iam_role.evaluator_lambda_iam_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
+| [aws_iam_role_policy_attachment.aws_xray_write_only_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
 | [aws_iam_role_policy_attachment.lambda_base_policy_attachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
 | [aws_iam_role_policy_attachment.lambda_policy_attachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
 | [aws_lambda_event_source_mapping.evaluator_queue_event_source_mapping](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_event_source_mapping) | resource |
 | [aws_lambda_function.evaluator_lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource |
+| [aws_s3_object.lambda_package](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource |
 | [aws_sns_topic.evaluator_topic](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic) | resource |
 | [aws_sns_topic_policy.evaluator_topic_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_policy) | resource |
 | [aws_sns_topic_subscription.evaluator_subscription](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_subscription) | resource |
 | [aws_sqs_queue.evaluator_dead_letter_queue](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue) | resource |
 | [aws_sqs_queue.evaluator_queue](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue) | resource |
 | [aws_sqs_queue_policy.evaluator_queue_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue_policy) | resource |
 | [aws_ssm_parameter.evaluator_lambda_function_name](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ssm_parameter) | resource |
-| [archive_file.evaluator_lambda_artifact](https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/file) | data source |
+| [null_resource.build_lambda_package](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
 | [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
 | [aws_iam_policy.mcp_operator_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy) | data source |
+| [local_file.version](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source |
 
 ## Inputs
 
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
+| <a name="input_code_bucket"></a> [code\_bucket](#input\_code\_bucket) | The S3 bucket where lambda zip files will be stored and accessed | `string` | n/a | yes |
 | <a name="input_evaluator_name"></a> [evaluator\_name](#input\_evaluator\_name) | The evaluator name | `string` | n/a | yes |
 | <a name="input_project"></a> [project](#input\_project) | The unity project its installed into | `string` | `"uod"` | no |
 | <a name="input_venue"></a> [venue](#input\_venue) | The unity venue its installed into | `string` | `"dev"` | no |
```
Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+#!/bin/bash
+BASE_PATH=$(dirname "${BASH_SOURCE}")
+BASE_PATH=$(cd "${BASE_PATH}/../../.."; pwd)
+DIST_DIR=${BASE_PATH}/dist
+PKG_DIR=${DIST_DIR}/lambda_packages
+EVALUATOR_DIR=$(dirname "${BASH_SOURCE}")
+EVALUATOR_DIR=$(cd "${EVALUATOR_DIR}"; pwd)
+EVALUATOR_NAME=$1
+
+set -ex
+
+rm -rf $DIST_DIR
+pip install hatch
+hatch clean
+hatch build
+VERSION=$(hatch run python -c 'from importlib.metadata import version; print(version("unity_initiator"))')
+echo "{\"version\": \"$VERSION\"}" > ${DIST_DIR}/version.json
+mkdir -p $PKG_DIR
+pip install -t $PKG_DIR ${DIST_DIR}/unity_initiator-*.whl
+pip install -t $PKG_DIR aws_xray_sdk
+cp ${EVALUATOR_DIR}/lambda_handler.py $PKG_DIR/
+cd $PKG_DIR
+zip -rq ${DIST_DIR}/${EVALUATOR_NAME}-${VERSION}-lambda.zip .
```

terraform-unity/evaluators/sns-sqs-lambda/data.tf

Lines changed: 3 additions & 17 deletions
```diff
@@ -4,21 +4,7 @@ data "aws_iam_policy" "mcp_operator_policy" {
 name = "mcp-tenantOperator-AMI-APIG"
 }
 
-data "archive_file" "evaluator_lambda_artifact" {
-type = "zip"
-output_path = "${path.root}/.archive_files/${var.evaluator_name}-evaluator_lambda.zip"
-
-source {
-filename = "lambda_function.py"
-content = <<CODE
-def lambda_handler(event, context):
-print(f"event: {event}")
-print(f"context: {context}")
-
-# implement your adaptation-specific evaluator code here and return
-# True if it successfully evaluates. False otherwise.
-
-return { "success": True }
-CODE
-}
+data "local_file" "version" {
+filename = "${path.module}/../../../dist/version.json"
+depends_on = [null_resource.build_lambda_package]
 }
```
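With `archive_file` gone, `data.tf` now reads `dist/version.json`, which the new evaluator build script writes before naming its zip `${EVALUATOR_NAME}-${VERSION}-lambda.zip`. The sketch below rebuilds that file name from `version.json`, for example to locate the artifact locally. Whether terraform uses the same name as the `aws_s3_object.lambda_package` key is not shown in this diff, so treat that link as an assumption; the function names and the version string in the test are made up.

```python
import json
from pathlib import Path


def lambda_package_name(version_json_text: str, evaluator_name: str) -> str:
    """Rebuild "${EVALUATOR_NAME}-${VERSION}-lambda.zip" from version.json content."""
    version = json.loads(version_json_text)["version"]
    return f"{evaluator_name}-{version}-lambda.zip"


def lambda_package_path(dist_dir: Path, evaluator_name: str) -> Path:
    """Path of the zip the evaluator build script writes into dist/."""
    text = (dist_dir / "version.json").read_text()
    return dist_dir / lambda_package_name(text, evaluator_name)
```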
Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+import json
+
+from aws_xray_sdk.core import patch_all, xray_recorder
+
+from unity_initiator.utils.logger import logger
+
+patch_all()
+
+
+def perform_evaluation(event, context):
+    logger.info("event: %s", json.dumps(event, indent=2))
+    logger.info("context: %s", context)
+
+    # Implement your adaptation-specific evaluator code here and return
+    # True if it successfully evaluates. False otherwise.
+
+    return True
+
+
+def lambda_handler(event, context):
+    with xray_recorder.capture(context.function_name):
+        return {"success": perform_evaluation(event, context)}
```
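The new evaluator handler only logs the raw event. Because the evaluator Lambda reads from an SQS queue subscribed to the SNS topic, a real `perform_evaluation` would likely need to unwrap two envelopes first. A minimal sketch, assuming raw message delivery is disabled on the subscription (so each SQS record `body` is the SNS notification JSON and the published text is in its `Message` field); the function name and the JSON shape of the published message are assumptions.

```python
import json
from typing import Any, Dict, List


def sns_messages_from_sqs_event(event: Dict[str, Any]) -> List[Any]:
    """Unwrap SNS notifications delivered to Lambda through an SQS queue.

    Assumes raw message delivery is off: each record body is the SNS
    notification JSON, and the published message is its "Message" field.
    """
    messages = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])
        message = envelope["Message"]
        try:
            messages.append(json.loads(message))  # JSON messages become dicts/lists
        except json.JSONDecodeError:
            messages.append(message)  # plain-text messages stay as strings
    return messages
```

If raw message delivery were enabled instead, the record `body` would already be the published message and the `"Message"` lookup would have to be dropped.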
