- [Remap timestamps for historical logs](#remap-timestamps-for-historical-logs)
- [Extract a field from the Datadog tags array (`ddtags`)](#extract-a-field-from-the-datadog-tags-array)
- [Reference another field's value](#reference-another-fields-value)
- [Remove attributes containing null values](#remove-attributes-containing-null-values)
- [Merge nested attributes to root level](#merge-nested-attributes-to-root-level)
- [Serialize outbound logs in _raw format](#serialize-outbound-logs-in-_raw-format)

## Decode Base64
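
A minimal sketch of the idea, assuming a hypothetical `.encoded_payload` field holds a Base64-encoded string:

```
# Hypothetical field names: decode a Base64-encoded attribute into a new field
.decoded_payload = decode_base64!(.encoded_payload)
```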


## Remove attributes containing null values

Attributes with null or empty values add unnecessary bloat to your logs. Remove them to trim the log and send only attributes that provide information. In the script below, the `empty_patterns` variable contains the list of empty patterns to check for in your logs. Add or remove patterns to fit your use case.

```
# Define your empty patterns
empty_patterns = ["null", "NULL", "N/A", "n/a", "none", "NONE", "-", "undefined"]

# Apply generic cleanup
. = compact(map_values(., recursive: true) -> |v| {
    if is_null(v) ||
        includes(empty_patterns, v) ||
        (is_string(v) && strip_whitespace!(v) == "") ||
        (is_array(v) && length!(v) == 0) ||
        (is_object(v) && length!(v) == 0) {
        null
    } else {
        v
    }
})
```
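
For example, given a hypothetical input log like the following, the script drops the null values, placeholder strings, and empty collections:

```
{
  "level": "info",
  "user_id": "12345",
  "email": "N/A",
  "session_id": null,
  "tags": [],
  "note": "   "
}
```

Only the attributes that carry information remain:

```
{
  "level": "info",
  "user_id": "12345"
}
```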

## Merge nested attributes to root level

Targeting nested objects or fields in a filter query may require you to define multiple paths. This is common when working with the `message` field, where the resulting parsed contents are nested in an object. When you use the Observability Pipelines filter syntax, accessing a nested field requires the `<OUTER_PATH>.<INNER_PATH>` notation.
For example, this log contains a stringified JSON message:

```json
{
  "level": "info",
  "message": "{\"event_type\":\"user_login\",\"result\":\"success\",\"login_method\":\"oauth\",\"user_agent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\",\"ip_address\":\"192.168.1.100\",\"session_id\":\"sess_abc123xyz\",\"duration_ms\":245}",
  "timestamp": "2019-03-12T11:30:00Z",
  "processed_ts": "2025-05-22T14:30:00Z",
  "user_id": "12345",
  "app_id": "streaming-services",
  "ddtags": [
    "kube_service:my-service",
    "k8_deployment:your-host",
    "kube_cronjob:myjob"
  ]
}
```

This is the output after the `message` field has been parsed. The parsed content is nested in the `message` object.

```
{
  "app_id": "streaming-services",
  "ddtags": [
    "kube_service:my-service",
    "k8_deployment:your-host",
    "kube_cronjob:myjob"
  ],
  "level": "info",
  "message": {
    "duration_ms": 245,
    "event_type": "user_login",
    "ip_address": "192.168.1.100",
    "login_method": "oauth",
    "result": "success",
    "session_id": "sess_abc123xyz",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
  },
  "processed_ts": "2025-05-22T14:30:00Z",
  "timestamp": "2019-03-12T11:30:00Z",
  "user_id": "12345"
}
```
In this case, to filter for `event_type`, you need to specify `@message.event_type`. While that works, it can be difficult to do so at scale. Therefore, Datadog recommends flattening the object to the root level.
To merge the events from the `message` object to root level, use this script:
```
if is_object(.message) {
    . = merge!(., .message)
    del(.message)
}
```

**Note**: This script works with any JSON object. Replace the `message` attribute with the name of the field you want to flatten.


This results in a log with flattened attributes that you can filter directly:

```
{
  "app_id": "streaming-services",
  "ddtags": [
    "kube_service:my-service",
    "k8_deployment:your-host",
    "kube_cronjob:myjob"
  ],
  "duration_ms": 245,
  "event_type": "user_login",
  "ip_address": "192.168.1.100",
  "level": "info",
  "login_method": "oauth",
  "processed_ts": "2025-05-22T14:30:00Z",
  "result": "success",
  "session_id": "sess_abc123xyz",
  "timestamp": "2019-03-12T11:30:00Z",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "user_id": "12345"
}
```

**Note**: If you flatten the `message` field, the resulting log no longer has a `message` object. If the log is sent to Datadog, the Log Explorer does not show a Log Message section in the log side panel.
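
If the `message` field reaches this processor as a stringified JSON payload rather than an already parsed object, a variant of the script above can parse it before merging. This is a sketch that assumes the string contains valid JSON:

```
# Sketch: parse a stringified JSON message before flattening it
if is_string(.message) {
    .message = object!(parse_json!(string!(.message)))
}
if is_object(.message) {
    . = merge!(., .message)
    del(.message)
}
```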


## Serialize outbound logs in _raw format

Splunk and CrowdStrike prefer a format called `_raw` for log ingestion. Sending data in `_raw` normalizes your logs and lets you benefit from their out-of-the-box dashboards, monitors, and threat detection content. To ensure the `_raw` log format is applied, serialize the outbound event in `_raw`.

**Notes**:
- You should add other processing, remapping, and parsing steps before serializing your logs in `_raw` format.
- Select `Raw` as the encoding option when you set up the Splunk HEC or CrowdStrike destination.
An example input log:

```
{
  "app_id": "streaming-services",
  "level": "info",
  "message": {
    "duration_ms": 245,
    "event_type": "user_login",
    "ip_address": "192.168.1.100",
    "login_method": "oauth",
    "result": "success",
    "session_id": "sess_abc123xyz",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
  },
  "processed_ts": "2025-05-22T14:30:00Z",
  "timestamp": "2019-03-12T11:30:00Z",
  "user_id": "12345"
}
```

This custom processor script serializes the event into `_raw` format:

```
# Serialize the entire event into _raw
._raw = encode_key_value(.)
# Only keep _raw
. = { "_raw": ._raw }
```

This is the output of the example log after it's been processed by the custom script:

```
{
  "_raw": "app_id=streaming-services level=info message.duration_ms=245 message.event_type=user_login message.ip_address=192.168.1.100 message.login_method=oauth message.result=success message.session_id=sess_abc123xyz message.user_agent=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\" processed_ts=2025-05-22T14:30:00Z timestamp=2019-03-12T11:30:00Z user_id=12345"
}
```
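
Depending on how your destination's source type parses `_raw`, you might prefer the payload as a JSON string rather than key-value pairs. A minimal sketch of that variant, using `encode_json` instead of `encode_key_value`:

```
# Sketch: serialize the entire event into _raw as a JSON string
._raw = encode_json(.)
# Only keep _raw
. = { "_raw": ._raw }
```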

## Further reading

{{< partial name="whats-next/whats-next.html" >}}