container input: CRI partial line reassembly ignores max_bytes, causing OOM #49259

@raychinov

Filebeat version

8.19.12

Operating system and version

Linux (Kubernetes with containerd runtime)

Discuss Forum URL

discuss.elastic.co/t/container-input-cri-partial-line-reassembly-ignores-max-bytes-causing-oom

Description of the problem including expected versus actual behavior

When a container log contains many consecutive CRI partial (P) lines, DockerJSONReader.Next() assembles all chunks into a single message in memory with no size limit. The max_bytes setting only takes effect at the LimitReader layer, which wraps around the DockerJSONReader — so the full message is already allocated before truncation happens.

This causes filebeat to OOM when processing container logs from applications that write long lines without newlines (e.g. terminal progress bars using \r).

In our case, a Ruby progress bar writes to stdout using \r (carriage return) to overwrite the same line. Since there is never a \n, the entire ~60 minutes of progress bar output is one logical line from the container runtime's perspective. Containerd splits this into ~1000 partial (P) chunks of ~16-24KB each, and filebeat's DockerJSONReader tries to reassemble them all into one message.
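For readers unfamiliar with the CRI log format: each line on disk has the shape `<timestamp> <stream> <tag> <content>`, where the tag is `P` for a partial chunk and `F` for the chunk that completes a logical line. The sketch below only illustrates that layout. `criLine` and `parseCRILine` are hypothetical names, not filebeat's parser, and the sketch assumes a single `P` or `F` tag.

```go
package main

import (
	"fmt"
	"strings"
)

// criLine is a hypothetical holder for one decoded CRI log line.
type criLine struct {
	Timestamp string
	Stream    string
	Partial   bool // true for "P" (more chunks follow), false for "F"
	Content   string
}

// parseCRILine splits "<timestamp> <stream> <tag> <content>".
// Assumption: the tag is exactly "P" or "F".
func parseCRILine(line string) (criLine, error) {
	fields := strings.SplitN(line, " ", 4)
	if len(fields) < 3 {
		return criLine{}, fmt.Errorf("malformed CRI line: %q", line)
	}
	out := criLine{Timestamp: fields[0], Stream: fields[1]}
	switch fields[2] {
	case "P":
		out.Partial = true
	case "F":
		out.Partial = false
	default:
		return criLine{}, fmt.Errorf("unknown CRI tag %q", fields[2])
	}
	if len(fields) == 4 {
		out.Content = fields[3]
	}
	return out, nil
}

func main() {
	for _, l := range []string{
		`2024-01-01T00:00:00.000000001Z stdout P AAAA`,
		`2024-01-01T00:01:00.000000000Z stdout F end`,
	} {
		parsed, err := parseCRILine(l)
		fmt.Println(parsed.Partial, parsed.Content, err)
	}
}
```

A logical line is everything from the first `P` chunk up to and including the next `F` chunk, which is why output that never emits \n keeps producing `P` chunks.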

Expected: Partial line reassembly should respect max_bytes and stop appending once the limit is reached, similar to how #19552 added bounds to the EncodeReader layer.

Actual: The reassembly loop in libbeat/reader/readjson/docker_json.go appends without any limit:

for p.partial && logLine.Partial {
    next, err := p.reader.Next()
    message.Content = append(message.Content, next.Content...)  // no size check
}
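One possible shape for a bounded version of this loop is sketched below. It is a standalone model, not a patch against docker_json.go: `chunk`, `nextFunc`, `sliceSource`, `joinBounded` and `appendBounded` are hypothetical names, and the choice to keep consuming chunks after the limit is reached (so the next message starts at the right boundary) is an assumption about the desired behavior.

```go
package main

import (
	"fmt"
	"io"
)

// chunk is a hypothetical stand-in for one decoded CRI line.
type chunk struct {
	content []byte
	partial bool // true for "P" chunks, false for the closing "F" chunk
}

// nextFunc is a hypothetical stand-in for the wrapped reader's Next method.
type nextFunc func() (chunk, error)

// sliceSource serves chunks from a slice and returns io.EOF when exhausted.
func sliceSource(chunks []chunk) nextFunc {
	i := 0
	return func() (chunk, error) {
		if i >= len(chunks) {
			return chunk{}, io.EOF
		}
		c := chunks[i]
		i++
		return c, nil
	}
}

// appendBounded appends as much of src as fits within maxBytes and reports
// whether anything was dropped.
func appendBounded(dst, src []byte, maxBytes int) ([]byte, bool) {
	room := maxBytes - len(dst)
	if room < 0 {
		room = 0
	}
	if len(src) > room {
		return append(dst, src[:room]...), true
	}
	return append(dst, src...), false
}

// joinBounded reassembles one logical line but never holds more than
// maxBytes of content. Chunks past the limit are still read and discarded.
func joinBounded(first chunk, next nextFunc, maxBytes int) ([]byte, bool, error) {
	content, truncated := appendBounded(nil, first.content, maxBytes)
	cur := first
	for cur.partial {
		var err error
		cur, err = next()
		if err != nil {
			return content, truncated, err
		}
		var cut bool
		content, cut = appendBounded(content, cur.content, maxBytes)
		truncated = truncated || cut
	}
	return content, truncated, nil
}

func main() {
	next := sliceSource([]chunk{
		{content: []byte("bbbbbbbbbb"), partial: true},
		{content: []byte("cccccccccc"), partial: false},
		{content: []byte("next message"), partial: false},
	})
	first := chunk{content: []byte("aaaaaaaaaa"), partial: true}
	content, truncated, err := joinBounded(first, next, 25)
	fmt.Println(len(content), truncated, err)
}
```

With a limit of 25 bytes, the 30 bytes of input are cut to 25, the truncation flag is set, and the following call to `next` returns the start of the next message rather than a leftover chunk.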

The reader chain order means max_bytes is enforced too late:

EncodeReader (max_bytes*4) → DockerJSONReader (NO LIMIT) → StripNewline → LimitReader (max_bytes)
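The effect of this ordering can be modelled in a few lines of Go. The sketch below is not filebeat code. `joinThenLimit` and `limitWhileJoining` are hypothetical functions that compare the largest buffer held when truncation happens after reassembly with the largest buffer held when the limit is applied during reassembly. The chunk count and sizes mirror the reproduction script in this issue.

```go
package main

import "fmt"

// joinThenLimit models the current order: every chunk is appended first and
// the result is truncated afterwards. peak is the largest buffer length held.
func joinThenLimit(chunks [][]byte, maxBytes int) (out []byte, peak int) {
	var buf []byte
	for _, c := range chunks {
		buf = append(buf, c...)
		if len(buf) > peak {
			peak = len(buf)
		}
	}
	if len(buf) > maxBytes {
		buf = buf[:maxBytes]
	}
	return buf, peak
}

// limitWhileJoining models the expected order: the limit is checked on each
// append, so the buffer never grows past maxBytes.
func limitWhileJoining(chunks [][]byte, maxBytes int) (out []byte, peak int) {
	buf := make([]byte, 0, maxBytes)
	for _, c := range chunks {
		if room := maxBytes - len(buf); room > 0 {
			if len(c) > room {
				c = c[:room]
			}
			buf = append(buf, c...)
		}
		if len(buf) > peak {
			peak = len(buf)
		}
	}
	return buf, peak
}

func main() {
	// 1000 chunks of 65000 bytes, as in the reproduction script.
	one := make([]byte, 65000)
	chunks := make([][]byte, 1000)
	for i := range chunks {
		chunks[i] = one
	}
	outA, peakA := joinThenLimit(chunks, 65536)
	outB, peakB := limitWhileJoining(chunks, 65536)
	fmt.Println("join then limit: out", len(outA), "peak", peakA)
	fmt.Println("limit while joining: out", len(outB), "peak", peakB)
}
```

Both variants emit the same 65536 bytes, but the first holds 65,000,000 bytes at its peak while the second never holds more than max_bytes.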

Steps to reproduce

# Generate a CRI log with 1000 partial lines (~65 KB each, ~65 MB logical line)
mkdir -p /tmp/fb-oom-test/containers /tmp/fb-oom-test/registry
CHUNK=$(head -c 65000 /dev/zero | tr '\0' 'A')
{
  echo '2024-01-01T00:00:00.000000000Z stdout F {"message":"normal"}'
  for i in $(seq 1 1000); do
    printf '2024-01-01T00:00:00.%09dZ stdout P %s\n' "$i" "$CHUNK"
  done
  echo '2024-01-01T00:01:00.000000000Z stdout F end'
} > /tmp/fb-oom-test/containers/test_default_main-abcdef123456.log

# Create minimal filebeat config
cat > /tmp/fb-oom-test/filebeat.yml << 'CONF'
filebeat.inputs:
  - type: container
    max_bytes: 65536
    paths:
      - /var/log/containers/*.log
output.file:
  path: /tmp
  filename: fb-out
  codec.format:
    string: ''
setup:
  ilm.enabled: false
  template.enabled: false
logging:
  level: warning
  to_stderr: true
CONF

# Run filebeat with 64MB memory limit — exits 137 (OOM killed)
docker run --rm --name fb-oom-test \
  --network=none --memory=64m --user root \
  -v /tmp/fb-oom-test/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro \
  -v /tmp/fb-oom-test/containers:/var/log/containers:ro \
  -v /tmp/fb-oom-test/registry:/usr/share/filebeat/data \
  docker.elastic.co/beats/filebeat:8.19.12 \
  filebeat -e --strict.perms=false

echo "exit code: $?"  # 137 = OOM killed

Changing all stdout P to stdout F in the same log file allows filebeat to process it without OOM — confirming the issue is specifically in partial line reassembly.

Configuration

filebeat.inputs:
  - type: container
    max_bytes: 65536
    paths:
      - /var/log/containers/*.log
