
Long k8s-file logs are truncated on 'podman logs --follow' output of stopped container #21914

@UniversalSuperBox

Issue Description

When a container using the k8s-file log driver outputs a lot of logs, retrieving them with podman logs --follow is unreliable. If you're lucky, the entire log file is output successfully. If you're not lucky, the log is truncated at an unpredictable point that depends on the speed of your hardware.

Retrieving logs without --follow is unaffected. Retrieving logs from a running container is unaffected.

Steps to reproduce the issue

  1. Run a container that outputs a lot of logs with the k8s-file log driver. Allow it to stop completely.
  2. Run podman logs --follow on the container.

I've provided a reproducer at https://gist.github.com/UniversalSuperBox/031f34ecb51c5bd04cb559542d9ba519. You can run 1test.sh, which re-runs podman logs until it fails, or just run podman logs --follow --latest | tail -n 1:

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
scelerisque semper tortor. Sed commodo libero id posuere facilisis. Cras
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
torquent per conubia nostra, per inceptos himenaeos. Nullam elementum
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
Morbi id elit ultrices, pulvinar odio sit amet, elementum lectus.
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
leo ac rutrum ullamcorper. Phasellus ut ultricies ligula. Phasellus
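
A minimal self-contained sketch of the same idea is below; the container name, image, and line count are arbitrary choices and are not taken from the gist:

# Generate a large log with the k8s-file driver and let the container exit.
podman run -d --name logspam --log-driver k8s-file \
    docker.io/library/alpine seq 1 200000
podman wait logspam
# Without --follow the line count should be stable; with --follow it varies per run.
podman logs logspam | wc -l
podman logs --follow logspam | wc -l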

Describe the results you received

The length of the log differs on every run of podman logs --follow.

Describe the results you expected

The entire log is output.

podman info output

host:
  arch: arm64
  buildahVersion: 1.33.5
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc39.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 98.99
    systemPercent: 0.45
    userPercent: 0.56
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "39"
  eventLogger: journald
  freeLocks: 2017
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 502
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.7.5-200.fc39.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 2676797440
  memTotal: 4084744192
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc39.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.fc39.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.fc39.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/502/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc39.aarch64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc39.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/502/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 50m 30.00s
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 0
    stopped: 5
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 13598543872
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 62
  runRoot: /run/user/502/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.3
  Built: 1708357248
  BuiltTime: Mon Feb 19 09:40:48 2024
  GitCommit: ""
  GoVersion: go1.21.7
  Os: linux
  OsArch: linux/arm64
  Version: 4.9.3

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

The issue occurs regardless of whether Podman is run rootful or rootless.

This issue occurs whenever the log size exceeds a certain point, and that point seems to depend on the hardware. It appears to be a race condition where the cutoff is determined by the time spent in the log reader rather than by the amount of data output. The issue also occurs over the API, so you can test it with curl and its --limit-rate option: when the rate is lower, the amount of text downloaded is also smaller.

A command for flavor; it overwrites binary-output2.bin in the current directory:

curl \
    --output 'binary-output2.bin' \
    --no-buffer \
    -XGET \
    --limit-rate=1000K \
    --unix-socket "/run/user/${uid}/podman/podman.sock" \
    "http://localhost/v5.0.0/libpod/containers/${container}/logs?follow=True&stderr=True&stdout=True"

Labels

kind/bug: Categorizes issue or PR as related to a bug.
