
Simultaneous writes to the same file: writes complete, but data is missing (ganesha, glusterfs) #4605

@todeb

Description


Hello, I created 3 pods that write to the same file via the nfs-ganesha GLUSTER FSAL, backed by a Gluster cluster.
The writes complete with no errors, but when inspecting the file, many writes are missing, seemingly in a random pattern.

glusterfs: 11.1
nfs-ganesha: 4.3

volume options:

Option                                   Value
------                                   -----
cluster.lookup-unhashed                  on (DEFAULT)
cluster.lookup-optimize                  on (DEFAULT)
cluster.rmdir-optimize                   on (DEFAULT)
cluster.min-free-disk                    10% (DEFAULT)
cluster.min-free-inodes                  5% (DEFAULT)
cluster.rebalance-stats                  off (DEFAULT)
cluster.subvols-per-directory            (null) (DEFAULT)
cluster.readdir-optimize                 off (DEFAULT)
cluster.rsync-hash-regex                 (null) (DEFAULT)
cluster.extra-hash-regex                 (null) (DEFAULT)
cluster.dht-xattr-name                   trusted.glusterfs.dht (DEFAULT)
cluster.randomize-hash-range-by-gfid     off (DEFAULT)
cluster.rebal-throttle                   normal (DEFAULT)
cluster.lock-migration                   off
cluster.force-migration                  off
cluster.local-volume-name                (null) (DEFAULT)
cluster.weighted-rebalance               on (DEFAULT)
cluster.switch-pattern                   (null) (DEFAULT)
cluster.entry-change-log                 on (DEFAULT)
cluster.read-subvolume                   (null) (DEFAULT)
cluster.read-subvolume-index             -1 (DEFAULT)
cluster.read-hash-mode                   1 (DEFAULT)
cluster.background-self-heal-count       8 (DEFAULT)
cluster.metadata-self-heal               off (DEFAULT)
cluster.data-self-heal                   off (DEFAULT)
cluster.entry-self-heal                  off (DEFAULT)
cluster.self-heal-daemon                 on (DEFAULT)
cluster.heal-timeout                     600 (DEFAULT)
cluster.self-heal-window-size            8 (DEFAULT)
cluster.data-change-log                  on (DEFAULT)
cluster.metadata-change-log              on (DEFAULT)
cluster.data-self-heal-algorithm         (null) (DEFAULT)
cluster.eager-lock                       on (DEFAULT)
disperse.eager-lock                      on (DEFAULT)
disperse.other-eager-lock                on (DEFAULT)
disperse.eager-lock-timeout              1 (DEFAULT)
disperse.other-eager-lock-timeout        1 (DEFAULT)
cluster.quorum-type                      auto
cluster.quorum-count                     (null) (DEFAULT)
cluster.choose-local                     true (DEFAULT)
cluster.self-heal-readdir-size           1KB (DEFAULT)
cluster.post-op-delay-secs               1 (DEFAULT)
cluster.ensure-durability                on (DEFAULT)
cluster.consistent-metadata              no (DEFAULT)
cluster.heal-wait-queue-length           128 (DEFAULT)
cluster.favorite-child-policy            none (DEFAULT)
cluster.full-lock                        yes (DEFAULT)
cluster.optimistic-change-log            on (DEFAULT)
diagnostics.latency-measurement          off
diagnostics.dump-fd-stats                off (DEFAULT)
diagnostics.count-fop-hits               off
diagnostics.brick-log-level              INFO
diagnostics.client-log-level             INFO
diagnostics.brick-sys-log-level          CRITICAL (DEFAULT)
diagnostics.client-sys-log-level         CRITICAL (DEFAULT)
diagnostics.brick-logger                 (null) (DEFAULT)
diagnostics.client-logger                (null) (DEFAULT)
diagnostics.brick-log-format             (null) (DEFAULT)
diagnostics.client-log-format            (null) (DEFAULT)
diagnostics.brick-log-buf-size           5 (DEFAULT)
diagnostics.client-log-buf-size          5 (DEFAULT)
diagnostics.brick-log-flush-timeout      120 (DEFAULT)
diagnostics.client-log-flush-timeout     120 (DEFAULT)
diagnostics.stats-dump-interval          0 (DEFAULT)
diagnostics.fop-sample-interval          0 (DEFAULT)
diagnostics.stats-dump-format            json (DEFAULT)
diagnostics.fop-sample-buf-size          65535 (DEFAULT)
diagnostics.stats-dnscache-ttl-sec       86400 (DEFAULT)
performance.cache-max-file-size          0 (DEFAULT)
performance.cache-min-file-size          0 (DEFAULT)
performance.cache-refresh-timeout        1 (DEFAULT)
performance.cache-priority                (DEFAULT)
performance.io-cache-size                32MB (DEFAULT)
performance.cache-size                   32MB (DEFAULT)
performance.io-thread-count              16 (DEFAULT)
performance.high-prio-threads            16 (DEFAULT)
performance.normal-prio-threads          16 (DEFAULT)
performance.low-prio-threads             16 (DEFAULT)
performance.least-prio-threads           1 (DEFAULT)
performance.enable-least-priority        on (DEFAULT)
performance.iot-watchdog-secs            (null) (DEFAULT)
performance.iot-cleanup-disconnected-reqs off (DEFAULT)
performance.iot-pass-through             false (DEFAULT)
performance.io-cache-pass-through        false (DEFAULT)
performance.quick-read-cache-size        128MB (DEFAULT)
performance.cache-size                   128MB (DEFAULT)
performance.quick-read-cache-timeout     1 (DEFAULT)
performance.qr-cache-timeout             1 (DEFAULT)
performance.quick-read-cache-invalidation false (DEFAULT)
performance.ctime-invalidation           false (DEFAULT)
performance.flush-behind                 on (DEFAULT)
performance.nfs.flush-behind             off
performance.write-behind-window-size     1MB (DEFAULT)
performance.resync-failed-syncs-after-fsync off (DEFAULT)
performance.nfs.write-behind-window-size 1MB (DEFAULT)
performance.strict-o-direct              off (DEFAULT)
performance.nfs.strict-o-direct          on
performance.strict-write-ordering        off
performance.nfs.strict-write-ordering    on
performance.write-behind-trickling-writes off
performance.aggregate-size               128KB (DEFAULT)
performance.nfs.write-behind-trickling-writes off
performance.lazy-open                    off
performance.read-after-open              yes (DEFAULT)
performance.open-behind-pass-through     false (DEFAULT)
performance.read-ahead-page-count        4 (DEFAULT)
performance.read-ahead-pass-through      false (DEFAULT)
performance.readdir-ahead-pass-through   false (DEFAULT)
performance.md-cache-pass-through        false (DEFAULT)
performance.write-behind-pass-through    false (DEFAULT)
performance.md-cache-timeout             1 (DEFAULT)
performance.cache-swift-metadata         false (DEFAULT)
performance.cache-samba-metadata         false (DEFAULT)
performance.cache-capability-xattrs      true (DEFAULT)
performance.cache-ima-xattrs             true (DEFAULT)
performance.md-cache-statfs              off (DEFAULT)
performance.xattr-cache-list              (DEFAULT)
performance.nl-cache-pass-through        false (DEFAULT)
network.frame-timeout                    1800 (DEFAULT)
network.ping-timeout                     42 (DEFAULT)
network.tcp-window-size                  (null) (DEFAULT)
client.ssl                               off
network.remote-dio                       disable (DEFAULT)
client.event-threads                     2 (DEFAULT)
client.tcp-user-timeout                  0
client.keepalive-time                    20
client.keepalive-interval                2
client.keepalive-count                   9
client.strict-locks                      off
network.tcp-window-size                  (null) (DEFAULT)
network.inode-lru-limit                  16384 (DEFAULT)
auth.allow                               *
auth.reject                              (null) (DEFAULT)
transport.keepalive                      1
server.allow-insecure                    on (DEFAULT)
server.root-squash                       off (DEFAULT)
server.all-squash                        off (DEFAULT)
server.anonuid                           65534 (DEFAULT)
server.anongid                           65534 (DEFAULT)
server.statedump-path                    /var/run/gluster (DEFAULT)
server.outstanding-rpc-limit             64 (DEFAULT)
server.ssl                               off
auth.ssl-allow                           *
server.manage-gids                       off (DEFAULT)
server.dynamic-auth                      on (DEFAULT)
client.send-gids                         on (DEFAULT)
server.gid-timeout                       300 (DEFAULT)
server.own-thread                        (null) (DEFAULT)
server.event-threads                     2 (DEFAULT)
server.tcp-user-timeout                  42 (DEFAULT)
server.keepalive-time                    20
server.keepalive-interval                2
server.keepalive-count                   9
transport.listen-backlog                 1024
ssl.own-cert                             (null) (DEFAULT)
ssl.private-key                          (null) (DEFAULT)
ssl.ca-list                              (null) (DEFAULT)
ssl.crl-path                             (null) (DEFAULT)
ssl.certificate-depth                    (null) (DEFAULT)
ssl.cipher-list                          (null) (DEFAULT)
ssl.dh-param                             (null) (DEFAULT)
ssl.ec-curve                             (null) (DEFAULT)
transport.address-family                 inet
performance.write-behind                 off
performance.read-ahead                   off
performance.readdir-ahead                off
performance.io-cache                     off
performance.open-behind                  off
performance.quick-read                   off
performance.nl-cache                     off
performance.stat-prefetch                off
performance.client-io-threads            off
performance.nfs.write-behind             off
performance.nfs.read-ahead               off
performance.nfs.io-cache                 off
performance.nfs.quick-read               off
performance.nfs.stat-prefetch            off
performance.nfs.io-threads               off
performance.force-readdirp               true (DEFAULT)
performance.cache-invalidation           false (DEFAULT)
performance.global-cache-invalidation    true (DEFAULT)
features.uss                             off
features.snapshot-directory              .snaps
features.show-snapshot-directory         off
features.tag-namespaces                  off
network.compression                      off
network.compression.window-size          -15 (DEFAULT)
network.compression.mem-level            8 (DEFAULT)
network.compression.min-size             1024 (DEFAULT)
network.compression.compression-level    1 (DEFAULT)
network.compression.debug                false (DEFAULT)
features.default-soft-limit              80% (DEFAULT)
features.soft-timeout                    60 (DEFAULT)
features.hard-timeout                    5 (DEFAULT)
features.alert-time                      86400 (DEFAULT)
features.quota-deem-statfs               off
geo-replication.indexing                 off
geo-replication.indexing                 off
geo-replication.ignore-pid-check         off
geo-replication.ignore-pid-check         off
features.quota                           off
features.inode-quota                     off
features.bitrot                          disable
debug.trace                              off
debug.log-history                        no (DEFAULT)
debug.log-file                           no (DEFAULT)
debug.exclude-ops                        (null) (DEFAULT)
debug.include-ops                        (null) (DEFAULT)
debug.error-gen                          off
debug.error-failure                      (null) (DEFAULT)
debug.error-number                       (null) (DEFAULT)
debug.random-failure                     off (DEFAULT)
debug.error-fops                         (null) (DEFAULT)
features.read-only                       off (DEFAULT)
features.worm                            off
features.worm-file-level                 off
features.worm-files-deletable            on
features.default-retention-period        120 (DEFAULT)
features.retention-mode                  relax (DEFAULT)
features.auto-commit-period              180 (DEFAULT)
storage.linux-aio                        off (DEFAULT)
storage.linux-io_uring                   off (DEFAULT)
storage.batch-fsync-mode                 reverse-fsync (DEFAULT)
storage.batch-fsync-delay-usec           0 (DEFAULT)
storage.owner-uid                        -1 (DEFAULT)
storage.owner-gid                        -1 (DEFAULT)
storage.node-uuid-pathinfo               off (DEFAULT)
storage.health-check-interval            30 (DEFAULT)
storage.build-pgfid                      off (DEFAULT)
storage.gfid2path                        on (DEFAULT)
storage.gfid2path-separator              : (DEFAULT)
storage.reserve                          1 (DEFAULT)
storage.health-check-timeout             20 (DEFAULT)
storage.fips-mode-rchecksum              on
storage.force-create-mode                0000 (DEFAULT)
storage.force-directory-mode             0000 (DEFAULT)
storage.create-mask                      0777 (DEFAULT)
storage.create-directory-mask            0777 (DEFAULT)
storage.max-hardlinks                    100 (DEFAULT)
features.ctime                           on (DEFAULT)
config.gfproxyd                          off
cluster.server-quorum-type               off
cluster.server-quorum-ratio              51
changelog.changelog                      off (DEFAULT)
changelog.changelog-dir                  {{ brick.path }}/.glusterfs/changelogs (DEFAULT)
changelog.encoding                       ascii (DEFAULT)
changelog.rollover-time                  15 (DEFAULT)
changelog.fsync-interval                 5 (DEFAULT)
changelog.changelog-barrier-timeout      120
changelog.capture-del-path               off (DEFAULT)
features.barrier                         disable
features.barrier-timeout                 120
features.trash                           off (DEFAULT)
features.trash-dir                       .trashcan (DEFAULT)
features.trash-eliminate-path            (null) (DEFAULT)
features.trash-max-filesize              5MB (DEFAULT)
features.trash-internal-op               off (DEFAULT)
cluster.enable-shared-storage            disable
locks.trace                              off (DEFAULT)
locks.mandatory-locking                  off (DEFAULT)
cluster.disperse-self-heal-daemon        enable (DEFAULT)
cluster.quorum-reads                     no (DEFAULT)
client.bind-insecure                     (null) (DEFAULT)
features.timeout                         45 (DEFAULT)
features.failover-hosts                  (null) (DEFAULT)
features.shard                           off
features.shard-block-size                64MB (DEFAULT)
features.shard-lru-limit                 16384 (DEFAULT)
features.shard-deletion-rate             100 (DEFAULT)
features.scrub-throttle                  lazy
features.scrub-freq                      biweekly
features.scrub                           false (DEFAULT)
features.expiry-time                     120
features.signer-threads                  4
features.cache-invalidation              off
features.cache-invalidation-timeout      60 (DEFAULT)
ganesha.enable                           off
features.leases                          off
features.lease-lock-recall-timeout       60 (DEFAULT)
disperse.background-heals                8 (DEFAULT)
disperse.heal-wait-qlength               128 (DEFAULT)
cluster.heal-timeout                     600 (DEFAULT)
dht.force-readdirp                       on (DEFAULT)
disperse.read-policy                     gfid-hash (DEFAULT)
cluster.shd-max-threads                  1 (DEFAULT)
cluster.shd-wait-qlength                 1024 (DEFAULT)
cluster.locking-scheme                   full (DEFAULT)
cluster.granular-entry-heal              on
features.locks-revocation-secs           0 (DEFAULT)
features.locks-revocation-clear-all      false (DEFAULT)
features.locks-revocation-max-blocked    0 (DEFAULT)
features.locks-monkey-unlocking          false (DEFAULT)
features.locks-notify-contention         yes (DEFAULT)
features.locks-notify-contention-delay   5 (DEFAULT)
disperse.shd-max-threads                 1 (DEFAULT)
disperse.shd-wait-qlength                1024 (DEFAULT)
disperse.cpu-extensions                  auto (DEFAULT)
disperse.self-heal-window-size           32 (DEFAULT)
cluster.use-compound-fops                off
performance.parallel-readdir             off
performance.rda-request-size             131072
performance.rda-low-wmark                4096 (DEFAULT)
performance.rda-high-wmark               128KB (DEFAULT)
performance.rda-cache-limit              10MB
performance.nl-cache-positive-entry      false (DEFAULT)
performance.nl-cache-limit               10MB
performance.nl-cache-timeout             60 (DEFAULT)
cluster.brick-multiplex                  disable
cluster.brick-graceful-cleanup           disable
glusterd.vol_count_per_thread            100
cluster.max-bricks-per-process           250
disperse.optimistic-change-log           on (DEFAULT)
disperse.stripe-cache                    4 (DEFAULT)
cluster.halo-enabled                     False (DEFAULT)
cluster.halo-shd-max-latency             99999 (DEFAULT)
cluster.halo-nfsd-max-latency            5 (DEFAULT)
cluster.halo-max-latency                 5 (DEFAULT)
cluster.halo-max-replicas                99999 (DEFAULT)
cluster.halo-min-replicas                2 (DEFAULT)
features.selinux                         on
cluster.daemon-log-level                 INFO
debug.delay-gen                          off
delay-gen.delay-percentage               10% (DEFAULT)
delay-gen.delay-duration                 100000 (DEFAULT)
delay-gen.enable                          (DEFAULT)
disperse.parallel-writes                 off
disperse.quorum-count                    0 (DEFAULT)
features.sdfs                            off
features.cloudsync                       off
features.ctime                           on
ctime.noatime                            on
features.cloudsync-storetype             (null) (DEFAULT)
features.enforce-mandatory-lock          off
config.global-threading                  off
config.client-threads                    16
config.brick-threads                     16
features.cloudsync-remote-read           off
features.cloudsync-store-id              (null) (DEFAULT)
features.cloudsync-product-id            (null) (DEFAULT)
features.acl                             enable
feature.simple-quota-pass-through        true
feature.simple-quota.use-backend         false
cluster.use-anonymous-inode              yes
rebalance.ensure-durability              on (DEFAULT)

mount:

ganesha-nfs.gls.svc.cluster.local:/export/pvc-4bedf6cb-de30-4c89-8631-68b9c4112edd on /volume type nfs4 (rw,sync,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=30,retrans=2,sec=sys)

ganesha config:

NFS_CORE_PARAM {
  Enable_NLM = false;
  Enable_RQUOTA = false;
  Protocols = 4;
}

EXPORT
{
  Export_Id = 1;
  Path = "/";
  Pseudo = "/export";
  Access_Type = RW;
  Squash = No_Root_Squash;
  SecType = "sys";
  FSAL {
    Name = "GLUSTER";
    Hostname = "glusterfs";
    Volume = "gv0";
    enable_upcall = true;
    Transport = "tcp";
  }
}

command run by each pod:

while true; do
  echo "Hello from $HOSTNAME, running on $NODENAME, started at $(date)" >> /volume/hello
  sleep 1
done
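
A variant of the same loop that serializes the appends with flock(1) may help isolate whether the losses come from unsynchronized concurrent appends. This is only a sketch; it assumes flock's advisory locking is honored over this NFSv4 mount:

while true; do
  (
    # Take an exclusive advisory lock on /volume/hello (through fd 9,
    # opened in append mode below) before writing, so appenders in
    # different pods are serialized instead of racing on the file offset.
    flock -x 9
    echo "Hello from $HOSTNAME, running on $NODENAME, started at $(date)" >&9
  ) 9>>/volume/hello
  sleep 1
done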

result file:

Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:10 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:10 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:11 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:11 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:12 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:12 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:13 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:13 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:14 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:14 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:15 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:15 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:16 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:16 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:17 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:17 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:18 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:18 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:19 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:19 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:20 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:21 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:22 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:23 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:24 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:25 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:25 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:26 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:27 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:28 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:29 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:30 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:31 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:31 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:32 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:33 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:34 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:35 UTC 2025
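
To quantify the gaps, a quick check (a sketch, assuming every line keeps the exact format above and all 3 pods write once per second):

# Group lines by timestamp and report any second where fewer than the
# expected 3 writers made it into the file.
awk -F'started at ' '{ seen[$2]++ }
  END { for (t in seen) if (seen[t] < 3) print seen[t] " of 3 at " t }' /volume/hello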

In addition, I don't see these issues when multiple pods mount GlusterFS directly, without going through nfs-ganesha.

Is this expected? Is there anything I can do to prevent it?
