Hello, I created 3 pods that write to the same file via the nfs-ganesha GLUSTER FSAL backed by a Gluster cluster.
The writes finish with no errors, but when inspecting the file, many writes are missing, seemingly in a random pattern.
glusterfs: 11.1
nfs-ganesha: 4.3
volume options:
Option Value
------ -----
cluster.lookup-unhashed on (DEFAULT)
cluster.lookup-optimize on (DEFAULT)
cluster.rmdir-optimize on (DEFAULT)
cluster.min-free-disk 10% (DEFAULT)
cluster.min-free-inodes 5% (DEFAULT)
cluster.rebalance-stats off (DEFAULT)
cluster.subvols-per-directory (null) (DEFAULT)
cluster.readdir-optimize off (DEFAULT)
cluster.rsync-hash-regex (null) (DEFAULT)
cluster.extra-hash-regex (null) (DEFAULT)
cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT)
cluster.randomize-hash-range-by-gfid off (DEFAULT)
cluster.rebal-throttle normal (DEFAULT)
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null) (DEFAULT)
cluster.weighted-rebalance on (DEFAULT)
cluster.switch-pattern (null) (DEFAULT)
cluster.entry-change-log on (DEFAULT)
cluster.read-subvolume (null) (DEFAULT)
cluster.read-subvolume-index -1 (DEFAULT)
cluster.read-hash-mode 1 (DEFAULT)
cluster.background-self-heal-count 8 (DEFAULT)
cluster.metadata-self-heal off (DEFAULT)
cluster.data-self-heal off (DEFAULT)
cluster.entry-self-heal off (DEFAULT)
cluster.self-heal-daemon on (DEFAULT)
cluster.heal-timeout 600 (DEFAULT)
cluster.self-heal-window-size 8 (DEFAULT)
cluster.data-change-log on (DEFAULT)
cluster.metadata-change-log on (DEFAULT)
cluster.data-self-heal-algorithm (null) (DEFAULT)
cluster.eager-lock on (DEFAULT)
disperse.eager-lock on (DEFAULT)
disperse.other-eager-lock on (DEFAULT)
disperse.eager-lock-timeout 1 (DEFAULT)
disperse.other-eager-lock-timeout 1 (DEFAULT)
cluster.quorum-type auto
cluster.quorum-count (null) (DEFAULT)
cluster.choose-local true (DEFAULT)
cluster.self-heal-readdir-size 1KB (DEFAULT)
cluster.post-op-delay-secs 1 (DEFAULT)
cluster.ensure-durability on (DEFAULT)
cluster.consistent-metadata no (DEFAULT)
cluster.heal-wait-queue-length 128 (DEFAULT)
cluster.favorite-child-policy none (DEFAULT)
cluster.full-lock yes (DEFAULT)
cluster.optimistic-change-log on (DEFAULT)
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off (DEFAULT)
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL (DEFAULT)
diagnostics.client-sys-log-level CRITICAL (DEFAULT)
diagnostics.brick-logger (null) (DEFAULT)
diagnostics.client-logger (null) (DEFAULT)
diagnostics.brick-log-format (null) (DEFAULT)
diagnostics.client-log-format (null) (DEFAULT)
diagnostics.brick-log-buf-size 5 (DEFAULT)
diagnostics.client-log-buf-size 5 (DEFAULT)
diagnostics.brick-log-flush-timeout 120 (DEFAULT)
diagnostics.client-log-flush-timeout 120 (DEFAULT)
diagnostics.stats-dump-interval 0 (DEFAULT)
diagnostics.fop-sample-interval 0 (DEFAULT)
diagnostics.stats-dump-format json (DEFAULT)
diagnostics.fop-sample-buf-size 65535 (DEFAULT)
diagnostics.stats-dnscache-ttl-sec 86400 (DEFAULT)
performance.cache-max-file-size 0 (DEFAULT)
performance.cache-min-file-size 0 (DEFAULT)
performance.cache-refresh-timeout 1 (DEFAULT)
performance.cache-priority (DEFAULT)
performance.io-cache-size 32MB (DEFAULT)
performance.cache-size 32MB (DEFAULT)
performance.io-thread-count 16 (DEFAULT)
performance.high-prio-threads 16 (DEFAULT)
performance.normal-prio-threads 16 (DEFAULT)
performance.low-prio-threads 16 (DEFAULT)
performance.least-prio-threads 1 (DEFAULT)
performance.enable-least-priority on (DEFAULT)
performance.iot-watchdog-secs (null) (DEFAULT)
performance.iot-cleanup-disconnected-reqs off (DEFAULT)
performance.iot-pass-through false (DEFAULT)
performance.io-cache-pass-through false (DEFAULT)
performance.quick-read-cache-size 128MB (DEFAULT)
performance.cache-size 128MB (DEFAULT)
performance.quick-read-cache-timeout 1 (DEFAULT)
performance.qr-cache-timeout 1 (DEFAULT)
performance.quick-read-cache-invalidation false (DEFAULT)
performance.ctime-invalidation false (DEFAULT)
performance.flush-behind on (DEFAULT)
performance.nfs.flush-behind off
performance.write-behind-window-size 1MB (DEFAULT)
performance.resync-failed-syncs-after-fsync off (DEFAULT)
performance.nfs.write-behind-window-size 1MB (DEFAULT)
performance.strict-o-direct off (DEFAULT)
performance.nfs.strict-o-direct on
performance.strict-write-ordering off
performance.nfs.strict-write-ordering on
performance.write-behind-trickling-writes off
performance.aggregate-size 128KB (DEFAULT)
performance.nfs.write-behind-trickling-writes off
performance.lazy-open off
performance.read-after-open yes (DEFAULT)
performance.open-behind-pass-through false (DEFAULT)
performance.read-ahead-page-count 4 (DEFAULT)
performance.read-ahead-pass-through false (DEFAULT)
performance.readdir-ahead-pass-through false (DEFAULT)
performance.md-cache-pass-through false (DEFAULT)
performance.write-behind-pass-through false (DEFAULT)
performance.md-cache-timeout 1 (DEFAULT)
performance.cache-swift-metadata false (DEFAULT)
performance.cache-samba-metadata false (DEFAULT)
performance.cache-capability-xattrs true (DEFAULT)
performance.cache-ima-xattrs true (DEFAULT)
performance.md-cache-statfs off (DEFAULT)
performance.xattr-cache-list (DEFAULT)
performance.nl-cache-pass-through false (DEFAULT)
network.frame-timeout 1800 (DEFAULT)
network.ping-timeout 42 (DEFAULT)
network.tcp-window-size (null) (DEFAULT)
client.ssl off
network.remote-dio disable (DEFAULT)
client.event-threads 2 (DEFAULT)
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
client.strict-locks off
network.tcp-window-size (null) (DEFAULT)
network.inode-lru-limit 16384 (DEFAULT)
auth.allow *
auth.reject (null) (DEFAULT)
transport.keepalive 1
server.allow-insecure on (DEFAULT)
server.root-squash off (DEFAULT)
server.all-squash off (DEFAULT)
server.anonuid 65534 (DEFAULT)
server.anongid 65534 (DEFAULT)
server.statedump-path /var/run/gluster (DEFAULT)
server.outstanding-rpc-limit 64 (DEFAULT)
server.ssl off
auth.ssl-allow *
server.manage-gids off (DEFAULT)
server.dynamic-auth on (DEFAULT)
client.send-gids on (DEFAULT)
server.gid-timeout 300 (DEFAULT)
server.own-thread (null) (DEFAULT)
server.event-threads 2 (DEFAULT)
server.tcp-user-timeout 42 (DEFAULT)
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
ssl.own-cert (null) (DEFAULT)
ssl.private-key (null) (DEFAULT)
ssl.ca-list (null) (DEFAULT)
ssl.crl-path (null) (DEFAULT)
ssl.certificate-depth (null) (DEFAULT)
ssl.cipher-list (null) (DEFAULT)
ssl.dh-param (null) (DEFAULT)
ssl.ec-curve (null) (DEFAULT)
transport.address-family inet
performance.write-behind off
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.open-behind off
performance.quick-read off
performance.nl-cache off
performance.stat-prefetch off
performance.client-io-threads off
performance.nfs.write-behind off
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true (DEFAULT)
performance.cache-invalidation false (DEFAULT)
performance.global-cache-invalidation true (DEFAULT)
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15 (DEFAULT)
network.compression.mem-level 8 (DEFAULT)
network.compression.min-size 1024 (DEFAULT)
network.compression.compression-level 1 (DEFAULT)
network.compression.debug false (DEFAULT)
features.default-soft-limit 80% (DEFAULT)
features.soft-timeout 60 (DEFAULT)
features.hard-timeout 5 (DEFAULT)
features.alert-time 86400 (DEFAULT)
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no (DEFAULT)
debug.log-file no (DEFAULT)
debug.exclude-ops (null) (DEFAULT)
debug.include-ops (null) (DEFAULT)
debug.error-gen off
debug.error-failure (null) (DEFAULT)
debug.error-number (null) (DEFAULT)
debug.random-failure off (DEFAULT)
debug.error-fops (null) (DEFAULT)
features.read-only off (DEFAULT)
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120 (DEFAULT)
features.retention-mode relax (DEFAULT)
features.auto-commit-period 180 (DEFAULT)
storage.linux-aio off (DEFAULT)
storage.linux-io_uring off (DEFAULT)
storage.batch-fsync-mode reverse-fsync (DEFAULT)
storage.batch-fsync-delay-usec 0 (DEFAULT)
storage.owner-uid -1 (DEFAULT)
storage.owner-gid -1 (DEFAULT)
storage.node-uuid-pathinfo off (DEFAULT)
storage.health-check-interval 30 (DEFAULT)
storage.build-pgfid off (DEFAULT)
storage.gfid2path on (DEFAULT)
storage.gfid2path-separator : (DEFAULT)
storage.reserve 1 (DEFAULT)
storage.health-check-timeout 20 (DEFAULT)
storage.fips-mode-rchecksum on
storage.force-create-mode 0000 (DEFAULT)
storage.force-directory-mode 0000 (DEFAULT)
storage.create-mask 0777 (DEFAULT)
storage.create-directory-mask 0777 (DEFAULT)
storage.max-hardlinks 100 (DEFAULT)
features.ctime on (DEFAULT)
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off (DEFAULT)
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs (DEFAULT)
changelog.encoding ascii (DEFAULT)
changelog.rollover-time 15 (DEFAULT)
changelog.fsync-interval 5 (DEFAULT)
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off (DEFAULT)
features.barrier disable
features.barrier-timeout 120
features.trash off (DEFAULT)
features.trash-dir .trashcan (DEFAULT)
features.trash-eliminate-path (null) (DEFAULT)
features.trash-max-filesize 5MB (DEFAULT)
features.trash-internal-op off (DEFAULT)
cluster.enable-shared-storage disable
locks.trace off (DEFAULT)
locks.mandatory-locking off (DEFAULT)
cluster.disperse-self-heal-daemon enable (DEFAULT)
cluster.quorum-reads no (DEFAULT)
client.bind-insecure (null) (DEFAULT)
features.timeout 45 (DEFAULT)
features.failover-hosts (null) (DEFAULT)
features.shard off
features.shard-block-size 64MB (DEFAULT)
features.shard-lru-limit 16384 (DEFAULT)
features.shard-deletion-rate 100 (DEFAULT)
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false (DEFAULT)
features.expiry-time 120
features.signer-threads 4
features.cache-invalidation off
features.cache-invalidation-timeout 60 (DEFAULT)
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60 (DEFAULT)
disperse.background-heals 8 (DEFAULT)
disperse.heal-wait-qlength 128 (DEFAULT)
cluster.heal-timeout 600 (DEFAULT)
dht.force-readdirp on (DEFAULT)
disperse.read-policy gfid-hash (DEFAULT)
cluster.shd-max-threads 1 (DEFAULT)
cluster.shd-wait-qlength 1024 (DEFAULT)
cluster.locking-scheme full (DEFAULT)
cluster.granular-entry-heal on
features.locks-revocation-secs 0 (DEFAULT)
features.locks-revocation-clear-all false (DEFAULT)
features.locks-revocation-max-blocked 0 (DEFAULT)
features.locks-monkey-unlocking false (DEFAULT)
features.locks-notify-contention yes (DEFAULT)
features.locks-notify-contention-delay 5 (DEFAULT)
disperse.shd-max-threads 1 (DEFAULT)
disperse.shd-wait-qlength 1024 (DEFAULT)
disperse.cpu-extensions auto (DEFAULT)
disperse.self-heal-window-size 32 (DEFAULT)
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096 (DEFAULT)
performance.rda-high-wmark 128KB (DEFAULT)
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false (DEFAULT)
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60 (DEFAULT)
cluster.brick-multiplex disable
cluster.brick-graceful-cleanup disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on (DEFAULT)
disperse.stripe-cache 4 (DEFAULT)
cluster.halo-enabled False (DEFAULT)
cluster.halo-shd-max-latency 99999 (DEFAULT)
cluster.halo-nfsd-max-latency 5 (DEFAULT)
cluster.halo-max-latency 5 (DEFAULT)
cluster.halo-max-replicas 99999 (DEFAULT)
cluster.halo-min-replicas 2 (DEFAULT)
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10% (DEFAULT)
delay-gen.delay-duration 100000 (DEFAULT)
delay-gen.enable (DEFAULT)
disperse.parallel-writes off
disperse.quorum-count 0 (DEFAULT)
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
features.cloudsync-storetype (null) (DEFAULT)
features.enforce-mandatory-lock off
config.global-threading off
config.client-threads 16
config.brick-threads 16
features.cloudsync-remote-read off
features.cloudsync-store-id (null) (DEFAULT)
features.cloudsync-product-id (null) (DEFAULT)
features.acl enable
feature.simple-quota-pass-through true
feature.simple-quota.use-backend false
cluster.use-anonymous-inode yes
rebalance.ensure-durability on (DEFAULT)
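(For reference, the dump above is presumably the output of a command along these lines, using the volume name from the ganesha config below:)
gluster volume get gv0 all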
mount:
ganesha-nfs.gls.svc.cluster.local:/export/pvc-4bedf6cb-de30-4c89-8631-68b9c4112edd on /volume type nfs4 (rw,sync,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=30,retrans=2,sec=sys
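For reproducing outside the pods, a manual mount with roughly the same options might look like this (a sketch only; it assumes the service DNS name resolves from the test host and that /mnt/volume is an arbitrary mount point; noac corresponds to the acreg*/acdir*=0 values shown above):
mkdir -p /mnt/volume
mount -t nfs4 -o rw,sync,vers=4.2,hard,noac,proto=tcp,timeo=30,retrans=2 \
  ganesha-nfs.gls.svc.cluster.local:/export/pvc-4bedf6cb-de30-4c89-8631-68b9c4112edd /mnt/volume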
ganesha config:
NFS_CORE_PARAM {
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
}
EXPORT
{
    Export_Id = 1;
    Path = "/";
    Pseudo = "/export";
    Access_Type = RW;
    Squash = No_Root_Squash;
    SecType = "sys";
    FSAL {
        Name = "GLUSTER";
        Hostname = "glusterfs";
        Volume = "gv0";
        enable_upcall = true;
        Transport = "tcp";
    }
}
pods command:
while true; do
  echo "Hello from $HOSTNAME, running on $NODENAME, started at $(date)" >> /volume/hello
  sleep 1
done
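Since the NFS protocol has no atomic append (each client writes at an offset it computed itself), concurrent appends from different clients can land on the same offset and overwrite each other. A minimal sketch of the same loop with per-write locking, assuming flock(1) is available in the image and that NFSv4 locking is honored end to end (the lock-file name is arbitrary), can show whether serializing the appends avoids the lost lines:
touch /volume/hello.lock
while true; do
  # serialize the append across pods with an advisory lock on a side file
  flock /volume/hello.lock -c \
    'echo "Hello from $HOSTNAME, running on $NODENAME, started at $(date)" >> /volume/hello'
  sleep 1
done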
result file:
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:10 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:10 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:11 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:11 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:12 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:12 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:13 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:13 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:14 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:14 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:15 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:15 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:16 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:16 UTC 2025
Hello from volume-logger-769cd45975-jjtwc, running on node3, started at Thu Jul 31 22:53:17 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:17 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:18 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:18 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:19 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:19 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:20 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:21 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:22 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:23 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:24 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:25 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:25 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:26 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:27 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:28 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:29 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:30 UTC 2025
Hello from volume-logger-769cd45975-ptnqs, running on node1, started at Thu Jul 31 22:53:31 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:31 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:32 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:33 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:34 UTC 2025
Hello from volume-logger-769cd45975-czgr6, running on node2, started at Thu Jul 31 22:53:35 UTC 2025
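To quantify the loss, a rough check like the following counts the lines per second and prints any second with fewer than the three expected entries (one per pod, once all three pods are running; it assumes the "started at" format shown above):
awk -F'started at ' '{n[$2]++} END {for (t in n) if (n[t] < 3) print n[t], t}' /volume/hello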
In addition, I don't see such issues when multiple pods mount the GlusterFS volume directly. Is this expected? Is there any preventive action for this?