ENT-14145: Fixed GlobFind walks unrelated parents of literal path components #273
There was an error running your pipeline, see logs for details.
6be4366 to 43b5ecf
@cf-bottom jenkins please?
Sure, I triggered a build: Jenkins: https://ci.cfengine.com/job/pr-pipeline/13875/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-13875/
larsewi
left a comment
So stat can cause a program to block on NFS. Didn't know that.
GJ 🚀 added some comments
d9aa916 to 814541c
814541c to 5c51573
5c51573 to 85ed69e
85ed69e to c924085
@cf-bottom jenkins please?
Sure, I triggered a build: Jenkins: https://ci.cfengine.com/job/pr-pipeline/13883/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-13883/
GlobFind() handed every absolute pattern to PathWalk("/", ...), which calls
ListDir() + stat() on every entry of each parent directory on the way down.
For a pattern like /var/cfengine/state/diff/*.diff this enumerates and stats
every entry of /, every entry of /var, and every entry of /var/cfengine
before reaching the target directory.

When any top-level entry under / is on a stale NFS mount, the kernel-side
getattr RPC blocks indefinitely in rpc_wait_bit_killable. cf-promises (and
any binary that loads policy via libpromises) enters uninterruptible D state
and cannot be killed until the mount is force-unmounted or the server
returns. cf-execd then spawns a new cf-promises on every cycle, each of
which also wedges, producing the process pile-up reported in ENT-14146.

Peel literal components (no glob metacharacters) off the front of the
component sequence and start PathWalk at the deepest literal directory.
For /var/cfengine/state/diff/*.diff the walk becomes stat("/var") ->
stat("/var/cfengine") -> stat("/var/cfengine/state") ->
stat("/var/cfengine/state/diff") -> opendir("/var/cfengine/state/diff") +
getdents + match *.diff, and never touches unrelated top-level entries.

Ticket: ENT-14146
Changelog: Title
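
The peel described in the commit message can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `literal_prefix_len` and `component_is_literal` are hypothetical names, and the metacharacter set is assumed to be `*`, `?`, `[` and `]`. The function only reports how much of the pattern is a literal directory prefix; where the walk actually starts is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a component is "literal" when none of its bytes
 * is one of the assumed glob metacharacters (*, ?, [, ]). */
static int component_is_literal(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (strchr("*?[]", s[i]) != NULL)
        {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical sketch: returns the length of the longest leading run of
 * literal components in `pattern`. Returns 0 if the first component
 * already contains a metacharacter. For a fully literal pattern the
 * whole length (minus trailing slashes) is returned. */
size_t literal_prefix_len(const char *pattern)
{
    assert(pattern != NULL);

    size_t prefix = 0; /* end offset of the last accepted literal component */
    size_t pos = 0;

    while (pattern[pos] != '\0')
    {
        /* Skip the separator(s) in front of the next component. */
        size_t start = pos;
        while (pattern[start] == '/')
        {
            start++;
        }

        /* Find the end of the component. */
        size_t end = start;
        while (pattern[end] != '\0' && pattern[end] != '/')
        {
            end++;
        }

        if (end == start)
        {
            break; /* only trailing slashes were left */
        }
        if (!component_is_literal(pattern + start, end - start))
        {
            break; /* first glob component: stop peeling here */
        }

        prefix = end;
        pos = end;
    }
    return prefix;
}
```

For `/var/cfengine/state/diff/*.diff` the sketch returns the length of `/var/cfengine/state/diff`, so a caller could start its walk there and match only `*.diff` against that directory's entries. A return value of 0 means there is nothing to peel, and the caller would fall back to its usual root (`/` for absolute patterns, `.` for relative ones).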
c924085 to 6e22f77
GlobFind() handed every absolute pattern to PathWalk("/", ...), which enumerated and stat'd every entry of each parent directory on the way down, even when the path components were literal with no glob metacharacters. Relative patterns did the same thing rooted at the current directory (.).

For a pattern like /var/cfengine/state/diff/*.diff, you'd get opendir("/") + getdents + stat on every top-level entry of / before the walk ever reached /var. When one of those top-level entries is a stale NFS mount, newfstatat blocks indefinitely in rpc_wait_bit_killable and the whole process wedges in uninterruptible D state. cf-execd then spawns a new cf-promises every cycle, each of which also wedges. That is the pile-up the customer reported on ENT-14146.

To fix:
Peel literal components (no *, ?, [, ]) off the front of the component sequence and start PathWalk() at the deepest literal directory. For /var/cfengine/state/diff/*.diff the walk becomes:
stat("/var") -> stat("/var/cfengine") -> stat("/var/cfengine/state") -> stat("/var/cfengine/state/diff") -> opendir("/var/cfengine/state/diff") + getdents + match *.diff
It never touches /dev, /proc, or any unrelated top-level entry. The same logic now applies to relative patterns: a/b/c/*.txt opens only ./a/b/c, never ./a/d or ./a/sibling.txt.

The peel lives in GlobFind() (not PathWalk()) because that's where the component sequence is known. PathWalk() stays general-purpose.

Not covered:
- Escaped metacharacters (\*, \?). IsGlobLiteral is conservative: it sees the *, returns false, and falls back to the existing PathWalk + GlobMatch flow, which does honor the escape.
- UNC paths (\\host\...) and disk paths (C:\...) are unchanged.

Ticket: ENT-14146
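
To illustrate the conservative behaviour noted under "Not covered", here is a minimal C sketch of such a check. It is a hypothetical stand-in, not the actual IsGlobLiteral from this PR: the name `is_glob_literal_sketch` and the exact metacharacter set (`*`, `?`, `[`, `]`) are assumptions taken from the description above. Because it never inspects backslashes, an escaped `\*` still counts as non-literal, so such a component would not be peeled and would go through the general matching path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a conservative literal check.
 * Returns true only when the component contains none of the assumed
 * glob metacharacters. Escapes are deliberately not interpreted:
 * "a\*b" contains '*', so it is reported as non-literal. */
bool is_glob_literal_sketch(const char *component)
{
    assert(component != NULL);

    /* strpbrk() returns a pointer to the first byte of `component`
     * that appears in the second string, or NULL if there is none. */
    return strpbrk(component, "*?[]") == NULL;
}
```

With this check, `state` and `diff` are literal, while `*.diff` and the escaped `a\*b` are not. Erring on the side of "not literal" costs only the optimisation for that component; it does not change which paths match.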