Conversation
tests/test.py
Outdated
| def convert(self, line: str): | ||
| return json.loads(line) | ||
|
|
||
| def parse_fatrace_text_line(line): |
There was a problem hiding this comment.
FTR, I'm not a fan of this. This logic is way too complex for a test -- you would now need a whole unit test for this function.
|
I agree that as a general rule, complex logic isn't desirable in tests. In
this case however, the context makes it possible to argue in favor of it,
and it goes like this:
`parse_fatrace_text_line` is already tested since it's used in every test
case except `test_json`. In fact, it's probably the most well tested code
in this code base. These tests are enough to make it robust; if I wanted to
introduce a serious bug in the test harness, e.g. one that falsely reports
all tests as successes, I couldn't do so in the convert functions, I'd have
to do so in e.g. the assert methods. This is because the convert function
doesn't actually perform any tests of values, it has to return a value
which is then tested by much simpler logic. Introduce a bug in this
function and you're very likely to see a failure, much more so than if you
introduced a bug in fatrace itself.
…On Wed, Aug 6, 2025, 06:07 Martin Pitt ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tests/test.py
<#66 (comment)>:
> "---- Log content ----\n"
f"{self.log_content}\n"
"-----------------")
+class FatraceRunnerText(FatraceRunnerAbstract):
+ def convert(self, line: str):
+ return parse_fatrace_text_line(line)
+
+class FatraceRunnerJson(FatraceRunnerAbstract):
+ def convert(self, line: str):
+ return json.loads(line)
+
+def parse_fatrace_text_line(line):
FTR, I'm not a fan of this. This logic is way too complex for a test --
you would now need a whole unit test for *this* function.
—
Reply to this email directly, view it on GitHub
<#66 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABIAEW5ZXZRWPBBGGIHHAD3MF5PVAVCNFSM6AAAAACDGK7SMSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTAOJQGM2DSMJRGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
|
Yes, maybe, but
Sorry, I don't want to take this. |
FTR, you can use
FTR, this PR is 161 insertions(+), 179 deletions(-). The stats shown on GitHub includes #60 since it's not merged yet.
That's fair, feel free to close it. |
Ah right -- sorry, I didn't read it carefully enough.
Oops, sorry! I just took a quick look so far, I'm stuck in $daytime job work mostly. |
Turns out I'm the hasty one here. What I said should be true after the recent push, and as an example I restored |
4cfc374 to
72efee6
Compare
|
@martinpitt Rebased and polished a bit. Ultimately this is a matter of taste, so let me know if this is a go or not. If it's a go, I might fix type annotations a bit. |
martinpitt
left a comment
There was a problem hiding this comment.
I re-read this, and sorry, I still don't like it. It's completely opposite of how I want a test to look like.
The bit that I do like is the the consistent parallel running of text and json, though. I.e. I would actually appreciate writing
self.assert_log(
lambda e: e["comm"] == "rm" and e["path"] == cwd and e["types"] == "D"),
rf"^rm.*:\s+D\s+{cwd_re}$"
as that forces doing consistent checks between the two modes, yet keeps the direct readable REs how the output is expected to look like. WDYT?
tests/test.py
Outdated
| class FatraceRunner: | ||
| def __init__(self, args: list[str]): | ||
| class FatraceRunnerBase: | ||
| def __init__(self, args: list[str], convert_line = lambda x: x, convert_condition = lambda x: x): |
There was a problem hiding this comment.
the conversion functions here and condition arguments below are missing types. Probably best to add a ConvertLineType constant etc. at the top, with some comment/example.
There was a problem hiding this comment.
This will go away when the "lambda gymnastics" is removed.
tests/test.py
Outdated
| def has_log(self, condition) -> bool: | ||
| """Check if any line matches the condition.""" | ||
|
|
||
| if not self.finished: |
There was a problem hiding this comment.
meh magic side effect, I like the explicit .finish() call better. But it's not a strong "meh".
tests/test.py
Outdated
|
|
||
| def FatraceRunnerText(*args: str): | ||
| assert "--json" not in args | ||
| return FatraceRunnerBase([*args], convert_condition = lambda regex: lambda line: bool(re.search(regex, line))) |
There was a problem hiding this comment.
How about just making self.convert_condition and _line actual methods, instead of this lambda gymnastics? The base function can throw a NotImplementedError for them. Then the parsing code can also move there.
| test_file_str = str(test_file) | ||
|
|
||
| # file creation | ||
| f.assert_log(rf"^touch.*\sC?W?O\s+{re.escape(test_file_str)}") |
There was a problem hiding this comment.
I still hate this, really. It's now not clear at all any more how the expected text output is supposed to look like. I often write the test first, as that's a good way to design how it should look like. And then implement the functionality with a good model in my mind. This is completely upside down, and adjusting the big generic parsing function is the opposite of what a test should be -- simple, obvious, orthogonal, robust.
That's fine. I think this is ultimately a matter of taste, and your gοod taste is what made
I think it's a fine idea. In your taste, REs are more readable than the lambdas, while in my taste they represent unnecessary repetition of ad-hoc parsing logic. Let's go with your taste. How about:
|
|
Hello @svenssonaxel !
Note that I'm not ranking REs/lambdas for readability (FWIW, JSON is easier to validate, I fully agree). My point is, the text output is a thing, and tests should directly reflect how it looks like, also to design new features.
✅
That feels a bit too much effort for me, and it will still require that complicated/hard to maintain translator. But if you really like this, I'll take it 😬 Thanks! |
72efee6 to
bc134d8
Compare
|
@martinpitt ok, finally got around to it. I made a significant effort to avoid the parsing logic, but to no avail. What I tried was to run two instances of fatrace in parallel and match up the printed events. Turns out that the kernel can report events merged in different ways to different listeners, so this line of effort turned out to be at least as complex as the parsing. If it makes you feel better, the parsing did pay off. Using that check, I've found and fixed several critical regex bugs. If you use the github runner on |
| - name: Run tests | ||
| run: sudo python3 -m unittest -v |
There was a problem hiding this comment.
Running linters before test may be a temporary convenience for this, but we shouldn't land this. Clearly functionality is more important than code style? I.e. if a MR fails, that's the more important piece of information than "your line is too long".
There was a problem hiding this comment.
ok, will remove.
| class Device(TypedDict): | ||
| major: int | ||
| minor: int | ||
| class Parent(TypedDict, total=False): |
There was a problem hiding this comment.
Hmm... Why do we have ruff when it doesn't complain about missing newlines between class definitions? I want my money back! 😉
| self.log_dir: tempfile.TemporaryDirectory[str] = tempfile.TemporaryDirectory() | ||
| self.text_output_file: str = os.path.join(self.log_dir.name, "fatrace.log.txt") | ||
| self.json_output_file: str = os.path.join(self.log_dir.name, "fatrace.log.json") | ||
| self.finished: bool = False |
There was a problem hiding this comment.
OOI: These types are rather obvious, and mypy should be perfectly able to figure these out? Did you run into type errors, or is that more of a personal preference? (No big objection from me, just a bit unnecessary clutter)
There was a problem hiding this comment.
I don't think I got any type errors. I'm happy removing type hints given a rule for which ones.
tests/test.py
Outdated
| # start processes | ||
| self.text_process: subprocess.Popen[bytes] = subprocess.Popen( | ||
| [fatrace_bin, "-o", str(self.text_output_file)] + | ||
| [x for x in args if x!="--json"]) |
There was a problem hiding this comment.
Instead of asserting it and filtering it our here, wouldn't it be easier to not repeat this in the caller, and prepending/appending it in json_process below?
| f.assert_log(lambda e: e["comm"] == "touch" and e["path"] == test_file_str and "O" in e["types"], | ||
| rf"^touch.*\sC?W?O\s+{re.escape(test_file_str)}") |
There was a problem hiding this comment.
FTR, I like these assertions. They are explicit in how both outputs are expected to look like. 👍
| # For each text mode output line, assert that the regex matches it iff | ||
| # the predicate matches it after parsing |
There was a problem hiding this comment.
Can we please drop this, together with parse_fatrace_text_line()? The assertion structure already enforces checking text and JSON output. Let's assume that us developers are not malicious against ourselves.
There was a problem hiding this comment.
If you haven't already, take a look at 60bf4ac Fix critical bugs in testing regexes. No malice was required to produce these bugs. They are critical bugs, in the sense that the buggy regexes do not test what they intend to test.
For example, rf"^rm.*:\s+D\s+{cwd_re}$") would result in a passing test even if fatrace didn't output anything at all from any rm command. These bugs were only caught using the assertions you now want to remove; neither of us detected them through manual inspection.
I agree that the parsing code is on the brittle end of the scale. It's also not strictly necessary. My judgment is that it's still a net positive and should remain, until it's no longer a net positive. If a time comes when adjusting or growing the parsing code is more than you can bear, it can then be discarded, safely except for the increased risk of such regex bugs. Removing it now would increase this risk earlier than necessary, and I don't see the point of doing so. WDYT?
tests/test.py
Outdated
| def _exe(argv_any: tuple[Any, ...], **kwargs) -> None: | ||
| argv: list[str] = [str(x) for x in argv_any] |
There was a problem hiding this comment.
I'm allergic to Any. In cases where a function is really just passing on some opaque value to the stdlib, Unknown is better and more explicit. However, that's not what happens here: A process argv must always be a str list. What blows up when declaring it that way?
I see below you are trying to get rid of str()ing pathlib.Path objecs. For quality-of-live it could support that explicitly (Sequence[str|Path]).
tests/test.py
Outdated
| def _exe(argv_any: tuple[Any, ...], **kwargs) -> None: | ||
| argv: list[str] = [str(x) for x in argv_any] | ||
| print(f"Running command: {' '.join(argv)}", file=_exe_stream) | ||
| proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False, **kwargs) |
There was a problem hiding this comment.
This is unsafe as a generic API. If a process outputs too much stuff on stdout and nobody reads the buffers, it'll block. The previous code used stdout=NULL for that reason.
Merge FatraceRunner* classes into one. Let class FatraceRunner run two instances of fatrace in parallel, one with and one without --json. Each assert provides a regex for testing the text mode log and a lambda for testing the JSON mode log. The regex and lambda should match the same sets of events. However, we can't rely on the assumption that the two fatrace instances catch the exact same sequence of events. Instead, for each assert and text mode output line, we assert that the regex matches the line iff the lambda matches it after parsing.
These bugs are actual false positives, i.e. they fix regexes that unintentionally matched an event actually present in that test case, that the lambda did not match. - Three instances of searching for `rm` but also matching `rmdir`. Fixed by adding `\(`. - One instance of matching an unintended sub path, fixed by adding `$`. - One instance of matching any path, fixed by adding the intended path.
737af3d to
b02afc5
Compare
… text and json mode")
|
@martinpitt ping |
Uh oh!
There was an error while loading. Please reload this page.