micro optimisations #1290
Conversation
you're definitely going to need to split this up and demonstrate actual value from each of the changes
pycodestyle.py
Outdated
```python
len_chunks = len(chunks)
if ((len_chunks == 1 and multiline) or
    (len_chunks == 2 and chunks[0] == '#')) and \
        length - len(chunks[-1]) < max_line_length - 7:
```
This reduces calls to `len(chunks)` and `len(line)` from two to one.
pycodestyle.py
Outdated
```python
line_indents = expand_indent(line)
if line.strip() and line_indents < ancestor_level:
    ancestor_level = line_indents
```
This reduces calls to `expand_indent(line)` from two to one.
Except that now we always call `expand_indent`, whereas previously we called it twice if and only if `line.strip()` was truthy. So we're reducing how often we calculate `expand_indent` in that case, but we're now calculating it unconditionally, including for empty lines that do not need it calculated.
Any benchmark you perform here needs to show the impact across a set of lines that covers a significant number of cases, not just the singular, less-than-representative benchmark case you've already shown.
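For what it's worth, a shape that keeps the single call without also paying for blank lines would compute the indent only after the `line.strip()` guard. A standalone sketch, using a simplified stand-in for pycodestyle's `expand_indent` (the real one also handles tabs):

```python
def expand_indent(line):
    # simplified stand-in for pycodestyle.expand_indent (ignores tabs)
    return len(line) - len(line.lstrip())

ancestor_level = 8
for line in ['    x = 1\n', '\n', '  y = 2\n']:
    # Compute the indent once, and only for non-blank lines.
    if line.strip():
        line_indent = expand_indent(line)
        if line_indent < ancestor_level:
            ancestor_level = line_indent
print(ancestor_level)  # -> 2; the blank line never triggered expand_indent
```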
Okay, I will cover and benchmark each change with different cases eventually.
pycodestyle.py
Outdated
```python
    'u',
    'U',
]
valid = python_3000_valid
```
Why bother aliasing at all? This doesn't seem worth the indirection.
I did it just to keep the changes as minimal as possible, but I think it will be okay to rename `valid` to `python_3000_valid` in the rest of the function.
pycodestyle.py
Outdated
```diff
@@ -1576,7 +1580,8 @@ def ambiguous_identifier(logical_line, tokens):
     brace_depth = 0
     idents_to_avoid = ('l', 'O', 'I')
     prev_type, prev_text, prev_start, prev_end, __ = tokens[0]
-    for index in range(1, len(tokens)):
+    len_tokens = len(tokens)
```
```
❯ python -m timeit -s 'l = [0] * 10000' -- <<EOF
> if len(l):
>     if len(l):
>         pass
> EOF
20000000 loops, best of 5: 6.54 nsec per loop
```

And

```
❯ python -m timeit -s 'l = [0] * 10000' -- 'len(l)'
5000000 loops, best of 5: 20.9 nsec per loop
```
This is not even a micro-optimization. `len()` is not expensive, certainly not for the number of tokens we're looking at. Even if I make the list significantly larger:
```
❯ python -m timeit -s 'l = [0] * 1000000000' -- <<EOF
if len(l):
    if len(l):
        pass
EOF
50000000 loops, best of 5: 7.08 nsec per loop
```
It doesn't make that significant of a difference.
I agree that the difference is negligible; the whole PR is about tiny micro-optimisations. But I still think it's worth not having an extra function call inside a loop. More details here.
My point is that a micro-optimization might be tiny but should still be impactful. This seems to be an increase in cognitive complexity without meaningful impact.
tests/test_api.py
Outdated
```diff
@@ -337,7 +337,7 @@ def test_check_nullbytes(self):
                 "stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes",  # noqa: E501
                 "stdin:1:1: E901 TokenError: source code cannot contain null bytes",  # noqa: E501
             ]
-        self.assertEqual(stdout.splitlines(), expected)
+        self.assertEqual(stdout.splitlines().sort(), expected.sort())
```
This shouldn't be necessary. I wonder if this fails without this.
Yes, it fails... and I don't know why. But I think it's okay to have a different order of yields; they could even change across Python versions, even if the code is the same.
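Worth noting about the patch itself: `list.sort()` sorts in place and returns `None`, so the changed assertion compares `None == None` and can never fail; an order-insensitive comparison needs `sorted()` instead. A minimal illustration:

```python
a = ['beta', 'alpha']
b = ['gamma']

# list.sort() returns None, so this always passes regardless of contents:
assert a.sort() == b.sort()

# sorted() returns new lists, so this is a real order-insensitive comparison:
assert sorted(a) != sorted(b)
```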
After checking out the latest remote commit, I've found that it was not my changes that reordered the yields. Running tox on platform linux -- Python 3.11.11 [pypy-7.3.19-final], pytest-8.4.1, pluggy-1.6.0 fails with:
```
self = <tests.test_api.APITestCase testMethod=test_check_nullbytes>

    def test_check_nullbytes(self):
        pycodestyle.register_check(DummyChecker, ['Z701'])
        pep8style = pycodestyle.StyleGuide()
        count_errors = pep8style.input_file('stdin', lines=['\x00\n'])
        stdout = sys.stdout.getvalue()
        if sys.version_info < (3, 11, 4):  # pragma: <3.11 cover
            expected = ["stdin:1:1: E901 ValueError: source code string cannot contain null bytes"]  # noqa: E501
        elif sys.version_info < (3, 12):  # pragma: <3.12 cover # pragma: >=3.11 cover  # noqa: E501
            expected = ["stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes"]  # noqa: E501
        else:  # pragma: >=3.12 cover
            expected = [
                "stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes",  # noqa: E501
                "stdin:1:1: E901 TokenError: source code cannot contain null bytes",  # noqa: E501
            ]
>       self.assertEqual(stdout.splitlines(), expected)
E       AssertionError: Lists differ: ['stdin:1:1: E901 ValueError: source code string cannot contain null bytes'] != ['stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes']
E
E       First differing element 0:
E       'stdin:1:1: E901 ValueError: source code string cannot contain null bytes'
E       'stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes'
E
E       - ['stdin:1:1: E901 ValueError: source code string cannot contain null bytes']
E       ?                  ^ ^^^
E
E       + ['stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes']
E       ?                  ^^^^ ^

tests/test_api.py:340: AssertionError
```
So I assume it's solely PyPy's responsibility.
pycodestyle.py
Outdated
```diff
@@ -786,7 +788,7 @@ def whitespace_before_parameters(logical_line, tokens):
         E211: dict['key'] = list [index]
     """
     prev_type, prev_text, __, prev_end, __ = tokens[0]
-    for index in range(1, len(tokens)):
+    for index, _ in enumerate(tokens):
```
This reduces two function calls to one; plus, this approach lets `enumerate` generate indexes without calculating the `len` of tokens.
Calculating `len(tokens)` is not expensive. This is different in that we're still increasing the cost of what we do here. Previously, we created a single `range` object and iterated over it. Now, we create an iterator and always return both items, causing an incref for the `_` binding above and eventually a decref. So it increases cost in the GC logic. Besides, if we're iterating this way, why not avoid the next line (790) by instead doing

```python
for index, (token_type, text, start, end, __) in enumerate(tokens):
```

This would be yet more efficient, given you're focused on (thus far) not meaningful "optimizations".
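For illustration, here is the suggested unpacking applied to a real token stream; a standalone sketch, not the patched pycodestyle function:

```python
import tokenize
from io import StringIO

tokens = list(tokenize.generate_tokens(StringIO("dict ['key']\n").readline))

# Unpacking the 5-tuple in the loop header avoids indexing tokens[index]
# again in the loop body.
for index, (token_type, text, start, end, __) in enumerate(tokens):
    print(index, tokenize.tok_name[token_type], repr(text))
```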
Yes, you're right. I also spotted this a few hours ago and will change this line to your suggested one.
pycodestyle.py
Outdated
```python
def is_string_literal(line):
    if line[0] in 'uUbB':
        line = line[1:]
    if line and line[0] in 'rR':
        line = line[1:]
    return line and (line[0] == '"' or line[0] == "'")
```
This eliminates the tiny overhead of creating the function inside each `module_imports_on_top_of_file` call.
Folks import `pycodestyle` as a library. This now exposes it as a public API interface, which we do not want to maintain.
A simple fix is to rename it to `_is_string_literal`.
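One way to put a number on the "tiny overhead" claim is to time a nested `def` against a module-level helper. A sketch with simplified helper bodies, not pycodestyle's actual logic:

```python
import timeit

def check_with_nested_def():
    # mirrors the old shape: the helper function is re-created on every call
    def is_string_literal(line):
        return bool(line) and line[0] in '"\''
    return is_string_literal('"docstring"')

def _is_string_literal(line):
    return bool(line) and line[0] in '"\''

def check_with_module_helper():
    # the helper already exists; no function object is built per call
    return _is_string_literal('"docstring"')

print(timeit.timeit(check_with_nested_def))
print(timeit.timeit(check_with_module_helper))
```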
pycodestyle.py
Outdated
```python
if prev_text in ('as', 'for', 'global', 'nonlocal') and \
        text in idents_to_avoid:
    ident = text
    pos = start
```
Every time I've combined two nested `if` statements into one with `and`, it was only to reduce indentation. There is no performance impact; the bytecode (in 3.13) is the same.
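The claim is easy to check with `dis`: comparing the instruction streams for a pair of toy functions shows whether the compiler emits the same jumps for both spellings.

```python
import dis

def nested(a, b):
    if a:
        if b:
            return 1

def combined(a, b):
    if a and b:
        return 1

# On recent CPython the two disassemblies are expected to match
# apart from line numbers.
dis.dis(nested)
dis.dis(combined)
```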
pycodestyle.py
Outdated
```python
len_tokens = len(tokens)
for index in range(1, len_tokens):
```
This avoids calculating `len(tokens)` inside a loop, which is a small but unnecessary overhead. I didn't change `range` to `enumerate` because we already calculated, and will use, `len(tokens)`.
Do you think that

```python
for index in range(1, len(tokens)):
```

re-calculates `len(tokens)` on every iteration of the loop?
No, it's not; it is happening in line 1617.
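A quick way to see that `len()` runs only once in the `range(...)` form is a list subclass that counts `__len__` calls:

```python
class CountingList(list):
    len_calls = 0

    def __len__(self):
        CountingList.len_calls += 1
        return super().__len__()

tokens = CountingList(range(100))
for index in range(1, len(tokens)):
    pass
print(CountingList.len_calls)  # -> 1: len() ran once, before the loop began
```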
pycodestyle.py
Outdated
```python
python_3000_valid = frozenset([
    '\n',
    '\\',
    '\'',
    '"',
    'a',
    'b',
    'f',
    'n',
    'r',
    't',
    'v',
    '0', '1', '2', '3', '4', '5', '6', '7',
    'x',

    # Escape sequences only recognized in string literals
    'N',
    'u',
    'U',
])
```
Extracting this from the function and making it a frozenset has two benefits: (1) we don't need to create a new list on each `python_3000_invalid_escape_sequence` call; (2) lookups become amortised constant time instead of linear time, which is very good since the lookups happen in a `while` loop.
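The lookup claim is straightforward to sanity-check with `timeit`; `'U'` is the worst case for the list since it sits at the very end (numbers will vary by machine):

```python
import timeit

chars = ['\n', '\\', "'", '"', 'a', 'b', 'f', 'n', 'r', 't', 'v',
         '0', '1', '2', '3', '4', '5', '6', '7', 'x', 'N', 'u', 'U']
as_list = list(chars)
as_set = frozenset(chars)

# list membership scans linearly; frozenset membership is a hash lookup
print(timeit.timeit("'U' in as_list", globals=globals()))
print(timeit.timeit("'U' in as_set", globals=globals()))
```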
pycodestyle.py
Outdated
```diff
+    lines_to_skip = SKIP_COMMENTS.union([tokenize.STRING])
+    skip_lines = False
     # Skip lines that
     for token_type, text, start, end, line in tokens:
-        if token_type not in SKIP_COMMENTS.union([tokenize.STRING]):
-            skip_lines.add(line)
+        if token_type not in lines_to_skip:
+            skip_lines = True
+            break
```
Honestly, I did not fully understand what those lines are meant for, but considering that `skip_lines` is only used for checking whether it is empty or not, I assumed it is safe to make it a bool and break from the loop when it is set to True.
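The same early exit can also be spelled with `any()`, which short-circuits exactly like the `break`. A standalone sketch; the `SKIP_COMMENTS` definition below is a simplified assumption, not pycodestyle's real constant:

```python
import tokenize
from io import StringIO

SKIP_COMMENTS = frozenset([tokenize.COMMENT, tokenize.NL])  # simplified assumption
lines_to_skip = SKIP_COMMENTS.union([tokenize.STRING])
tokens = list(tokenize.generate_tokens(StringIO('"""doc"""\n').readline))

# any() stops at the first token whose type is not skippable,
# just like the loop with an explicit break.
skip_lines = any(
    token_type not in lines_to_skip
    for token_type, _text, _start, _end, _line in tokens
)
print(skip_lines)
```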
pycodestyle.py
Outdated
```diff
+        lines_len = len(lines)
         for line_num, physical_line in enumerate(lines):
             if start[0] + line_num == 1 and line.startswith('#!'):
                 return
             length = len(physical_line)
             chunks = physical_line.split()
-            if token_type == tokenize.COMMENT:
-                if (len(chunks) == 2 and
-                        length - len(chunks[-1]) < MAX_DOC_LENGTH):
-                    continue
-            if len(chunks) == 1 and line_num + 1 < len(lines):
-                if (len(chunks) == 1 and
-                        length - len(chunks[-1]) < MAX_DOC_LENGTH):
-                    continue
+            len_chunks = len(chunks)
+            len_last_chunk = len(chunks[-1]) if chunks else None
+            if token_type == tokenize.COMMENT and \
+                    (len_chunks == 2 and
+                     length - len_last_chunk < MAX_DOC_LENGTH):
+                continue
+            if len_chunks == 1 and line_num + 1 < lines_len and \
+                    (len_chunks == 1 and
+                     length - len_last_chunk < MAX_DOC_LENGTH):
+                continue
```
This reduces unnecessary recalculation of the lengths of those objects.
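One subtlety in the cached version: `len_last_chunk` is `None` when the line has no chunks, which is only safe because the `len_chunks` equality test runs first and `and` short-circuits. A standalone illustration:

```python
chunks = ''.split()       # an empty or whitespace-only line yields no chunks
len_chunks = len(chunks)  # 0
len_last_chunk = len(chunks[-1]) if chunks else None

# len_chunks == 2 is False here, so the None never reaches the subtraction.
if len_chunks == 2 and 80 - len_last_chunk < 72:
    print('skip')
```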
pycodestyle.py
Outdated
```python
elif not parens and token_type in NEWLINE:
    if token_type == tokenize.NEWLINE:
        self.check_logical()
        self.blank_before = 0
    elif len(self.tokens) == 1:
        # The physical line contains only this token.
        self.blank_lines += 1
        del self.tokens[0]
    else:
        self.check_logical()
```
Only for indentation.
tests/test_api.py
Outdated
```diff
@@ -337,7 +337,7 @@ def test_check_nullbytes(self):
                 "stdin:1:1: E901 SyntaxError: source code string cannot contain null bytes",  # noqa: E501
                 "stdin:1:1: E901 TokenError: source code cannot contain null bytes",  # noqa: E501
             ]
-        self.assertEqual(stdout.splitlines(), expected)
+        self.assertEqual(stdout.splitlines().sort(), expected.sort())
```
This is strange, but changing nested `if`s to a single one with `and` results in a different order of `yield`s.
Then this isn't an optimization any longer; it's a change in real behaviour that will be obvious to the user.
Yes, it's not. I will revert the changes that affect indentation.
pycodestyle.py
Outdated
```python
if prev_text == 'class' and \
        text in idents_to_avoid:
    yield start, "E742 ambiguous class definition '%s'" % text
if prev_text == 'def' and \
        text in idents_to_avoid:
    yield start, "E743 ambiguous function definition '%s'" % text
```
indentation
I'm currently working on benchmarking the affected code, but overall there is no significant performance gain.

```python
import pyperf
import tokenize
from io import StringIO

from pycodestyle import maximum_doc_length

logical_line = "This is a sample docstring that exceeds the maximum length. This is a sample docstring that exceeds the maximum length."
max_doc_length = 72
noqa = False
tokens = list(tokenize.generate_tokens(StringIO(logical_line).readline))


def benchmark_maximum_doc_length():
    list(maximum_doc_length(logical_line, max_doc_length, noqa, tokens))


runner = pyperf.Runner()
runner.timeit(
    name="maximum_doc_length",
    stmt="benchmark_maximum_doc_length()",
    globals=globals(),
)
```

Running this before the changes resulted in:
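For reference, a pyperf script like the one above can dump its runs to JSON and compare them; assuming it is saved as `bench.py` (hypothetical name):

```
❯ python bench.py -o before.json    # on the base branch
❯ python bench.py -o after.json     # with the PR applied
❯ python -m pyperf compare_to before.json after.json
```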
Work in progress, albeit slow; I need to make sure all the changes are actually valuable. By the way, @asottile, could you please clarify what you mean by "split it up"? Split each function change into a separate PR, or create separate PRs for each "micro-optimisation" technique?
Yeah, generally each improvement should be done as a separate PR with some benchmarks showing it actually improves things.
Also, I want to express my deep gratitude to you all for creating and supporting these utilities. I believe in these tools and want them to live and develop and become only better and cooler. Thank you again!
I forked Flake8 and hardcoded benchmarking utilities for `pycodestyle`. Below are performance comparisons before and after the changes. (Executed inside the Flake8 directory.)
Benchmark Results

Before Changes (3 Sample Runs)

After Changes (3 Sample Runs)

Analysis

Checks with significant improvements (marked with `!` in the results):

- `missing_whitespace_after_keyword` (↓ ~15–20%)
- `module_imports_on_top_of_file` (↓ ~20–25%)
- `python_3000_invalid_escape_sequence` (↓ ~40–45%)

So only the improvements in the three checks above (`missing_whitespace_after_keyword`, `module_imports_on_top_of_file`, and `python_3000_invalid_escape_sequence`) are significant enough to justify applying the changes.