Initialize AuTest port queue concurrently #12291
base: master
Conversation
On WSL, this change reduces the runtime for the AuTest setup from prohibitive (hours) to annoying (< 1 minute). Most developers are not using WSL and could possibly see a performance degradation with this change. This needs testing to see how it performs in typical dev environments before it can be considered for merging.
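The core idea, sketched minimally below (illustrative names, not the exact code in this patch): probe candidate ports concurrently with asyncio instead of one blocking connect at a time.

```python
import asyncio


async def check_port(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts a connection on the port (it looks free)."""
    try:
        _, writer = await asyncio.open_connection(host, port)
    except OSError:
        return True  # connect failed: nothing is listening, the port is free
    writer.close()
    await writer.wait_closed()
    return False  # something accepted the connection: the port is in use


async def main() -> None:
    ports = range(61001, 61101)
    # Probe all candidates concurrently rather than serially.
    results = await asyncio.gather(*(check_port(p) for p in ports))
    free = [p for p, is_free in zip(ports, results) if is_free]
    print(f"{len(free)} of {len(ports)} candidate ports look free")


asyncio.run(main())
```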
    host.WriteDebug(
        'PortOpen', f"Connection to port {port} succeeded, the port is open, "
        "and a future connection cannot use it")
except ConnectionRefusedError:
Running this locally, in a Rancher Docker VM on my Apple silicon MacBook Pro, I get the following exception:
╰─➤ ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:E
Generating Report: --------------
Test: emergency: Exception
File: emergency.test.py
Directory: /home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown
Reason: Traceback (most recent call last):
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 246, in _setup_port_queue
loop = asyncio.get_running_loop()
RuntimeError: no running event loop
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/runtesttask.py", line 34, in __call__
tl = self.__logic.Run(self.__test)
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/runlogic.py", line 18, in Run
if not tmp.Start(obj):
~~~~~~~~~^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/test.py", line 90, in Start
loadTest(self.__test)
~~~~~~~~^^^^^^^^^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/test.py", line 292, in loadTest
execFile(fileName, locals, locals)
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/common/execfile.py", line 16, in execFile
exec(safeCompile(f.read(), fname), globals, locals)
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown/emergency.test.py", line 25, in <module>
ts = Test.MakeATSProcess('ts')
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/trafficserver.test.ext", line 348, in MakeATSProcess
get_port(p, "port")
~~~~~~~~^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 282, in get_port
_setup_port_queue()
~~~~~~~~~~~~~~~~~^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 249, in _setup_port_queue
asyncio.run(async_setup())
~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run
return runner.run(main)
~~~~~~~~~~^^^^^^
File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/lib64/python3.13/asyncio/base_events.py", line 719, in run_until_complete
return future.result()
~~~~~~~~~~~~~^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 243, in async_setup
await g_ports.select_available(amount, dmin, dmax)
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 60, in select_available
await asyncio.gather(*port_tasks)
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 63, in _check_port
if await self._is_port_open(port):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 81, in _is_port_open
reader, writer = await asyncio.open_connection(address, port, limit=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/streams.py", line 48, in open_connection
transport, _ = await loop.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lambda: protocol, host, port, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/base_events.py", line 1168, in create_connection
raise OSError('Multiple exceptions: {}'.format(
', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 61001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 61001)
Total of 1 test
Unknown: 0
Exception: 1
Failed: 0
Warning: 0
Skipped: 0
Passed: 0
It seems like we need to add OSError to the list of handled exceptions:
diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index 0b8ec00e5..c5259634c 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -85,7 +85,7 @@ class AsyncPortQueue(OrderedSetQueue):
host.WriteDebug(
'PortOpen', f"Connection to port {port} succeeded, the port is open, "
"and a future connection cannot use it")
- except ConnectionRefusedError:
+ except (ConnectionRefusedError, OSError):
host.WriteDebug(
'PortOpen', f"socket error for port {port}, port is closed, "
"and therefore a future connection can use it")
Running locally:
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 0.41s user 0.26s system 82% cpu 0.819 total
Checking out your patch (and applying the patch I mentioned to handle OSError):
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 8.06s user 6.60s system 144% cpu 10.154 total
So user time goes from about 0.5 seconds to 8 seconds with this change, for some reason. I can try to investigate why...
rmax = 65536 - dmax

port_tasks = []
await asyncio.gather(*port_tasks)
Is this await here on an empty port_tasks intentional?
Nope.
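(For what it's worth, gathering an empty list is a harmless no-op; `asyncio.gather()` with zero awaitables resolves immediately to `[]`, so the stray line wastes nothing beyond a trip through the event loop.)

```python
import asyncio


async def main() -> None:
    # gather() with no awaitables completes immediately with an empty list.
    assert await asyncio.gather() == []


asyncio.run(main())
```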
if rmax > amount:
    # Fill in ports, starting above the upper OS-usable port range.
    port = dmax + 1
    while port < 65536 and self.qsize() < amount:
        port_tasks.append(self._check_port(port))
        port += 1
if rmin > amount and self.qsize() < amount:
    port = 2001
    # Fill in more ports, starting at 2001, well above well known ports,
    # and going up until the minimum port range used by the OS.
    while port < dmin and self.qsize() < amount:
        port_tasks.append(self._check_port(port))
        port += 1

await asyncio.gather(*port_tasks)
I believe we're doing more port checks than originally here. On my system, this adds about 35k or so `port_tasks` entries, each of which is executed even though we only need 1k. Originally, we would essentially short-circuit the loop once `self.qsize()` reached `amount` (1000).
What do you think of this? It iteratively adds tasks until we get the desired amount of free ports:
diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index c5259634c..951bcc3fe 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -17,7 +17,7 @@
# limitations under the License.
import asyncio
-from typing import Set
+from typing import Generator, Set
import socket
import subprocess
import os
@@ -37,27 +37,45 @@ class AsyncPortQueue(OrderedSetQueue):
super().__init__()
self._listening_ports = _get_listening_ports()
- async def select_available(self, amount, dmin, dmax):
- rmin = dmin - 2000
- rmax = 65536 - dmax
+ async def select_available(self, amount: int, dmin: int, dmax: int) -> None:
+ '''Populate the port queue with ports that are not in use by the OS.
- port_tasks = []
- await asyncio.gather(*port_tasks)
- if rmax > amount:
+ This method fills in the queue with at least `amount` ports that are not
+ within `dmin` and `dmax`.
+
+ :param amount: The number of ports to populate the queue with.
+ :param dmin: The minimum port number we expect that the OS uses.
+ :param dmax: The maximum port number we expect that the OS uses.
+ '''
+ def task_generator(amount: int, dmin: int, dmax: int) -> Generator:
+ task_counter = 0
# Fill in ports, starting above the upper OS-usable port range.
port = dmax + 1
- while port < 65536 and self.qsize() < amount:
- port_tasks.append(self._check_port(port))
+ while port < 65536 and task_counter < amount:
+ yield self._check_port(port)
port += 1
- if rmin > amount and self.qsize() < amount:
- port = 2001
+ task_counter += 1
# Fill in more ports, starting at 2001, well above well known ports,
# and going up until the minimum port range used by the OS.
- while port < dmin and self.qsize() < amount:
- port_tasks.append(self._check_port(port))
+ port = 2001
+ while port < dmin and task_counter < amount:
+ yield self._check_port(port)
port += 1
-
- await asyncio.gather(*port_tasks)
+ task_counter += 1
+
+ tasks = task_generator(amount, dmin, dmax)
+ while self.qsize() < amount:
+ port_tasks = []
+ ports_still_needed = amount - self.qsize()
+ while len(port_tasks) < ports_still_needed:
+ try:
+ port_tasks.append(next(tasks))
+ except StopIteration:
+ # No more tasks to generate.
+ host.WriteWarning('Ran out of ports to check, stopping port queue setup.')
+ return
+ host.WriteDebug('_setup_port_queue', f"Gathering {len(port_tasks)} port tasks out of {ports_still_needed} desired.")
+ await asyncio.gather(*port_tasks)
async def _check_port(self, port):
if await self._is_port_open(port):
With this change:
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 0.58s user 0.30s system 84% cpu 1.037 total
0.58 seconds is still more than 0.41 seconds (the time before any of the patches on this branch), but the difference is so small that no one will notice it. Can you check that it still makes your system's check efficient?
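Stripped of the AuTest specifics, the batching pattern in the diff amounts to the sketch below (hypothetical names, not this patch's API): pull a bounded batch of pending checks from a generator, gather them, and repeat until the quota is met or the candidates run out.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


async def gather_in_batches(quota_met: Callable[[], bool],
                            tasks: Iterator[Awaitable],
                            batch_size: int = 100) -> None:
    """Run awaitables from `tasks` in batches until quota_met() or exhaustion."""
    while not quota_met():
        batch = []
        for _ in range(batch_size):
            try:
                batch.append(next(tasks))
            except StopIteration:
                break
        if not batch:
            return  # candidates exhausted before the quota was met
        await asyncio.gather(*batch)


# Toy usage: collect "free" ports until we have at least 5.
results = []


async def toy_check(port: int) -> None:
    await asyncio.sleep(0)  # stand-in for a real connect attempt
    if port % 2 == 0:       # pretend even-numbered ports are free
        results.append(port)


candidates = (toy_check(p) for p in range(2001, 2100))
asyncio.run(gather_in_batches(lambda: len(results) >= 5, candidates, batch_size=4))
print(results)
```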
I have timed this patch. (The test failed because the plugin it needed wasn't compiled.)
Total of 1 test
Unknown: 0
Exception: 1
Failed: 0
Warning: 0
Skipped: 0
Passed: 0
real 2m20.733s
user 0m3.494s
sys 0m1.416s
It's fast enough that working on an AuTest is possible if you have a lot of patience.
@bneradt am I reading the code wrong, or will the generator not yield more than 1000 tasks, which may be too few to find 1000 open ports? Does the `task_counter` in the generator need to be removed?
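To make the question concrete, here is a toy reduction (hypothetical names): a generator capped at `amount` yields exactly `amount` candidates, so if any candidate turns out to be busy, the generator is exhausted before the quota is met and setup stops with fewer than `amount` ports queued.

```python
def capped_candidates(amount: int, start: int = 2001):
    """Yield at most `amount` candidate ports, mirroring task_counter's cap."""
    yield from range(start, start + amount)


busy = {2001, 2002}  # suppose two of the candidates are already in use
candidates = list(capped_candidates(3))
free = [p for p in candidates if p not in busy]
print(candidates)  # [2001, 2002, 2003]
print(free)        # [2003] -- only one free port even though three were wanted
```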
Chiming in late on this. The goal is to secure all the necessary ports, i.e., enough for a given test and enough buffer for ports to reset correctly with the system for various reasons. This is the main goal we have for getting the port set as we do. The speed concern is that we are getting each port serially. By "getting a port" I mean testing that the port is "safe". Is that correct?
@dragon512 Thank you for asking for clarification. Your understanding of my concern is (mostly? see note) correct, but to avoid tunnel vision it might be better to state the concern as "connecting to a Windows host port from WSL through the Python socket API to see if it is open has too high a runtime cost for our serial port searching algorithm." In this PR, Brian and I have worked on trying to change the algorithm. However, other approaches might be better; for example, not using WSL to run AuTests. I could also try to understand why the runtime cost is so large and possibly try to fix the issue in WSL itself. I didn't want to take that on initially, because it would probably be a great deal of work and require some WSL knowledge and expertise. Edit: One note. I am not sure whether you mean an async for loop. There is an …
On WSL, this change reduces the runtime for the AuTest setup from prohibitive to annoying. Most developers are not using WSL and could possibly see a performance degradation with this change.
Opening a connection to a host port is a very slow operation on WSL: 0.5s * 1000 ports = 500s spent opening ports, not accounting for other overhead. In my PoC script, it took 20 seconds after this change, which is more than I hoped. I think this is due to not being able to set a 0.5 second timeout with the async socket interface like we do with the synchronous one. The actual change in this patch takes more than 20 seconds but is still sub-minute on my system. This needs testing to see how it performs in typical dev environments before it can be considered for merging.
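On the timeout point: asyncio can bound a connection attempt, just not through `socket.settimeout()`; wrapping `open_connection` in `asyncio.wait_for` (or an `asyncio.timeout()` block on Python 3.11+) gives a similar effect. A minimal sketch, assuming a 0.5s budget per probe, not the code in this patch:

```python
import asyncio


async def is_port_open(port: int, host: str = "127.0.0.1",
                       timeout: float = 0.5) -> bool:
    """Return True if something accepts a connection within the timeout."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout)
    except (OSError, asyncio.TimeoutError):
        return False  # refused, unreachable, or too slow: treat as closed
    writer.close()
    await writer.wait_closed()
    return True
```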