Initialize AuTest port queue concurrently #12291
base: master
Conversation
On WSL, this change reduces the runtime for the AuTest setup from prohibitive (hours) to annoying (< 1 minute). Most developers are not using WSL and could possibly see a performance degradation with this change. This needs testing to see how it performs in typical dev environments before it can be considered for merging.
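The core idea, sketched minimally below (illustrative names, not the exact code in this patch): probe candidate ports concurrently with asyncio instead of one blocking connect at a time.

```python
import asyncio


async def check_port(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts a connection on the port (it looks free)."""
    try:
        _, writer = await asyncio.open_connection(host, port)
    except OSError:
        return True  # connect failed: nothing is listening, the port is free
    writer.close()
    await writer.wait_closed()
    return False  # something accepted the connection: the port is in use


async def main() -> None:
    ports = range(61001, 61101)
    # Probe all candidates concurrently rather than serially.
    results = await asyncio.gather(*(check_port(p) for p in ports))
    free = [p for p, is_free in zip(ports, results) if is_free]
    print(f"{len(free)} of {len(ports)} candidate ports look free")


asyncio.run(main())
```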
    host.WriteDebug(
        'PortOpen', f"Connection to port {port} succeeded, the port is open, "
        "and a future connection cannot use it")
except ConnectionRefusedError:
Running this locally, in a Rancher Docker VM on my Apple silicon MacBook Pro, I get the following exception:
╰─➤ ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:E
Generating Report: --------------
Test: emergency: Exception
File: emergency.test.py
Directory: /home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown
Reason: Traceback (most recent call last):
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 246, in _setup_port_queue
loop = asyncio.get_running_loop()
RuntimeError: no running event loop
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/runtesttask.py", line 34, in __call__
tl = self.__logic.Run(self.__test)
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/runlogic.py", line 18, in Run
if not tmp.Start(obj):
~~~~~~~~~^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/test.py", line 90, in Start
loadTest(self.__test)
~~~~~~~~^^^^^^^^^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/test.py", line 292, in loadTest
execFile(fileName, locals, locals)
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/common/execfile.py", line 16, in execFile
exec(safeCompile(f.read(), fname), globals, locals)
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown/emergency.test.py", line 25, in <module>
ts = Test.MakeATSProcess('ts')
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/trafficserver.test.ext", line 348, in MakeATSProcess
get_port(p, "port")
~~~~~~~~^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 282, in get_port
_setup_port_queue()
~~~~~~~~~~~~~~~~~^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 249, in _setup_port_queue
asyncio.run(async_setup())
~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run
return runner.run(main)
~~~~~~~~~~^^^^^^
File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/lib64/python3.13/asyncio/base_events.py", line 719, in run_until_complete
return future.result()
~~~~~~~~~~~~~^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 243, in async_setup
await g_ports.select_available(amount, dmin, dmax)
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 60, in select_available
await asyncio.gather(*port_tasks)
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 63, in _check_port
if await self._is_port_open(port):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 81, in _is_port_open
reader, writer = await asyncio.open_connection(address, port, limit=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/streams.py", line 48, in open_connection
transport, _ = await loop.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lambda: protocol, host, port, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.13/asyncio/base_events.py", line 1168, in create_connection
raise OSError('Multiple exceptions: {}'.format(
', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 61001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 61001)
Total of 1 test
Unknown: 0
Exception: 1
Failed: 0
Warning: 0
Skipped: 0
Passed: 0
It seems like we need to add OSError to the list of handled exceptions:
diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index 0b8ec00e5..c5259634c 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -85,7 +85,7 @@ class AsyncPortQueue(OrderedSetQueue):
host.WriteDebug(
'PortOpen', f"Connection to port {port} succeeded, the port is open, "
"and a future connection cannot use it")
- except ConnectionRefusedError:
+ except (ConnectionRefusedError, OSError):
host.WriteDebug(
'PortOpen', f"socket error for port {port}, port is closed, "
"and therefore a future connection can use it")
Running locally:
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 0.41s user 0.26s system 82% cpu 0.819 total
Checking out your patch (and applying the patch I mentioned to handle OSError):
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 8.06s user 6.60s system 144% cpu 10.154 total
So user time goes from about 0.5 seconds to 8 seconds with this change, for some reason. I can try to investigate why...
rmax = 65536 - dmax

port_tasks = []
await asyncio.gather(*port_tasks)
Is this await here on an empty port_tasks intentional?
Nope.
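(For what it's worth, gathering an empty list is a harmless no-op; `asyncio.gather()` with zero awaitables resolves immediately to `[]`, so the stray line wastes nothing beyond a trip through the event loop.)

```python
import asyncio


async def main() -> None:
    # gather() with no awaitables completes immediately with an empty list.
    assert await asyncio.gather() == []


asyncio.run(main())
```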
if rmax > amount:
    # Fill in ports, starting above the upper OS-usable port range.
    port = dmax + 1
    while port < 65536 and self.qsize() < amount:
        port_tasks.append(self._check_port(port))
        port += 1
if rmin > amount and self.qsize() < amount:
    port = 2001
    # Fill in more ports, starting at 2001, well above well known ports,
    # and going up until the minimum port range used by the OS.
    while port < dmin and self.qsize() < amount:
        port_tasks.append(self._check_port(port))
        port += 1

await asyncio.gather(*port_tasks)
I believe we're doing more port checks than originally here. On my system, this adds about 35k or so `port_tasks` entries, each of which is executed even though we only need 1k. Originally, we would essentially short-circuit the loop once `self.qsize()` reached `amount` (1000).
What do you think of this? It iteratively adds tasks until we get the desired amount of free ports:
diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index c5259634c..951bcc3fe 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -17,7 +17,7 @@
# limitations under the License.
import asyncio
-from typing import Set
+from typing import Generator, Set
import socket
import subprocess
import os
@@ -37,27 +37,45 @@ class AsyncPortQueue(OrderedSetQueue):
super().__init__()
self._listening_ports = _get_listening_ports()
- async def select_available(self, amount, dmin, dmax):
- rmin = dmin - 2000
- rmax = 65536 - dmax
+ async def select_available(self, amount: int, dmin: int, dmax: int) -> None:
+ '''Populate the port queue with ports that are not in use by the OS.
- port_tasks = []
- await asyncio.gather(*port_tasks)
- if rmax > amount:
+ This method fills in the queue with at least `amount` ports that are not
+ within `dmin` and `dmax`.
+
+ :param amount: The number of ports to populate the queue with.
+ :param dmin: The minimum port number we expect that the OS uses.
+ :param dmax: The maximum port number we expect that the OS uses.
+ '''
+ def task_generator(amount: int, dmin: int, dmax: int) -> Generator:
+ task_counter = 0
# Fill in ports, starting above the upper OS-usable port range.
port = dmax + 1
- while port < 65536 and self.qsize() < amount:
- port_tasks.append(self._check_port(port))
+ while port < 65536 and task_counter < amount:
+ yield self._check_port(port)
port += 1
- if rmin > amount and self.qsize() < amount:
- port = 2001
+ task_counter += 1
# Fill in more ports, starting at 2001, well above well known ports,
# and going up until the minimum port range used by the OS.
- while port < dmin and self.qsize() < amount:
- port_tasks.append(self._check_port(port))
+ port = 2001
+ while port < dmin and task_counter < amount:
+ yield self._check_port(port)
port += 1
-
- await asyncio.gather(*port_tasks)
+ task_counter += 1
+
+ tasks = task_generator(amount, dmin, dmax)
+ while self.qsize() < amount:
+ port_tasks = []
+ ports_still_needed = amount - self.qsize()
+ while len(port_tasks) < ports_still_needed:
+ try:
+ port_tasks.append(next(tasks))
+ except StopIteration:
+ # No more tasks to generate.
+ host.WriteWarning('Ran out of ports to check, stopping port queue setup.')
+ return
+ host.WriteDebug('_setup_port_queue', f"Gathering {len(port_tasks)} port tasks out of {ports_still_needed} desired.")
+ await asyncio.gather(*port_tasks)
async def _check_port(self, port):
if await self._is_port_open(port):
With this change:
╰─➤ time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed
Generating Report: --------------
Total of 1 test
Unknown: 0
Exception: 0
Failed: 0
Warning: 0
Skipped: 0
Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency 0.58s user 0.30s system 84% cpu 1.037 total
0.58 seconds is still more than 0.41 seconds (the time before any of the patches on this branch), but the difference is so small that no one will notice it. Can you check that it still makes your system's check efficient?
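Stripped of the AuTest specifics, the batching pattern in the diff amounts to the sketch below (hypothetical names, not this patch's API): pull a bounded batch of pending checks from a generator, gather them, and repeat until the quota is met or the candidates run out.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


async def gather_in_batches(quota_met: Callable[[], bool],
                            tasks: Iterator[Awaitable],
                            batch_size: int = 100) -> None:
    """Run awaitables from `tasks` in batches until quota_met() or exhaustion."""
    while not quota_met():
        batch = []
        for _ in range(batch_size):
            try:
                batch.append(next(tasks))
            except StopIteration:
                break
        if not batch:
            return  # candidates exhausted before the quota was met
        await asyncio.gather(*batch)


# Toy usage: collect "free" ports until we have at least 5.
results = []


async def toy_check(port: int) -> None:
    await asyncio.sleep(0)  # stand-in for a real connect attempt
    if port % 2 == 0:       # pretend even-numbered ports are free
        results.append(port)


candidates = (toy_check(p) for p in range(2001, 2100))
asyncio.run(gather_in_batches(lambda: len(results) >= 5, candidates, batch_size=4))
print(results)
```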
I have timed this patch. (The test failed because the plugin it needed wasn't compiled.)
Total of 1 test
Unknown: 0
Exception: 1
Failed: 0
Warning: 0
Skipped: 0
Passed: 0
real 2m20.733s
user 0m3.494s
sys 0m1.416s
It's fast enough that working on an AuTest is possible if you have a lot of patience.
@bneradt am I reading the code wrong, or will the generator not yield more than 1000 tasks, which may be too few to find 1000 open ports? Does the `task_counter` in the generator need to be removed?
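To make the question concrete, here is a toy reduction (hypothetical names): a generator capped at `amount` yields exactly `amount` candidates, so if any candidate turns out to be busy, the generator is exhausted before the quota is met and setup stops with fewer than `amount` ports queued.

```python
def capped_candidates(amount: int, start: int = 2001):
    """Yield at most `amount` candidate ports, mirroring task_counter's cap."""
    yield from range(start, start + amount)


busy = {2001, 2002}  # suppose two of the candidates are already in use
candidates = list(capped_candidates(3))
free = [p for p in candidates if p not in busy]
print(candidates)  # [2001, 2002, 2003]
print(free)        # [2003] -- only one free port even though three were wanted
```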
Chiming in late on this. The goal is to secure all the necessary ports, i.e., enough for a given test and enough buffer for ports to reset correctly with the system for various reasons. This is the main goal we have for getting the port set as we do. The speed concern is that we are getting each port serially. By "getting a port" I mean testing that the port is "safe". Is that correct?
@dragon512 Thank you for asking for clarification. Your understanding of my concern is (mostly? see note) correct, but to avoid tunnel vision it might be better to state the concern as "connecting to a Windows host port from WSL through the Python socket API to see if it is open has too high a runtime cost for our serial port searching algorithm." In this PR, Brian and I have worked on trying to change the algorithm. However, other approaches might be better; for example, not using WSL to run AuTests. I could also try to understand why the runtime cost is so large and possibly try to fix the issue in WSL itself. I didn't want to take that on initially, because it would probably be a great deal of work and require some WSL knowledge and expertise. Edit: One note. I am not sure whether you mean an async for loop. There is an …
On WSL, this change reduces the runtime for the AuTest setup from prohibitive to annoying. Most developers are not using WSL and could possibly see a performance degradation with this change.
Opening a connection to a host port is a very slow operation on WSL: 0.5s * 1000 ports = 500s spent opening ports, not accounting for other overhead. In my PoC script, it took 20 seconds after this change, which is more than I hoped. I think this is due to not being able to set a 0.5 second timeout with the async socket interface like we do with the synchronous one. The actual change in this patch takes more than 20 seconds but is still sub-minute on my system. This needs testing to see how it performs in typical dev environments before it can be considered for merging.
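On the timeout point: asyncio can bound a connection attempt, just not through `socket.settimeout()`; wrapping `open_connection` in `asyncio.wait_for` (or an `asyncio.timeout()` block on Python 3.11+) gives a similar effect. A minimal sketch, assuming a 0.5s budget per probe, not the code in this patch:

```python
import asyncio


async def is_port_open(port: int, host: str = "127.0.0.1",
                       timeout: float = 0.5) -> bool:
    """Return True if something accepts a connection within the timeout."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout)
    except (OSError, asyncio.TimeoutError):
        return False  # refused, unreachable, or too slow: treat as closed
    writer.close()
    await writer.wait_closed()
    return True
```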