Skip to content

Conversation

JosiahWI
Copy link
Contributor

@JosiahWI JosiahWI commented Jun 14, 2025

On WSL, this change reduces the runtime for the AuTest setup from prohibitive to annoying. Most developers are not using WSL and could possibly see a performance degradation with this change.

Opening a connection to a host port is a very slow operation on WSL: 0.5s * 1000 ports = 500s opening ports, not accounting for other overhead. In my PoC script, it took 20 seconds after this change - more than I hoped. I think this is due to not being able to set a 0.5 second timeout with the async socket interface like we do with the synchronous one. The actual change in this patch takes more than 20 seconds but is still sub-minute on my system.

This needs testing to see how it performs in typical dev environments before it can be considered for merging.

On WSL, this change reduces the runtime for the AuTest setup from
prohibitive (hours) to annoying (< 1 minute). Most developers are
not using WSL, and could possibly see a performance degredation
with this change.

This needs testing to see how it performs in typical dev environments
before it can be considered for merging.
@JosiahWI JosiahWI added this to the 10.2.0 milestone Jun 14, 2025
@JosiahWI JosiahWI self-assigned this Jun 14, 2025
@bryancall bryancall requested a review from bneradt June 16, 2025 22:27
host.WriteDebug(
'PortOpen', f"Connection to port {port} succeeded, the port is open, "
"and a future connection cannot use it")
except ConnectionRefusedError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this locally, in a Rancher docker vm on my Apple silicon macbook pro, I get the following exception:

╰─➤  ./autest.sh --sandbox /tmp/sb --clean=none -f emergency                                                                                                                                                                              20:07:30 [4/186]
Running Test emergency:E                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                          
Generating Report: --------------                                                                                                                                                                                                                         
 Test: emergency: Exception                                                                                                                                                                                                                               
    File: emergency.test.py                                   
    Directory: /home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown                                          
     Reason: Traceback (most recent call last):                                                                              
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 246, in _setup_port_queue                                                                                                                            
         loop = asyncio.get_running_loop()                    
     RuntimeError: no running event loop                      
                                                              
     During handling of the above exception, another exception occurred:                                                     
                                                              
     Traceback (most recent call last):                       
       File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/runtesttask.py", line 34, in __call__                                                                                                         
         tl = self.__logic.Run(self.__test)                   
       File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/runlogic.py", line 18, in Run                                                                                                             
         if not tmp.Start(obj):                               
                ~~~~~~~~~^^^^^                                
       File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/runlogic/test.py", line 90, in Start                                                                                                               
         loadTest(self.__test)                                
         ~~~~~~~~^^^^^^^^^^^^^                                
       File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/core/test.py", line 292, in loadTest                                                                                                               
         execFile(fileName, locals, locals)                   
         ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^                   
       File "/home/bneradt/.local/share/virtualenvs/tests-68jNU4xi/lib/python3.13/site-packages/autest/common/execfile.py", line 16, in execFile                                                                                                          
         exec(safeCompile(f.read(), fname), globals, locals)                                                                 
         ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                 
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/shutdown/emergency.test.py", line 25, in <module>                                                                                                                                
         ts = Test.MakeATSProcess('ts')                       
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/trafficserver.test.ext", line 348, in MakeATSProcess                                                                                                                 
         get_port(p, "port")                                  
         ~~~~~~~~^^^^^^^^^^^                                  
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 282, in get_port                                                                                                                                     
         _setup_port_queue()                                  
         ~~~~~~~~~~~~~~~~~^^                                  
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 249, in _setup_port_queue                                                                                                                            
         asyncio.run(async_setup())                           
         ~~~~~~~~~~~^^^^^^^^^^^^^^^                           
       File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run                                                     
         return runner.run(main)                              
                ~~~~~~~~~~^^^^^^                              
       File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run                                                     
         return self._loop.run_until_complete(task)                                                                          
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^                                                                          
       File "/usr/lib64/python3.13/asyncio/base_events.py", line 719, in run_until_complete                                                                                                                                                               
         return future.result()                               
                ~~~~~~~~~~~~~^^                               
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 243, in async_setup                                                                                                                                  
         await g_ports.select_available(amount, dmin, dmax)                                                                  
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 60, in select_available                                                                                                                              
         await asyncio.gather(*port_tasks)                    
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 63, in _check_port                                                                                                                                   
         if await self._is_port_open(port):                   
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
       File "/home/bneradt/src/ts_asf_master_pr_reviews/tests/gold_tests/autest-site/ports.py", line 81, in _is_port_open                                                                                                                                 
         reader, writer = await asyncio.open_connection(address, port, limit=1)                                              
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                              
       File "/usr/lib64/python3.13/asyncio/streams.py", line 48, in open_connection                                          
         transport, _ = await loop.create_connection(                                                                        
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                        
             lambda: protocol, host, port, **kwds)                                                                           
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                           
       File "/usr/lib64/python3.13/asyncio/base_events.py", line 1168, in create_connection                                                                                                                                                               
         raise OSError('Multiple exceptions: {}'.format(                                                                     
             ', '.join(str(exc) for exc in exceptions)))                                                                     
     OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 61001, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 61001)                                                                                                             

Total of 1 test                                               
  Unknown: 0                                                  
  Exception: 1                                                
  Failed: 0                                                   
  Warning: 0                                                  
  Skipped: 0       
  Passed: 0                                           

It seems like we need to add OSError to the list of handles exceptions:

diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index 0b8ec00e5..c5259634c 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -85,7 +85,7 @@ class AsyncPortQueue(OrderedSetQueue):
             host.WriteDebug(
                 'PortOpen', f"Connection to port {port} succeeded, the port is open, "
                 "and a future connection cannot use it")
-        except ConnectionRefusedError:
+        except (ConnectionRefusedError, OSError):
             host.WriteDebug(
                 'PortOpen', f"socket error for port {port}, port is closed, "
                 "and therefore a future connection can use it")

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running locally:

╰─➤  time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency                                                 
Running Test emergency:. Passed                      
                                                              
Generating Report: --------------                 
Total of 1 test                                                                                                              
  Unknown: 0                                            
  Exception: 0                                          
  Failed: 0                                                                                                                                                                                                                                               
  Warning: 0
  Skipped: 0   
  Passed: 1 
./autest.sh --sandbox /tmp/sb --clean=none -f emergency  0.41s user 0.26s system 82% cpu 0.819 total

Checking out your patch (and applying the patch I mentioned to handle OSError):

╰─➤  time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed

Generating Report: --------------
Total of 1 test
  Unknown: 0
  Exception: 0
  Failed: 0
  Warning: 0
  Skipped: 0
  Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency  8.06s user 6.60s system 144% cpu 10.154 total

So user time goes from about .5 seconds to 8 seconds with this change for some reason. I can try to investigate why...

rmax = 65536 - dmax

port_tasks = []
await asyncio.gather(*port_tasks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this await here on an empty port_tasks intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope.

Comment on lines +46 to +60
if rmax > amount:
# Fill in ports, starting above the upper OS-usable port range.
port = dmax + 1
while port < 65536 and self.qsize() < amount:
port_tasks.append(self._check_port(port))
port += 1
if rmin > amount and self.qsize() < amount:
port = 2001
# Fill in more ports, starting at 2001, well above well known ports,
# and going up until the minimum port range used by the OS.
while port < dmin and self.qsize() < amount:
port_tasks.append(self._check_port(port))
port += 1

await asyncio.gather(*port_tasks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we're doing more port checks than originally here. On my system, this adds about 35k or so port_tasks values here, each of which are done even though we only need 1k. Originally, we we essentially short circuit the loop once self.qsize() reaches amount (1000).

Copy link
Contributor

@bneradt bneradt Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this? It iteratively adds tasks until we get the desired amount of free ports:

diff --git a/tests/gold_tests/autest-site/ports.py b/tests/gold_tests/autest-site/ports.py
index c5259634c..951bcc3fe 100644
--- a/tests/gold_tests/autest-site/ports.py
+++ b/tests/gold_tests/autest-site/ports.py
@@ -17,7 +17,7 @@
 #  limitations under the License.
 
 import asyncio
-from typing import Set
+from typing import Generator, Set
 import socket
 import subprocess
 import os
@@ -37,27 +37,45 @@ class AsyncPortQueue(OrderedSetQueue):
         super().__init__()
         self._listening_ports = _get_listening_ports()
 
-    async def select_available(self, amount, dmin, dmax):
-        rmin = dmin - 2000
-        rmax = 65536 - dmax
+    async def select_available(self, amount: int, dmin: int, dmax: int) -> None:
+        '''Populate the port queue with ports that are not in use by the OS.
 
-        port_tasks = []
-        await asyncio.gather(*port_tasks)
-        if rmax > amount:
+        This method fills in the queue with at least `amount` ports that are not
+        within `dmin` and `dmax`.
+
+        :param amount: The number of ports to populate the queue with.
+        :param dmin: The minimum port number we expect that the OS uses.
+        :param dmax: The maximum port number we expect that the OS uses.
+        '''
+        def task_generator(amount: int, dmin: int, dmax: int) -> Generator:
+            task_counter = 0
             # Fill in ports, starting above the upper OS-usable port range.
             port = dmax + 1
-            while port < 65536 and self.qsize() < amount:
-                port_tasks.append(self._check_port(port))
+            while port < 65536 and task_counter < amount:
+                yield self._check_port(port)
                 port += 1
-        if rmin > amount and self.qsize() < amount:
-            port = 2001
+                task_counter += 1
             # Fill in more ports, starting at 2001, well above well known ports,
             # and going up until the minimum port range used by the OS.
-            while port < dmin and self.qsize() < amount:
-                port_tasks.append(self._check_port(port))
+            port = 2001
+            while port < dmin and task_counter < amount:
+                yield self._check_port(port)
                 port += 1
-
-        await asyncio.gather(*port_tasks)
+                task_counter += 1
+
+        tasks = task_generator(amount, dmin, dmax)
+        while self.qsize() < amount:
+            port_tasks = []
+            ports_still_needed = amount - self.qsize()
+            while len(port_tasks) < ports_still_needed:
+                try:
+                    port_tasks.append(next(tasks))
+                except StopIteration:
+                    # No more tasks to generate.
+                    host.WriteWarning('Ran out of ports to check, stopping port queue setup.')
+                    return
+            host.WriteDebug('_setup_port_queue', f"Gathering {len(port_tasks)} port tasks out of {ports_still_needed} desired.")
+            await asyncio.gather(*port_tasks)
 
     async def _check_port(self, port):
         if await self._is_port_open(port):

With this change:

╰─➤  time ./autest.sh --sandbox /tmp/sb --clean=none -f emergency
Running Test emergency:. Passed

Generating Report: --------------
Total of 1 test
  Unknown: 0
  Exception: 0
  Failed: 0
  Warning: 0
  Skipped: 0
  Passed: 1
./autest.sh --sandbox /tmp/sb --clean=none -f emergency  0.58s user 0.30s system 84% cpu 1.037 total

0.58 seconds is still more than 0.41 seconds (the amount of time before any of the patches on this branch), but the value is so close and small no one will notice it. Can you check that it sill makes your system's check efficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have timed this patch. (The test failed because the plugin it needed wasn't compiled.)

Total of 1 test
  Unknown: 0
  Exception: 1
  Failed: 0
  Warning: 0
  Skipped: 0
  Passed: 0

real    2m20.733s
user    0m3.494s
sys     0m1.416s

It's fast enough that working on an AuTest is possible if you have a lot of patience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bneradt am I reading the code wrong, or will the generator not yield more than 1000 tasks, which may be too few to find 1000 open ports? Does the task_counter in the generator need to be removed?

@dragon512
Copy link
Contributor

Chiming in late on this.. The goal is to secure all the necessary ports. ie, enough for a given test and enough buffer for ports to reset correctly with the system for various reasons. This is the main goal we have for getting the port set as we do.

The speed concern is that we are getting each port serially. By "getting a port" I mean testing that the port is "safe".
The goal here I believe is that you did testing to see that having this do an async for loop to get the ports was faster.

Is that correct?

@JosiahWI
Copy link
Contributor Author

JosiahWI commented Jul 24, 2025

@dragon512 Thank you for asking for clarification. Your understanding of my concern is (mostly? see note) correct, but to avoid getting tunnel visioned it might be better to state the concern as "connecting to a Windows host port from WSL through the Python socket API to see if it is open has too high a runtime cost for our serial port searching algorithm."

In this PR, Brian and I have worked on trying to change the algorithm. However, other approaches might be better; for example, not using WSL to run AuTests. I could also try to understand why the runtime cost is so large and possibly try to fix the issue in WSL itself. I didn't want to take that on initially, because it would probably be a great deal of work and require some WSL knowledge and expertise.

Edit: One note. I am not sure whether you mean an async for loop. There is an async for construct in Python, as you probably know, which we haven't used. We are spawning an asynchronous task to check each port without blocking until a task completes. While waiting for the IO for one port, we want to do IO for another port. An async for loop does not run the elements of its iterable concurrently and thus doesn't accomplish this goal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants