Skip to content

Unstable performance of ldpreload+tcp tests #4106

@vrpolakatcisco

Description

@vrpolakatcisco

As seen in trending, the performance is not stable. Most runs are ok, but a minority of runs (frequency depends on testbed type) show decreased performance. After many verify runs with heavy usage of show session I think I can distinguish three different situations that can decrease performance, each with specific symptoms. I have examples but server and client side info had to be gathered in different runs, so counter values do not match. In each situation, the traffic is stalled for hundredths to tens of a second, way longer than Linux process pre-emption. I am not yet sure which situations are expected and which hint at a bug we can fix (in VPP, CSIT, infra or even iperf), I will open narrower tickets if I spot a specific bug to fix. Sorted roughly in decreasing impact (frequency and performance drop):

Situation 1: App not consuming

Receiver VPP log:

[1:1][T] 172.16.100.3:5201->172.16.100.2:51881              ESTABLISHED    
 index: 1 cfg: TSO off flags: Zero RWND sent timers: 
 snd_una 1 snd_nxt 1 rcv_nxt 699824598 rcv_las 699824598
 snd_wnd 3999744 rcv_wnd 0 rcv_wscale 10 snd_wl1 699823150 snd_wl2 1
 flight size 0 out space 4344 rcv_wnd_av 0 tsval_recent 24619
 tsecr 15204 tsecr_last_ack 15204 tsval_recent_age 7 snd_mss 1448
 rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 25390603
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
         cc space 4344 prev_cwnd 0 prev_ssthresh 0
         snd_cong 0 dupack 0 limited_tx 25390603
         rxt_bytes 0 rxt_delivered 0 rxt_head 25390603 rxt_ts 15211
         prr_start 25390603 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 25390603 is_reneging 0 reorder 3
         cur_rxt_hole 4294967295 high_rxt 25390603 rescue_rxt 25390603
 stats: in segs 3449448 dsegs 3449447 bytes 4994791893 dupacks 0
        out segs 32582 dsegs 0 bytes 0 dupacks 521
        fr 0 tr 0 rxt segs 0 bytes 0 duration 1.058
        err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 43440000 bucket 0 t/p 43.440 last_update .007 s burst 1460
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 3999376 nitems 4000000 has_event 1 min_alloc 65536
          head 695825221 tail 699824597 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 0 active elts newest 4294967295
 session node event: not found
 Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 0 active elts newest 0
 session: state: ready opaque: 0x0 flags: rx-ready

Sender VPP log:

[1:2][T] 172.16.100.2:2435->172.16.100.3:5201               ESTABLISHED    
 index: 2 cfg: TSO off flags: PSH pending timers: PERSIST
 snd_una 1411974846 snd_nxt 1411974846 rcv_nxt 1 rcv_las 1
 snd_wnd 0 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 1411974846
 flight size 0 out space 0 rcv_wnd_av 3999744 tsval_recent 15224
 tsecr 24642 tsecr_last_ack 24642 tsval_recent_age 2 snd_mss 1448
 rto 200 rto_boff 0 srtt .9 us .269 rttvar .1 rtt_ts 0.0000 rtt_seq 1411040886
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   none algo cubic cwnd 1508750 ssthresh 1381326 bytes_acked 20272
         cc space 0 prev_cwnd 1973323 prev_ssthresh 1971875
         snd_cong 38810782 dupack 0 limited_tx 38810782
         rxt_bytes 0 rxt_delivered 0 rxt_head 13880566 rxt_ts 24644
         prr_start 36840054 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 39065630 is_reneging 0 reorder 3
         cur_rxt_hole 4294967295 high_rxt 38628334 rescue_rxt 36840053
 stats: in segs 37179 dsegs 0 bytes 0 dupacks 266
        out segs 3941890 dsegs 3941888 bytes 5707800805 dupacks 0
        fr 3 tr 0 rxt segs 593 bytes 858664 duration 1.174
        err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 5609220589 bucket -51323 t/p 5609.221 last_update .002 s burst 62780
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 4000000 nitems 4000000 has_event 1 min_alloc 65536
          head 1411974845 tail 1415974845 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 session node event: not found
 session: state: ready opaque: 0x1 flags: rx-ready

Summary: Receiver Rx fifo is full (space less than MSS) but iperf3-server is not reading from it for some reason. Not sure what session node event: not found implies here. Zero window (logical when Rx fifo is full) visible on both VPPs explains why no traffic.

Situation 2: App not producing

Receiver VPP log:

[1:1][T] 172.16.90.3:5201->172.16.90.2:10527                ESTABLISHED    
 index: 1 cfg: TSO off flags:  timers: 
 snd_una 1 snd_nxt 1 rcv_nxt 2304840742 rcv_las 2304840742
 snd_wnd 3999744 rcv_wnd 3976192 rcv_wscale 10 snd_wl1 2304840118 snd_wl2 1
 flight size 0 out space 4344 rcv_wnd_av 3976192 tsval_recent 27169
 tsecr 17788 tsecr_last_ack 17788 tsval_recent_age 1 snd_mss 1448
 rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 1050116852
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
         cc space 4344 prev_cwnd 0 prev_ssthresh 0
         snd_cong 0 dupack 0 limited_tx 1050116852
         rxt_bytes 0 rxt_delivered 0 rxt_head 1050116852 rxt_ts 17789
         prr_start 1050116852 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 1050116852 is_reneging 0 reorder 3
         cur_rxt_hole 4294967295 high_rxt 1050116852 rescue_rxt 1050116852
 stats: in segs 10490160 dsegs 10490157 bytes 15189742629 dupacks 0
        out segs 96997 dsegs 0 bytes 0 dupacks 838
        fr 0 tr 0 rxt segs 0 bytes 0 duration 3.147
        err wnd data below 2 above 0 ack below 0 above 0
 pacer: rate 43440000 bucket 0 t/p 43.440 last_update .001 s burst 1460
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 2304840741 tail 2304840741 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 0 active elts newest 0
 session: state: ready opaque: 0x0 flags: rx-ready

Sender VPP log:

[1:2][T] 172.16.100.2:37171->172.16.100.3:5201              ESTABLISHED    
 index: 2 cfg: TSO off flags: PSH pending timers: 
 snd_una 3551969806 snd_nxt 3551969806 rcv_nxt 1 rcv_las 1
 snd_wnd 3949568 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 3551969806
 flight size 0 out space 1672653 rcv_wnd_av 3999744 tsval_recent 16181
 tsecr 25603 tsecr_last_ack 25603 tsval_recent_age 8 snd_mss 1448
 rto 200 rto_boff 0 srtt .9 us .300 rttvar .1 rtt_ts 0.0000 rtt_seq 3550742726
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   none algo cubic cwnd 1672653 ssthresh 1164405 bytes_acked 356832
         cc space 1672653 prev_cwnd 1663436 prev_ssthresh 1168220
         snd_cong 2818309574 dupack 0 limited_tx 2333312110
         rxt_bytes 0 rxt_delivered 0 rxt_head 16304518 rxt_ts 25611
         prr_start 2816768902 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 2818338534 is_reneging 0 reorder 3
         cur_rxt_hole 4294967295 high_rxt 2816773246 rescue_rxt 2816768901
 stats: in segs 51043 dsegs 0 bytes 0 dupacks 552
        out segs 5419820 dsegs 5419818 bytes 7847894229 dupacks 0
        fr 15 tr 0 rxt segs 661 bytes 957128 duration 1.595
        err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 5573319669 bucket 0 t/p 5573.320 last_update .008 s burst 62780
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 3551969805 tail 3551969805 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 session: state: ready opaque: 0x1 flags: rx-ready

Summary: Sender Tx fifo is empty, but in normal operation iperf-client is able to keep it full. No line with session node event: is present, not sure what that means. Receiver has no information on why there is no traffic. Possibly the same cause that makes server not read in situation 1 might be causing client not sending here.

Situation 3: Retransmit lost

Receiver VPP log:

[1:1][T] 172.16.90.3:5201->172.16.90.2:47483                ESTABLISHED    
 index: 1 cfg: TSO off flags:  timers: 
 snd_una 1 snd_nxt 1 rcv_nxt 16424702 rcv_las 16424702
 snd_wnd 3999744 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 20421182 snd_wl2 1
 flight size 0 out space 4344 rcv_wnd_av 3999744 tsval_recent 24080
 tsecr 14777 tsecr_last_ack 14777 tsval_recent_age 14 snd_mss 1424
 rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 4241218523
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
         cc space 4344 prev_cwnd 0 prev_ssthresh 0
         snd_cong 0 dupack 0 limited_tx 4241218523
         rxt_bytes 0 rxt_delivered 0 rxt_head 4241218523 rxt_ts 14789
         prr_start 4241218523 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 4241218523 is_reneging 0 reorder 3
         cur_rxt_hole 4294967295 high_rxt 4241218523 rescue_rxt 4241218523
 stats: in segs 13876 dsegs 13875 bytes 20089589 dupacks 0
        out segs 296 dsegs 0 bytes 0 dupacks 63
        fr 0 tr 0 rxt segs 0 bytes 0 duration .020
        err wnd data below 0 above 0 ack below 0 above 0
 sacks tx:  start 19181694 end 20422630
 start 18657518 end 19164318
 start 16465246 end 18382398
 pacer: rate 43440000 bucket 0 t/p 43.440 last_update .012 s burst 1460
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 16424701 tail 16424701 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 3 active elts newest 4294967295
           [16465245, 18382397], len 1917152, next 6, prev -1
          [18657517, 19164317], len 506800, next 8, prev 0
          [19181693, 20422629], len 1240936, next -1, prev 6
 Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 2 thread 0
          ooo pool 0 active elts newest 0
 session: state: ready opaque: 0x0 flags: rx-ready

Sender VPP log:

[1:2][T] 172.16.100.2:40811->172.16.100.3:5201              ESTABLISHED    
 index: 2 cfg: TSO off flags: Recovery, PSH pending timers: RETRANSMIT
 snd_una 1630408462 snd_nxt 1630409286 rcv_nxt 1 rcv_las 1
 snd_wnd 0 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 1630408462
 flight size 824 out space 0 rcv_wnd_av 3999744 tsval_recent 15760
 tsecr 25195 tsecr_last_ack 24995 tsval_recent_age 8 snd_mss 1448
 rto 400 rto_boff 1 srtt .9 us .217 rttvar .2 rtt_ts 0.0000 rtt_seq 1630407838
 next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
 cong:   recovery algo cubic cwnd 3720 ssthresh 936292 bytes_acked 0
         cc space 0 prev_cwnd 1337561 prev_ssthresh 1160905
         snd_cong 1630409286 dupack 0 limited_tx 1170394990
         rxt_bytes 824 rxt_delivered 0 rxt_head 21223374 rxt_ts 8
         prr_start 1168857214 prr_delivered 0 prr space 0
 sboard: sacked 0 last_sacked 0 lost 824 last_lost 0 rxt_sacked 0
         last_delivered 0 high_sacked 1630408462 is_reneging 0 reorder 3
         cur_rxt_hole 4 high_rxt 1630409286 rescue_rxt 1630409285
         head 4 tail 4 1 holes:
           [1630408462, 1630409286]
 stats: in segs 39271 dsegs 0 bytes 0 dupacks 769
        out segs 4092794 dsegs 4092792 bytes 5926356253 dupacks 0
        fr 12 tr 1 rxt segs 677 bytes 979672 duration 1.703
        err wnd data below 0 above 0 ack below 0 above 0
 pacer: rate 6155297993 bucket 0 t/p 6155.298 last_update .008 s burst 61552
 transport: flags: tx_paced, descheduled
 Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
          head 0 tail 0 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 4000000 nitems 4000000 has_event 1 min_alloc 65536
          head 1630408461 tail 1634408461 segment manager 1
          vpp session 2 thread 1 app session 1 thread 0
          ooo pool 0 active elts newest 0
 session node event: not found
 session: state: ready opaque: 0x1 flags: rx-ready

Summary: Receiver sees out-of-order data, sender has holes in sboard, rxt_bytes is larger than rxt_delivered, the difference matches the holes. That means retransmits were sent, but did not arrive. Sender is in recovery mode (probably explaining snd_wnd 0) waiting for retransmit timer.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions