-
Notifications
You must be signed in to change notification settings - Fork 17
Description
As seen in trending, the performance is not stable. Most runs are ok, but a minority of runs (frequency depends on testbed type) show decreased performance. After many verify runs with heavy usage of show session I think I can distinguish three different situations that can decrease performance, each with specific symptoms. I have examples but server and client side info had to be gathered in different runs, so counter values do not match. In each situation, the traffic is stalled for hundredths to tens of a second, way longer than Linux process pre-emption. I am not yet sure which situations are expected and which hint at a bug we can fix (in VPP, CSIT, infra or even iperf), I will open narrower tickets if I spot a specific bug to fix. Sorted roughly in decreasing impact (frequency and performance drop):
Situation 1: App not consuming
Receiver VPP log:
[1:1][T] 172.16.100.3:5201->172.16.100.2:51881 ESTABLISHED
index: 1 cfg: TSO off flags: Zero RWND sent timers:
snd_una 1 snd_nxt 1 rcv_nxt 699824598 rcv_las 699824598
snd_wnd 3999744 rcv_wnd 0 rcv_wscale 10 snd_wl1 699823150 snd_wl2 1
flight size 0 out space 4344 rcv_wnd_av 0 tsval_recent 24619
tsecr 15204 tsecr_last_ack 15204 tsval_recent_age 7 snd_mss 1448
rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 25390603
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
cc space 4344 prev_cwnd 0 prev_ssthresh 0
snd_cong 0 dupack 0 limited_tx 25390603
rxt_bytes 0 rxt_delivered 0 rxt_head 25390603 rxt_ts 15211
prr_start 25390603 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 25390603 is_reneging 0 reorder 3
cur_rxt_hole 4294967295 high_rxt 25390603 rescue_rxt 25390603
stats: in segs 3449448 dsegs 3449447 bytes 4994791893 dupacks 0
out segs 32582 dsegs 0 bytes 0 dupacks 521
fr 0 tr 0 rxt segs 0 bytes 0 duration 1.058
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 43440000 bucket 0 t/p 43.440 last_update .007 s burst 1460
transport: flags: tx_paced, descheduled
Rx fifo: cursize 3999376 nitems 4000000 has_event 1 min_alloc 65536
head 695825221 tail 699824597 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 0 active elts newest 4294967295
session node event: not found
Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 0 active elts newest 0
session: state: ready opaque: 0x0 flags: rx-ready
Sender VPP log:
[1:2][T] 172.16.100.2:2435->172.16.100.3:5201 ESTABLISHED
index: 2 cfg: TSO off flags: PSH pending timers: PERSIST
snd_una 1411974846 snd_nxt 1411974846 rcv_nxt 1 rcv_las 1
snd_wnd 0 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 1411974846
flight size 0 out space 0 rcv_wnd_av 3999744 tsval_recent 15224
tsecr 24642 tsecr_last_ack 24642 tsval_recent_age 2 snd_mss 1448
rto 200 rto_boff 0 srtt .9 us .269 rttvar .1 rtt_ts 0.0000 rtt_seq 1411040886
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: none algo cubic cwnd 1508750 ssthresh 1381326 bytes_acked 20272
cc space 0 prev_cwnd 1973323 prev_ssthresh 1971875
snd_cong 38810782 dupack 0 limited_tx 38810782
rxt_bytes 0 rxt_delivered 0 rxt_head 13880566 rxt_ts 24644
prr_start 36840054 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 39065630 is_reneging 0 reorder 3
cur_rxt_hole 4294967295 high_rxt 38628334 rescue_rxt 36840053
stats: in segs 37179 dsegs 0 bytes 0 dupacks 266
out segs 3941890 dsegs 3941888 bytes 5707800805 dupacks 0
fr 3 tr 0 rxt segs 593 bytes 858664 duration 1.174
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 5609220589 bucket -51323 t/p 5609.221 last_update .002 s burst 62780
transport: flags: tx_paced, descheduled
Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 4000000 nitems 4000000 has_event 1 min_alloc 65536
head 1411974845 tail 1415974845 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
session node event: not found
session: state: ready opaque: 0x1 flags: rx-ready
Summary: Receiver Rx fifo is full (space less than MSS) but iperf3-server is not reading from it for some reason. Not sure what session node event: not found implies here. Zero window (logical when Rx fifo is full) visible on both VPPs explains why no traffic.
Situation 2: App not producing
Receiver VPP log:
[1:1][T] 172.16.90.3:5201->172.16.90.2:10527 ESTABLISHED
index: 1 cfg: TSO off flags: timers:
snd_una 1 snd_nxt 1 rcv_nxt 2304840742 rcv_las 2304840742
snd_wnd 3999744 rcv_wnd 3976192 rcv_wscale 10 snd_wl1 2304840118 snd_wl2 1
flight size 0 out space 4344 rcv_wnd_av 3976192 tsval_recent 27169
tsecr 17788 tsecr_last_ack 17788 tsval_recent_age 1 snd_mss 1448
rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 1050116852
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
cc space 4344 prev_cwnd 0 prev_ssthresh 0
snd_cong 0 dupack 0 limited_tx 1050116852
rxt_bytes 0 rxt_delivered 0 rxt_head 1050116852 rxt_ts 17789
prr_start 1050116852 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 1050116852 is_reneging 0 reorder 3
cur_rxt_hole 4294967295 high_rxt 1050116852 rescue_rxt 1050116852
stats: in segs 10490160 dsegs 10490157 bytes 15189742629 dupacks 0
out segs 96997 dsegs 0 bytes 0 dupacks 838
fr 0 tr 0 rxt segs 0 bytes 0 duration 3.147
err wnd data below 2 above 0 ack below 0 above 0
pacer: rate 43440000 bucket 0 t/p 43.440 last_update .001 s burst 1460
transport: flags: tx_paced, descheduled
Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 2304840741 tail 2304840741 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 0 active elts newest 4294967295
Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 0 active elts newest 0
session: state: ready opaque: 0x0 flags: rx-ready
Sender VPP log:
[1:2][T] 172.16.100.2:37171->172.16.100.3:5201 ESTABLISHED
index: 2 cfg: TSO off flags: PSH pending timers:
snd_una 3551969806 snd_nxt 3551969806 rcv_nxt 1 rcv_las 1
snd_wnd 3949568 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 3551969806
flight size 0 out space 1672653 rcv_wnd_av 3999744 tsval_recent 16181
tsecr 25603 tsecr_last_ack 25603 tsval_recent_age 8 snd_mss 1448
rto 200 rto_boff 0 srtt .9 us .300 rttvar .1 rtt_ts 0.0000 rtt_seq 3550742726
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: none algo cubic cwnd 1672653 ssthresh 1164405 bytes_acked 356832
cc space 1672653 prev_cwnd 1663436 prev_ssthresh 1168220
snd_cong 2818309574 dupack 0 limited_tx 2333312110
rxt_bytes 0 rxt_delivered 0 rxt_head 16304518 rxt_ts 25611
prr_start 2816768902 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 2818338534 is_reneging 0 reorder 3
cur_rxt_hole 4294967295 high_rxt 2816773246 rescue_rxt 2816768901
stats: in segs 51043 dsegs 0 bytes 0 dupacks 552
out segs 5419820 dsegs 5419818 bytes 7847894229 dupacks 0
fr 15 tr 0 rxt segs 661 bytes 957128 duration 1.595
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 5573319669 bucket 0 t/p 5573.320 last_update .008 s burst 62780
transport: flags: tx_paced, descheduled
Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 3551969805 tail 3551969805 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
session: state: ready opaque: 0x1 flags: rx-ready
Summary: Sender Tx fifo is empty, but in normal operation iperf-client is able to keep it full. No line with session node event: is present, not sure what that means. Receiver has no information on why there is no traffic. Possibly the same cause that makes server not read in situation 1 might be causing client not sending here.
Situation 3: Retransmit lost
Receiver VPP log:
[1:1][T] 172.16.90.3:5201->172.16.90.2:47483 ESTABLISHED
index: 1 cfg: TSO off flags: timers:
snd_una 1 snd_nxt 1 rcv_nxt 16424702 rcv_las 16424702
snd_wnd 3999744 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 20421182 snd_wl2 1
flight size 0 out space 4344 rcv_wnd_av 3999744 tsval_recent 24080
tsecr 14777 tsecr_last_ack 14777 tsval_recent_age 14 snd_mss 1424
rto 200 rto_boff 0 srtt .1 us .100 rttvar .1 rtt_ts 0.0000 rtt_seq 4241218523
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: none algo cubic cwnd 4344 ssthresh 2147483647 bytes_acked 0
cc space 4344 prev_cwnd 0 prev_ssthresh 0
snd_cong 0 dupack 0 limited_tx 4241218523
rxt_bytes 0 rxt_delivered 0 rxt_head 4241218523 rxt_ts 14789
prr_start 4241218523 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 0 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 4241218523 is_reneging 0 reorder 3
cur_rxt_hole 4294967295 high_rxt 4241218523 rescue_rxt 4241218523
stats: in segs 13876 dsegs 13875 bytes 20089589 dupacks 0
out segs 296 dsegs 0 bytes 0 dupacks 63
fr 0 tr 0 rxt segs 0 bytes 0 duration .020
err wnd data below 0 above 0 ack below 0 above 0
sacks tx: start 19181694 end 20422630
start 18657518 end 19164318
start 16465246 end 18382398
pacer: rate 43440000 bucket 0 t/p 43.440 last_update .012 s burst 1460
transport: flags: tx_paced, descheduled
Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 16424701 tail 16424701 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 3 active elts newest 4294967295
[16465245, 18382397], len 1917152, next 6, prev -1
[18657517, 19164317], len 506800, next 8, prev 0
[19181693, 20422629], len 1240936, next -1, prev 6
Tx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 2
vpp session 1 thread 1 app session 2 thread 0
ooo pool 0 active elts newest 0
session: state: ready opaque: 0x0 flags: rx-ready
Sender VPP log:
[1:2][T] 172.16.100.2:40811->172.16.100.3:5201 ESTABLISHED
index: 2 cfg: TSO off flags: Recovery, PSH pending timers: RETRANSMIT
snd_una 1630408462 snd_nxt 1630409286 rcv_nxt 1 rcv_las 1
snd_wnd 0 rcv_wnd 3999744 rcv_wscale 10 snd_wl1 1 snd_wl2 1630408462
flight size 824 out space 0 rcv_wnd_av 3999744 tsval_recent 15760
tsecr 25195 tsecr_last_ack 24995 tsval_recent_age 8 snd_mss 1448
rto 400 rto_boff 1 srtt .9 us .217 rttvar .2 rtt_ts 0.0000 rtt_seq 1630407838
next_node 0 opaque 0x0 fib_index 0 sw_if_index 1
cong: recovery algo cubic cwnd 3720 ssthresh 936292 bytes_acked 0
cc space 0 prev_cwnd 1337561 prev_ssthresh 1160905
snd_cong 1630409286 dupack 0 limited_tx 1170394990
rxt_bytes 824 rxt_delivered 0 rxt_head 21223374 rxt_ts 8
prr_start 1168857214 prr_delivered 0 prr space 0
sboard: sacked 0 last_sacked 0 lost 824 last_lost 0 rxt_sacked 0
last_delivered 0 high_sacked 1630408462 is_reneging 0 reorder 3
cur_rxt_hole 4 high_rxt 1630409286 rescue_rxt 1630409285
head 4 tail 4 1 holes:
[1630408462, 1630409286]
stats: in segs 39271 dsegs 0 bytes 0 dupacks 769
out segs 4092794 dsegs 4092792 bytes 5926356253 dupacks 0
fr 12 tr 1 rxt segs 677 bytes 979672 duration 1.703
err wnd data below 0 above 0 ack below 0 above 0
pacer: rate 6155297993 bucket 0 t/p 6155.298 last_update .008 s burst 61552
transport: flags: tx_paced, descheduled
Rx fifo: cursize 0 nitems 4000000 has_event 0 min_alloc 65536
head 0 tail 0 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
Tx fifo: cursize 4000000 nitems 4000000 has_event 1 min_alloc 65536
head 1630408461 tail 1634408461 segment manager 1
vpp session 2 thread 1 app session 1 thread 0
ooo pool 0 active elts newest 0
session node event: not found
session: state: ready opaque: 0x1 flags: rx-ready
Summary: Receiver sees out-of-order data, sender has holes in sboard, rxt_bytes is larger than rxt_delivered, the difference matches the holes. That means retransmits were sent, but did not arrive. Sender is in recovery mode (probably explaining snd_wnd 0) waiting for retransmit timer.