bug: dynamic upstream, one Eureka node is unavailable, half of the requests are lost after reloading #12610

@liquanzhou

Description

Current Behavior

To implement dynamic upstreams, I configured two Eureka IP addresses.

I blocked access to one of the Eureka nodes. Continuous requests kept succeeding.

After APISIX is reloaded, half of the requests fail. This reproduces reliably.

discovery:
  eureka:
    host:
      - "http://10.250.200.99:8761"
      - "http://10.250.200.98:8761"
    prefix: "/eureka/"
    fetch_interval: 30 # 30s
    weight: 100 # default weight for node
    timeout:
      connect: 2000 # 2000ms
      send: 2000 # 2000ms
      read: 5000 # 5000ms

This appears to be affected by the following parameter. With a very short fetch interval, APISIX recovers fairly quickly after a reload:
fetch_interval: 30

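Even if requests are lost for only a few seconds after a reload, that is unacceptable for a gateway like nginx, which is the most critical traffic entry point. I hope this can be improved as follows: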

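On reload or restart, when one Eureka node is reachable and the other is not, the full service list for the dynamic upstream can still be fetched from the reachable node, because both Eureka nodes hold the same data. A single unreachable Eureka node should not cause half of the requests to fail.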
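To make the requested behavior concrete, here is a minimal Python sketch of fetching the registry with failover across the configured hosts. It is only an illustration of the idea, not APISIX's actual implementation (which is written in Lua). The function names `http_get` and `fetch_registry` are hypothetical, and the `apps` path appended to the prefix is an assumption based on the usual Eureka REST layout.

```python
import urllib.request
from typing import Callable, Iterable, Tuple


def http_get(url: str, timeout: float = 2.0) -> bytes:
    """Default fetcher: plain HTTP GET with a timeout in seconds."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def fetch_registry(
    hosts: Iterable[str],
    prefix: str = "/eureka/",
    fetch: Callable[[str], bytes] = http_get,
) -> Tuple[str, bytes]:
    """Return (host, body) from the first Eureka node that answers.

    Each host is tried in order; an unreachable node is skipped instead of
    failing the whole fetch. RuntimeError is raised only if every node fails.
    """
    errors = []
    for host in hosts:
        # Assumed registry path: <host><prefix>apps
        url = host.rstrip("/") + "/" + prefix.strip("/") + "/apps"
        try:
            return host, fetch(url)
        except OSError as exc:  # URLError, timeouts, refused connections
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all Eureka nodes failed: " + "; ".join(errors))
```

With the two hosts from the configuration above, a call such as `fetch_registry(["http://10.250.200.99:8761", "http://10.250.200.98:8761"])` would fall through to the second node when the first one drops the connection, so one dead node delays the fetch by at most one timeout instead of leaving the service list empty.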

Expected Behavior

No response

Error Logs

No response

Steps to Reproduce

1. Configure APISIX to use Eureka as the service registry, with two nodes:
discovery:
  eureka:
    host:
      - "http://10.250.200.99:8761"
      - "http://10.250.200.98:8761"
    prefix: "/eureka/"
    fetch_interval: 3 # 3s
    weight: 100 # default weight for node
    timeout:
      connect: 2000 # 2000ms
      send: 2000 # 2000ms
      read: 2000 # 2000ms

2. On one of the Eureka node hosts, drop all requests coming from the APISIX IP:
iptables -A INPUT -s 10.250.200.202 -j DROP

3. Send requests continuously with curl. They succeed.

4. Run systemctl reload apisix

5. Send requests continuously with curl again. Half of them fail.
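The continuous curl checks in steps 3 and 5 can also be scripted so that the failure ratio is measured rather than eyeballed. The following is a rough Python sketch under stated assumptions: `probe` and `failure_rate` are hypothetical helper names, and the route URL passed in must be replaced with a real route on your APISIX instance.

```python
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the request completes with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # HTTPError (e.g. 503), URLError, and timeouts
        return False


def failure_rate(
    url: str,
    attempts: int = 100,
    check: Callable[[str], bool] = probe,
) -> float:
    """Send `attempts` requests to `url` and return the fraction that failed."""
    failures = sum(1 for _ in range(attempts) if not check(url))
    return failures / attempts
```

Calling `failure_rate("http://127.0.0.1:9080/your-route")` before and after `systemctl reload apisix` (the address and path here are placeholders) gives a number to compare. Based on the behavior described in this report, the value should be close to 0 in step 3 and around 0.5 in step 5.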

Environment

  • APISIX version (run apisix version):
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):

Labels

bug: Something isn't working

Status

🏗 In progress
