Windows builds constantly failing with IOException: connection was forcibly closed by the remote host #4467

@avdv

We are currently using Bazel 6.2.0, but the issue existed before that as well (I tried the suggestions in #992, but no dice).

The issue only affects the Windows builds; Linux and Darwin are fine.

ERROR: The Build Event Protocol upload failed: Not retrying publishBuildEvents, no more attempts left: status='Status{code=UNAVAILABLE, description=io exception, cause=java.io.IOException: An existing connection was forcibly closed by the remote host
	at java.base/sun.nio.ch.SocketDispatcher.read0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
}' UNAVAILABLE: An existing connection was forcibly closed by the remote host UNAVAILABLE: An existing connection was forcibly closed by the remote host
INFO: Build completed successfully, 511 total actions
Error: Process completed with exit code 38.
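
Note that the log reports "Build completed successfully" and the job only fails because of the exit code. As far as I can tell, 38 is the code Bazel uses when the Build Event Protocol upload fails, but that mapping is my assumption and should be checked against the exit-code table for the Bazel version in use. A minimal sketch of how a CI wrapper could tell this case apart from a real build failure (the names `classify_exit` and `run_bazel` are hypothetical, not part of Bazel or our workflow):

```python
import subprocess

# Assumption: 38 is Bazel's exit code for a failed Build Event Service
# upload; verify this against the documentation for your Bazel version.
BES_UPLOAD_FAILED = 38


def classify_exit(code: int) -> str:
    """Map a Bazel exit code to a coarse outcome for CI reporting."""
    if code == 0:
        return "success"
    if code == BES_UPLOAD_FAILED:
        return "bes-upload-failed"
    return "failed"


def run_bazel(args: list[str]) -> str:
    """Run `bazel` with the given arguments and classify its exit code."""
    result = subprocess.run(["bazel", *args])
    return classify_exit(result.returncode)
```

For example, `run_bazel(["build", "//..."])` would return `"bes-upload-failed"` for the runs above, so the workflow could flag them differently from genuine failures. This only changes how the failure is reported; it does not address why the connection is closed.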

CI runs: 1 (bb), 2 (bb), 3 (bb)

Note that we have 4 jobs in the workflow running on Windows, and one of them succeeded: https://github.com/tweag/rules_haskell/actions/runs/5738410548/job/15564769980?pr=1925

The successful job took a bit over one hour, the failing jobs took >5 hours. Could this be just some sort of timeout?

We are using:

--bes_upload_mode=wait_for_upload_complete
--bes_timeout=60s

Should we try something else? Or tune different parameters? Thank you for any suggestions!
