Skip to content

add websocket keepalive#1708

Open
terassyi wants to merge 9 commits intomainfrom
add-websocket-keepalive
Open

add websocket keepalive#1708
terassyi wants to merge 9 commits intomainfrom
add-websocket-keepalive

Conversation

@terassyi
Copy link
Copy Markdown
Contributor

  • Add websocket server/client
  • Add metrics
  • Add README
  • Add maintenance procedures and image building parts

@terassyi terassyi requested a review from tkna August 18, 2025 05:23
@terassyi terassyi marked this pull request as ready for review August 18, 2025 06:01
@terassyi terassyi force-pushed the add-websocket-keepalive branch from 750897e to 2c3bfb2 Compare August 18, 2025 06:03
Comment thread websocket-keepalive/internal/server/server.go Outdated
Comment thread websocket-keepalive/README.md
Comment thread websocket-keepalive/internal/server/server.go Outdated
Comment thread websocket-keepalive/internal/server/server.go Outdated
Comment thread websocket-keepalive/internal/server/server.go Outdated
Comment thread websocket-keepalive/internal/client/client.go Outdated
}
if metricsConfig.Export {
slog.Info("Start metrics server", "listen", m.AddrPort)
go func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this goroutine never ends.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to infinitely try to start metrics server, I found.
So, when it fails to start it , I changed to make this panic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as server-side.

Comment thread websocket-keepalive/internal/client/client.go Outdated
Comment thread websocket-keepalive/internal/client/metrics.go Outdated
Comment thread websocket-keepalive/internal/common/utils.go
Comment thread websocket-keepalive/internal/client/client.go Outdated
@terassyi terassyi force-pushed the add-websocket-keepalive branch from 2c3bfb2 to 1a9a48f Compare September 1, 2025 10:19
@terassyi terassyi requested a review from tkna September 1, 2025 10:20
@terassyi
Copy link
Copy Markdown
Contributor Author

terassyi commented Sep 1, 2025

@tkna
Thank you for your review!
I'm sorry, but I rewrite almost part.
To do so, I reflected your review in new codes.
Please review again.🙇‍♂️

Comment on lines +98 to +103
go func() {
if err := m.Metrics.Serve(); err != nil {
slog.Error("metrics server failed", "error", err)
// When metrics server doesn't run correctly, program shouldn't do any more. Just panic here.
panic(err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we have 2 web servers (metrics server / websocket server) and we have to shutdown both (in the case of both normal and abnormal closure).
Using errgroup is a good idea for this.
ref: https://github.com/cybozu-private/sphinx/blob/master/cmd/sphinx/main.go
but the implementation here looks like overkill for our case, so it's up to you to use this pattern.

go func() {
slog.Info("WebSocket server starting", "addr", server.Addr)
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
slog.Error("Server failed to start", "error", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListenAndServe() exits not only at startup, so this message can be a little confusing.

type conn struct {
*websocket.Conn
remoteAddr string
serverCloseCh chan struct{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] maybe Ch and serverClose are redundant according to Go naming guide. I think a simpler name goes well like interrupt.

https://google.github.io/styleguide/go/decisions#variable-name-vs-type
https://google.github.io/styleguide/go/decisions#external-context-vs-local-names

Comment on lines +203 to +206
defer func() error {
c.Close()
return connections.unregister(c.remoteAddr)
}()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think c.Close() and connections.unregister() should be moved to handleWebSocket() (just after creating/registering)
as error handling in defer is difficult.
https://tech.yappli.io/entry/understanding-defer-in-go

closeMessage := websocket.FormatCloseMessage(websocket.CloseNormalClosure, "from server")
if err := c.WriteControl(websocket.CloseMessage, closeMessage, time.Now().Add(5*time.Second)); err != nil {
slog.Error("failed to write close message", "error", err)
return nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think break is better here.

connections.register(r.RemoteAddr, conn)

if err := conn.handleWebsocketConnection(context.Background()); err != nil {
slog.Error("faile to handle websocket connectioin", "remote_addr", r.RemoteAddr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
slog.Error("faile to handle websocket connectioin", "remote_addr", r.RemoteAddr)
slog.Error("failed to handle websocket connection", "remote_addr", r.RemoteAddr)

type conn struct {
*websocket.Conn
closeCh chan struct{}
closing atomic.Bool
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer need closing.


type conn struct {
*websocket.Conn
closeCh chan struct{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closeCh doesn't need to be a member of conn.

}
u := url.URL{Scheme: "ws", Host: addr.AddrPort().String(), Path: "/ws"}

// ctx := context.Background()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

garbage?

}
if metricsConfig.Export {
slog.Info("Start metrics server", "listen", m.AddrPort)
go func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as server-side.

}

func (c *conn) handleWebsocketConnection(ctx context.Context, config *Config, m *Metrics) error {
msgCh := make(chan message)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a name like pongReceived or pongMessage is easier to understand.

return nil
case <-c.ctrlC:
slog.Info("Interrupt received, closing connection", "remote_addr", c.RemoteAddr().String())
// acitve close sequence.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// acitve close sequence.
// active close sequence.

if retryCount >= config.MaxPingRetries {
// decide the connection is broken.
slog.Error("Reached to retry limit. The connection is broken.", "remote_addr", c.RemoteAddr().String())
c.closeCh <- struct{}{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed.
(If we want graceful closing we need active close sequence here, but I think force closing is OK in this case)

m.incrementRetryCount()
slog.Debug("Sending ping message for retry", "retry", retryCount, "max-retry", config.MaxPingRetries, "remote_addr", c.RemoteAddr().String())
}
slog.Debug("Sending ping message", "remote_addr", c.RemoteAddr())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
slog.Debug("Sending ping message", "remote_addr", c.RemoteAddr())
slog.Debug("Sending ping message", "remote_addr", c.RemoteAddr().String())

Comment thread websocket-keepalive/README.md Outdated
Comment on lines +59 to +62
- `established`
- Websocket で接続が確立していることを示す。確立していたら 1 を出力する。
- `ping_retry_count_total`
- Ping メッセージを再送した回数を示すカウンタ。
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should be updated.

terassyi and others added 8 commits March 23, 2026 05:25
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
Signed-off-by: naoki-take <naoki-take@cybozu.co.jp>
Signed-off-by: terashima <tomoya-terashima@cybozu.co.jp>
@terassyi terassyi force-pushed the add-websocket-keepalive branch from 60715be to 2daede9 Compare March 23, 2026 05:27
Signed-off-by: terashima <iscale821@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants