
New Cluster Nodes Unable to Contact Webseed in TLS-Enabled Cluster Setup #964

@abhishek-das-gupta


Overview

Adding new hosts to a TLS-enabled cluster fails. Before a new node can join, it must receive a 14 GB file, which is distributed by the BitTorrent client running on the new hosts. With TLS enabled, this torrent download stalls indefinitely.

Architecture

Cluster Architecture

Our cluster has a master node and worker nodes, which report the cluster's state to the master. The master generates a trackerless .torrent file and effectively acts as the tracker itself: it tells each peer which other peers to contact during torrenting.
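
The peer exchange described here can be sketched roughly as follows. This is a minimal illustration, not our actual code: `HeartbeatPeer`, `peerAdder`, and `feedPeers` are hypothetical names, and the heartbeat payload shape is assumed. In the real worker the call would go to the torrent client's AddPeers() API; a small interface stands in for it so the sketch builds with the standard library only. The addresses are illustrative.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// HeartbeatPeer is a hypothetical shape for one peer entry in the master's
// heartbeat response; the real payload format is not shown in this issue.
type HeartbeatPeer struct {
	Host string
	Port int
}

// peerAdder stands in for the torrent handle. It is an assumption that keeps
// the sketch free of third-party imports; the real client exposes AddPeers().
type peerAdder interface {
	AddPeers(addrs []string) int
}

// feedPeers converts heartbeat entries to "host:port" strings, skips the
// local node and duplicates, and hands the rest to the torrent client.
func feedPeers(t peerAdder, self string, peers []HeartbeatPeer) int {
	seen := map[string]bool{self: true}
	var addrs []string
	for _, p := range peers {
		addr := net.JoinHostPort(p.Host, strconv.Itoa(p.Port))
		if seen[addr] {
			continue
		}
		seen[addr] = true
		addrs = append(addrs, addr)
	}
	return t.AddPeers(addrs)
}

// recorder is a stand-in client that remembers the addresses it was given.
type recorder struct{ got []string }

func (r *recorder) AddPeers(addrs []string) int {
	r.got = append(r.got, addrs...)
	return len(addrs)
}

func main() {
	r := &recorder{}
	n := feedPeers(r, "10.140.93.137:7191", []HeartbeatPeer{
		{"10.140.93.137", 7191}, // the local node, skipped
		{"10.140.40.8", 7191},
		{"10.140.40.8", 7191}, // duplicate, skipped
	})
	fmt.Println(n, r.got)
}
```

Because the call repeats after every heartbeat, filtering out the local node and duplicates before handing addresses over keeps each call small, although the torrent client may well deduplicate on its own.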

Torrent Architecture

Torrent Process During Fresh Cluster Install

This is the process followed during a fresh cluster setup:

  • The master node first downloads (HTTP fetch) the parcel from a remote server, then acts as a web seed.
  • Each peer learns the other peers' addresses (host IP, port) from the master's heartbeat response and passes them to the torrent client's AddPeers() API. This AddPeers() call is repeated after every heartbeat response the worker receives from the master.
  • Torrenting starts between the peers (master + workers).
  • If the parcel download does not complete via BitTorrent within a certain time, the peer falls back to a plain HTTP download from the web seed.

Torrent Process During New Host Addition in Existing Cluster

This is the general flow of how new hosts are added in an existing cluster:

  • The new hosts being added to the cluster install both the Anacrolix/torrent binary and the libtorrent binary. By default, the Anacrolix/torrent client process runs.
  • These new peers contact the web seed (the master node) in the existing cluster, whether or not TLS is enabled, to download the 14 GB file.
  • At the same time, the new peers distribute pieces of the 14 GB file among each other.
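
For context, a web seed serves torrent pieces over ordinary HTTP range requests. The sketch below shows that idea in isolation; it is not how any particular client implements it. `pieceRange`, `fetchPiece`, and `newFakeSeed` are hypothetical helpers, and the tiny 10-byte "file" with 4-byte pieces is purely illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// pieceRange returns the inclusive byte range of piece i in a file of
// total bytes split into pieces of pieceLen bytes (the last may be short).
func pieceRange(i, pieceLen, total int64) (start, end int64) {
	start = i * pieceLen
	end = start + pieceLen - 1
	if end >= total {
		end = total - 1
	}
	return
}

// fetchPiece requests one piece from a web seed URL with a Range header
// and expects a 206 Partial Content response.
func fetchPiece(c *http.Client, url string, i, pieceLen, total int64) ([]byte, error) {
	start, end := pieceRange(i, pieceLen, total)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("web seed returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newFakeSeed starts a local plain-HTTP server that honors Range requests,
// standing in for the master node's download endpoint.
func newFakeSeed(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "parcel", time.Time{}, bytes.NewReader(data))
	}))
}

func main() {
	srv := newFakeSeed([]byte("0123456789"))
	defer srv.Close()

	piece, err := fetchPiece(srv.Client(), srv.URL, 1, 4, 10)
	fmt.Printf("%q %v\n", piece, err)
}
```

Every such request depends on the peer being able to open an HTTP(S) connection to the master, so a scheme or certificate problem on that connection would stop web seeding entirely while leaving peer-to-peer connections intact.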

Scenarios with New Host(s) Addition

Without TLS Enabled on the Existing Cluster

  • Whether the new hosts run Anacrolix/torrent or libtorrent, the 14 GB file is distributed quickly and the hosts are added to the cluster.
  • With Anacrolix/torrent, the peers' status shows the web seed while the 14 GB parcel is being torrented:

webseeds:
- CLOSED: http://ccycloud-1.b-135-no-tls.root.comops.site:7180/cmf/parcel/download/CDH-7.2.18-1.cdh7.2.18.p0.51297892-el8.parcel
  last unhandled error: never
  bep40-prio: e97fd7f2
  last msg: never, connected: never, last helpful: 147.05s ago, itime: 2m41.004987105s, etime: 13.875250838s
  1669/1669 completed, 0 pieces touched, good chunks: 40889/40889:0 reqq: 0+0/(84/128):0/1024, flags: i:WS:, dr: 47132.0 KiB/s
  requested pieces:

With TLS Enabled in the Existing Cluster

Case #1: Libtorrent Client Process Runs on the New Hosts

The 14 GB file gets distributed within a few minutes.

Case #2: Anacrolix/Torrent Process Runs on the New Hosts

Distribution of the 14 GB file stalls on the new nodes because none of the new peers can contact the web seed (the master node) in the existing cluster. The webseeds section of the full status output is empty:

webseeds:  <--- no web seed
2 peer conns:
- 10.140.93.137:51680-10.140.40.8:7191
  peer id: "-GT0003-\xb3.\x9epQ\xd6LG\x03\xad\xce8"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: e8a31f71
  last msg: 26.36s ago, connected: 86.37s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :M,e,v1:, dr: 0.0 KiB/s
  requested pieces:
- 10.140.93.137:7191-10.140.24.8:43468
  peer id: "-GT0003-\xfc\x93{w:\x94~\x8f\x13\u0671\x1b"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: d766eef0
  last msg: 86.29s ago, connected: 86.29s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :I,e,v1:, dr: 0.0 KiB/s
  requested pieces:

Hi @anacrolix, can you please provide pointers on why the web seed URL http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download> is not reachable from the new peers?
