45 commits
2c0b13e
Add websockets dependency to pyproject.toml
gbiavati Dec 2, 2025
9cd0cd4
Refactor WebSocket handling and client connection logic
gbiavati Dec 2, 2025
eb6d774
Refactor log streaming in WebSocket client handler to use the existin…
gbiavati Dec 2, 2025
de5964e
Update target directory in example usage and enhance logging in WebSo…
gbiavati Dec 2, 2025
8b364a1
Fix output file path assignment in MARS WebSocket handling and remove…
gbiavati Dec 2, 2025
9928175
Implement PTY for real-time log streaming in WebSocket server
gbiavati Dec 2, 2025
af2732a
Refactor WebSocket client and server to improve output handling and l…
gbiavati Dec 2, 2025
207d001
Enhance log streaming in WebSocket server by adding session managemen…
gbiavati Dec 2, 2025
76912a4
Improve log streaming error handling and enhance debug logging in Web…
gbiavati Dec 2, 2025
c2f8066
Add process monitoring to handle job completion in WebSocket server
gbiavati Dec 2, 2025
7e73398
Refactor logging configuration and enhance WebSocket error handling i…
gbiavati Dec 2, 2025
6a80d6b
Enhance WebSocket client and server with improved retry logic and hea…
gbiavati Dec 2, 2025
97f49a4
Add synchronous wrapper for WebSocket request handling
gbiavati Dec 2, 2025
4125c5f
Refactor WebSocket request handling to use 'target' instead of 'targe…
gbiavati Dec 2, 2025
431c846
Update WebSocket client and server to streamline return values and en…
gbiavati Dec 2, 2025
258b785
Refactor mars_via_ws_sync to use 'target' instead of 'target_dir' for…
gbiavati Dec 2, 2025
2a07d08
Refactor WebSocket server main function to improve structure and add …
gbiavati Dec 2, 2025
a0617c6
Refactor log file path in handle_client to use relative path for job …
gbiavati Dec 2, 2025
75ce8d8
Add optional logger parameter to mars_via_ws functions for improved l…
gbiavati Dec 2, 2025
9504e07
Refactor WebSocket client and server code for improved readability an…
gbiavati Jan 14, 2026
28c3c73
Refactor WebSocket client to enhance logging and error handling durin…
gbiavati Jan 25, 2026
52db36c
Enhance error logging in HTTP response handler for improved debugging
gbiavati Jan 25, 2026
32b60ac
Enhance logging in HTTP request handler to include UID for better tra…
gbiavati Jan 25, 2026
fec9e21
Update log message to specify "websocket-MARS" server connection
gbiavati Jan 26, 2026
522d52f
Merge remote-tracking branch 'origin/websocketmars' into websocketmars
gbiavati Jan 26, 2026
6ae90cb
Add comprehensive configuration support with YAML and environment var…
gbiavati Feb 8, 2026
0265321
Refactor handle_client to log request details earlier and improve rea…
gbiavati Feb 8, 2026
8658694
Enhance WebSocket client request validation and error handling in han…
gbiavati Feb 9, 2026
ef2af8f
Enhance output file handling in handle_client to ensure CephFS consis…
gbiavati Feb 9, 2026
56b5d67
Add support for maximum concurrent WebSocket connections and enhance …
gbiavati Feb 9, 2026
e8d7b77
Remove redundant filesystem sync in handle_client to optimize output …
gbiavati Feb 9, 2026
402ae7d
better handling of heartbeat
gbiavati Feb 9, 2026
35c35c4
Implement graceful shutdown handling and process cleanup in WebSocket…
gbiavati Feb 9, 2026
26c1166
Add CephFS health check functionality and logging for OSD/MDS issues
gbiavati Feb 9, 2026
d53e135
Add CephFS health check script and update project scripts
gbiavati Feb 9, 2026
7d0b980
Enhance CephFS health check documentation and logging
gbiavati Feb 9, 2026
d6443d5
Add log filtering functionality and custom log handler examples
gbiavati Feb 9, 2026
98eb4ce
Remove metrics export thread initialization from WebSocket server sta…
gbiavati Feb 9, 2026
c62c0b0
docs: Prepare release 3.0.0
gbiavati Feb 10, 2026
cf7a4f8
Enhance documentation and logging features
gbiavati Feb 10, 2026
17f7ad8
fix: Update import statements and exception handling in WebSocket cli…
gbiavati Feb 10, 2026
e06e81f
fix: Add type ignore comments for untyped imports in client, ws_clien…
gbiavati Feb 10, 2026
df7c40f
fix: Remove unused import of create_default_log_handler in combined_h…
gbiavati Feb 10, 2026
22d5b1e
fix: Refactor log messages and improve readability in WebSocket serve…
gbiavati Feb 10, 2026
1ba83f5
fix: Correct version number in changelog for WebSocket dependency req…
gbiavati Feb 10, 2026
4 changes: 2 additions & 2 deletions .github/workflows/on-push.yml
@@ -143,13 +143,13 @@ jobs:
   integration-tests:
     needs: [combine-environments, unit-tests]
     if: |
-      success() && true
+      success() && false
     runs-on: ubuntu-latest
 
     strategy:
       matrix:
         include:
-        - python-version: '3.6'
+        - python-version: '3.11'
           extra: -integration
 
     steps:
149 changes: 149 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,149 @@
# Changelog

All notable changes to cads-mars-server will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-02-10

### Added

#### WebSocket Client/Server Architecture

**Why WebSocket instead of HTTP?**

The new WebSocket-based architecture has several advantages over a traditional HTTP request/polling pattern:

- **Real-time bidirectional communication**: Logs stream from server to client as they occur, eliminating polling overhead
- **Long-running request support**: Single persistent connection handles jobs that may run for minutes or hours
- **Efficient resource usage**: No repeated polling requests consuming server resources and network bandwidth
- **Interactive control**: Client can send commands (e.g., kill) to running jobs without establishing new connections
- **Lower latency**: Immediate notification of job completion, errors, or status changes
- **Simplified connection management**: Automatic reconnection and failover are built into the client

Traditional HTTP polling would require:

- Periodic status check requests (wasted bandwidth, server load)
- Delayed notifications (polling interval limits responsiveness)
- Complex state management on server for status queries
- Additional API endpoints for job control

WebSocket provides a natural fit for the workflow: submit job → stream logs → receive result.
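
The submit → stream logs → receive result flow can be illustrated with plain `asyncio`. This is a minimal sketch under stated assumptions, not the actual `ws_client.py`/`ws_server.py` code: two `asyncio.Queue` objects stand in for the WebSocket connection, and the JSON message shapes (`"type": "log"`, `"type": "result"`) and the `target` field are hypothetical.

```python
import asyncio
import json


async def fake_server(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Stand-in server: read one request, stream three log lines, then send the result."""
    request = json.loads(await inbox.get())
    for step in range(3):
        # Each line is pushed to the client as soon as it is produced; no polling.
        await outbox.put(json.dumps({"type": "log", "line": f"step {step} for {request['target']}"}))
        await asyncio.sleep(0)  # yield to the event loop, as a real job would between outputs
    await outbox.put(json.dumps({"type": "result", "status": "ok", "target": request["target"]}))


async def submit(
    request: dict, to_server: asyncio.Queue, from_server: asyncio.Queue
) -> tuple[list[str], dict]:
    """Send one request, collect streamed log lines, return them with the final result."""
    await to_server.put(json.dumps(request))
    logs: list[str] = []
    while True:
        message = json.loads(await from_server.get())
        if message["type"] == "log":
            logs.append(message["line"])
        elif message["type"] == "result":
            return logs, message
        # Any other message type (e.g. a heartbeat) is ignored in this sketch.


async def main() -> tuple[list[str], dict]:
    to_server: asyncio.Queue = asyncio.Queue()
    from_server: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(fake_server(to_server, from_server))
    logs, result = await submit({"target": "output.grib"}, to_server, from_server)
    await server
    return logs, result


if __name__ == "__main__":
    logs, result = asyncio.run(main())
    for line in logs:
        print(line)
    print(result["status"])
```

With a real connection the queues would be replaced by the socket's send and receive calls; the client loop keeps the same shape: one request out, many log messages in, then a single terminating result.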

______________________________________________________________________

#### Core Features

- **WebSocket-based MARS client** (`ws_client.py`, `ws_server.py`) for shared filesystem deployments

  - Asynchronous request handling with server-side job monitoring
  - Connection pooling with automatic failover across multiple servers
  - Real-time log streaming from MARS processes to clients
  - Bidirectional communication for job control (kill, heartbeat)
  - Configurable retry logic and connection timeouts

- **Client mode selection via `USE_SHARES` configuration**

  - `MARS_USE_SHARES=false` (default): Traditional pipe-based client (fully backward compatible)
  - `MARS_USE_SHARES=true`: WebSocket client for shared filesystem deployments
  - Configuration via environment variable or YAML file (`/etc/cads-mars-server.yaml`)

- **Client-side log filtering** ([LOG_FILTERING.md](docs/LOG_FILTERING.md))

  - Reduces noise from verbose MARS output
  - Pattern-based filtering (errors, warnings, progress indicators)
  - Message deduplication for repeated lines
  - Injectable custom log handlers:
    - Parse logs with custom logic
    - Raise exceptions to abort requests on specific error conditions
    - Send real-time commands to server (e.g., kill on timeout)
    - Integrate with external monitoring systems
  - Controlled via `CLIENT_FILTER_LOGS` config (default: enabled)

- **CephFS health diagnostics** ([CEPHFS_ARCHITECTURE.md](docs/CEPHFS_ARCHITECTURE.md))

  - `check-cephfs-health` console script for diagnosing filesystem issues
  - Documentation of CephFS architecture (MON/MDS/OSD components)
  - Startup health checks with warnings for detected issues
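
The client-side log filtering described above (pattern matching plus deduplication) can be sketched in a few lines. This is an illustration under assumptions, not the shipped filter: the patterns, the `LogFilter` class and the sample log lines are hypothetical; the real rules are documented in LOG_FILTERING.md.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns: keep errors, warnings and progress indicators, drop the rest.
DEFAULT_PATTERNS = (
    re.compile(r"\bERROR\b"),
    re.compile(r"\bWARNING\b"),
    re.compile(r"\b\d{1,3}%"),  # progress such as "45%"
)


class LogFilter:
    """Pass through lines matching a pattern, dropping repeats of the last kept line."""

    def __init__(self, patterns: Sequence[re.Pattern] = DEFAULT_PATTERNS) -> None:
        self.patterns = patterns
        self.last_kept: Optional[str] = None

    def __call__(self, line: str) -> Optional[str]:
        line = line.rstrip()
        if not any(pattern.search(line) for pattern in self.patterns):
            return None  # noise: not an error, warning or progress line
        if line == self.last_kept:
            return None  # duplicate of the previously kept line
        self.last_kept = line
        return line


if __name__ == "__main__":
    log_filter = LogFilter()
    for raw_line in [
        "mars - INFO - connecting",
        "mars - WARNING - slow response",
        "mars - WARNING - slow response",
        "mars - ERROR - request failed",
    ]:
        kept = log_filter(raw_line)
        if kept is not None:
            print(kept)
```

Keeping the filter as a small stateful callable makes it easy to chain with a custom handler: the handler only sees lines that survived filtering.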

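A startup health check on a CephFS mount often has to detect that metadata operations are stalling, which can happen when an MDS or OSD is unhealthy. The sketch below is an assumption about how such a probe could be written, not the `check-cephfs-health` implementation: it runs `os.stat` in a daemon thread and warns if the call fails or does not return within a timeout. The logger name is hypothetical.

```python
import logging
import os
import threading

logger = logging.getLogger("cephfs_probe")  # hypothetical logger name


def probe_path(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.stat(path) succeeds within `timeout` seconds, else log a warning."""
    outcome: dict = {}

    def worker() -> None:
        try:
            os.stat(path)
            outcome["ok"] = True
        except OSError as exc:
            outcome["error"] = exc

    # Daemon thread, so a stat() stuck on an unhealthy mount cannot block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        logger.warning("stat(%s) still blocked after %.1fs; MDS/OSD may be unhealthy", path, timeout)
        return False
    if "error" in outcome:
        logger.warning("stat(%s) failed: %s", path, outcome["error"])
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for mount in ("/", "/no/such/mount"):
        print(mount, probe_path(mount, timeout=2.0))
```

A timeout-based probe only shows that the mount is slow or stuck; identifying which Ceph component is at fault still requires the cluster-side diagnostics.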
### Fixed

- **Process group management for WebSocket server**

  - Properly terminates entire process groups (parent + bash + MARS + children)
  - Prevents orphaned processes during restarts or crashes
  - Graceful shutdown with SIGTERM/SIGINT signal handlers
  - Startup cleanup of orphaned processes from previous runs

- **WebSocket connection handling**

  - Moved filesystem sync operations to after the connection is closed, so slow storage no longer blocks the connection
  - Improved connection resource management during slow storage operations
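
The process-group handling listed above typically relies on starting each job in its own session and signalling the whole group. A minimal POSIX-only sketch using the standard library follows; it is not the server's code, the `sleep` pipeline stands in for the real MARS invocation, and the grace period is an arbitrary choice.

```python
import os
import signal
import subprocess


def start_job(command: str) -> subprocess.Popen:
    """Run `command` under bash in a new session; bash becomes the process-group leader."""
    return subprocess.Popen(["bash", "-c", command], start_new_session=True)


def _signal_group(pgid: int, sig: signal.Signals) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited


def kill_job(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """SIGTERM the whole group, wait up to `grace` seconds, then SIGKILL any survivors."""
    pgid = proc.pid  # start_new_session=True makes the group id equal the leader's pid
    _signal_group(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Children (e.g. MARS and its helpers) may outlive bash; sweep the group.
    _signal_group(pgid, signal.SIGKILL)
    return proc.wait()


if __name__ == "__main__":
    job = start_job("sleep 60 & sleep 60; wait")  # stand-in for bash + MARS + children
    print("exit code:", kill_job(job, grace=2.0))  # negative means terminated by a signal
```

Signalling the group rather than a single pid is what reaches the background and grandchild processes; killing only the bash parent would leave them orphaned.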

### Changed

- **Configuration system**: Centralized in `config.py` with environment variable and YAML file support
- **Process title tracking**: Uses `setproctitle` for easier process identification and management

### Documentation

- **[README.md](README.md)**: Complete guide to both pipe and WebSocket modes with configuration examples
- **[LOG_FILTERING.md](docs/LOG_FILTERING.md)**: Client-side log filtering with custom handler examples
- **[CEPHFS_ARCHITECTURE.md](docs/CEPHFS_ARCHITECTURE.md)**: CephFS architecture and diagnostic guide

### Migration Guide

#### For Existing Deployments

**No action required**: the default behavior (`USE_SHARES=false`) keeps full backward compatibility with the existing pipe-based client, so all current deployments continue to work without changes.

#### To Adopt WebSocket Mode

WebSocket mode requires:

- Shared filesystem accessible by both clients and servers
- WebSocket server(s) running on worker nodes
- Network connectivity between clients and servers

**1. Enable WebSocket client:**

```bash
# Environment variable
export MARS_USE_SHARES=true

# Or in /etc/cads-mars-server.yaml:
#   use_shares: true
```
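
How the two sources combine is not specified here. The sketch below assumes the environment variable overrides the YAML value, and that the YAML file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `resolve_use_shares` and `parse_bool` are hypothetical helpers, not the `config.py` API.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(value: str) -> bool:
    """Interpret common spellings of a boolean flag; reject anything else."""
    lowered = value.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def resolve_use_shares(
    file_config: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return the USE_SHARES mode: env var first (assumed), then YAML, else False (pipe mode)."""
    env = os.environ if environ is None else environ
    if "MARS_USE_SHARES" in env:
        return parse_bool(env["MARS_USE_SHARES"])
    value = file_config.get("use_shares", False)
    return value if isinstance(value, bool) else parse_bool(str(value))


if __name__ == "__main__":
    print(resolve_use_shares({"use_shares": True}, environ={}))
    print(resolve_use_shares({"use_shares": True}, environ={"MARS_USE_SHARES": "false"}))
```

Rejecting unrecognized values, instead of treating them as false, avoids silently falling back to pipe mode on a typo.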

**2. Configure WebSocket server list:**

```bash
# Environment variable (comma-separated)
export MARS_WS_SERVERS="ws://worker1:9001,ws://worker2:9001"

# Or in /etc/cads-mars-server.yaml:
#   mars_ws_servers:
#     - ws://worker1:9001
#     - ws://worker2:9001
```
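
With several servers configured, a client can try them in order and move on when one is unreachable. This sketch assumes the comma-separated `MARS_WS_SERVERS` format shown above. `parse_servers` and `connect_with_failover` are hypothetical, and `connect` stands for whatever opens the WebSocket (for example `websockets.connect`); it is injected so the example runs without a server.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


def parse_servers(raw: str) -> list[str]:
    """Split a comma-separated MARS_WS_SERVERS value, ignoring blanks and whitespace."""
    return [item.strip() for item in raw.split(",") if item.strip()]


async def connect_with_failover(
    servers: Sequence[str],
    connect: Callable[[str], Awaitable[T]],
    attempts_per_server: int = 2,
    timeout: float = 10.0,
    backoff: float = 0.5,
) -> T:
    """Try each server in order, retrying a few times, and return the first connection."""
    errors: list[str] = []
    for uri in servers:
        for attempt in range(1, attempts_per_server + 1):
            try:
                return await asyncio.wait_for(connect(uri), timeout)
            except (OSError, asyncio.TimeoutError) as exc:
                errors.append(f"{uri} (attempt {attempt}): {exc!r}")
                await asyncio.sleep(backoff * attempt)  # linear backoff before retrying
    raise ConnectionError("all servers failed: " + "; ".join(errors))


if __name__ == "__main__":

    async def fake_connect(uri: str) -> str:
        # Pretend worker1 is down so the client fails over to worker2.
        if "worker1" in uri:
            raise ConnectionRefusedError(uri)
        return f"connected to {uri}"

    servers = parse_servers("ws://worker1:9001,ws://worker2:9001")
    print(asyncio.run(connect_with_failover(servers, fake_connect, backoff=0)))
```

Trying servers in a fixed order is the simplest policy; shuffling the list per request would spread load across workers at the cost of less predictable routing.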

**3. Start WebSocket server on worker nodes:**

```bash
ws-mars-server --host 0.0.0.0 --port 9001
```

See [README.md](README.md) for complete deployment examples.

### Breaking Changes

- **Dependency version requirement**: Applications using `USE_SHARES=true` must use `cads-mars-server>=0.3.0`
- **Shared filesystem required**: WebSocket mode assumes client and server have access to the same filesystem paths
- **Custom log handlers**: Must be `async` functions (default filtering works without changes)
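
Because custom log handlers must be `async`, an existing synchronous handler has to be rewritten or wrapped. The sketch assumes a handler receives one log line at a time and may raise to abort the request. The signature, `as_async_handler`, `AbortRequest` and `dispatch` are hypothetical; check LOG_FILTERING.md for the actual interface.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Iterable, Union

AsyncHandler = Callable[[str], Awaitable[None]]


class AbortRequest(Exception):
    """Hypothetical exception a handler raises to stop the running request."""


def as_async_handler(handler: Callable[[str], Union[None, Awaitable[None]]]) -> AsyncHandler:
    """Return async handlers unchanged; wrap plain functions in a coroutine function."""
    if inspect.iscoroutinefunction(handler):
        return handler

    async def wrapper(line: str) -> None:
        # Still runs on the event loop thread, so the wrapped function must be quick.
        handler(line)

    return wrapper


async def abort_on_error(line: str) -> None:
    """Example async handler: abort on the first line containing ERROR."""
    if "ERROR" in line:
        raise AbortRequest(line)


async def dispatch(lines: Iterable[str], handler: AsyncHandler) -> int:
    """Feed lines to the handler in order; return how many were handled."""
    count = 0
    for line in lines:
        await handler(line)
        count += 1
    return count


if __name__ == "__main__":
    try:
        asyncio.run(dispatch(["mars - INFO - start", "mars - ERROR - failed"], abort_on_error))
    except AbortRequest as exc:
        print("aborted on:", exc)
```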

______________________________________________________________________

## [0.2.5.1] - Previous Release

(Earlier changes not documented)