NethmikaKekuu/Distributed-File-Sytem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 

Repository files navigation

Distributed File System with Raft

This project is a simple implementation of a Distributed File System (DFS) in Go. It uses the Raft consensus algorithm to replicate file metadata (creations, deletions, etc.) across a cluster of nodes, ensuring consistency and fault tolerance.

Unlike basic metadata-only systems, this implementation also replicates file content from the leader to all follower nodes, improving availability and durability.

The Raft layer is based on a modified version of Phil Eaton's goraft.


Key Features

Core Functionality

  • Raft-based Metadata Replication: File operations (create, delete) are committed to a distributed log via Raft, ensuring metadata is strongly consistent across the cluster.

  • Leader Election: Nodes elect a leader automatically. All write operations go through the leader, while reads can be served from any replica.

  • File Content Replication: When a file is uploaded, the leader replicates its content to all followers. This ensures that file data is not lost if the leader crashes.

  • Basic Fault Tolerance: The system tolerates up to (N-1)/2 failures in a cluster of size N. Both metadata and file content remain available as long as a quorum survives.
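The quorum arithmetic behind that guarantee is simple enough to state in code. This is a minimal sketch (helper names are illustrative, not from the repository): a cluster of N nodes commits with a majority of N/2 + 1, so it can lose (N-1)/2 nodes and still make progress.

```go
package main

import "fmt"

// quorum returns the minimum number of nodes that must agree
// for a Raft cluster of size n to commit an entry.
func quorum(n int) int { return n/2 + 1 }

// maxFailures returns how many node failures a cluster of size n
// tolerates while still able to form a quorum: (n-1)/2.
func maxFailures(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("cluster=%d quorum=%d tolerates=%d failures\n",
			n, quorum(n), maxFailures(n))
	}
}
```

For the 3-node cluster this project starts by default, that means a quorum of 2 and tolerance of exactly 1 failed node.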

Advanced Fault Tolerance Features

  • Automatic Recovery Manager: Detects and recovers missing files by fetching them from healthy nodes (runs every 30 seconds).

  • Health Monitoring: Continuously monitors the health of all cluster nodes with configurable failure thresholds (checks every 5 seconds).

  • Data Repair & Integrity: Uses SHA-256 checksums to detect corrupted files and automatically repairs them from healthy replicas (verifies every 60 seconds).

  • Performance Metrics: Tracks operation latency, success rates, and replication overhead, and provides trade-off analysis between reliability and performance.

  • NTP-Based Time Synchronization: The system was upgraded to use the Network Time Protocol (NTP) for time synchronization.

    • Previous Method (Removed): Follower nodes would ask the Raft leader for the current time. This approach was simple but had a major drawback: if the leader's clock was inaccurate, the entire cluster's time would be skewed.
    • Current Method: Each node now independently contacts an external NTP server (pool.ntp.org) to synchronize its clock. This removes the dependency on the leader, making timestamps significantly more accurate and robust across the entire cluster.

Simple HTTP API

Interact with the DFS using REST-like endpoints for upload, download, delete, list, and status checks.


Architecture Overview

```mermaid
graph TD
    A[Client] --> B(Leader Node);
    B --> C(Follower Node 1);
    B --> D(Follower Node 2);

    subgraph Cluster
        B((Leader Node<br/>Raft + Files))
        C((Follower Node 1<br/>Raft + Files))
        D((Follower Node 2<br/>Raft + Files))
    end

    style B fill:#f9f,stroke:#333,stroke-width:2px;
    style C fill:#ccf,stroke:#333;
    style D fill:#ccf,stroke:#333;

    B -- Metadata: Raft Log --> C;
    B -- File Content: Replication --> C;
    B -- Metadata: Raft Log --> D;
    B -- File Content: Replication --> D;
```

Metadata: Replicated via Raft log
File Content: Replicated from Leader to Followers
Reads: Can be served from any node


Prerequisites

  • Go: Version 1.20+
  • PowerShell: To run the cluster startup script on Windows
  • curl: For quick testing of the HTTP API

Running the Cluster

1. Build the Application

go build -o dfsapi.exe .

2. Start the Cluster (PowerShell script)

.\start-cluster.ps1

This launches 3 nodes:

  • Node 1: http://localhost:8081 (Raft RPC on :3030)
  • Node 2: http://localhost:8082 (Raft RPC on :3031)
  • Node 3: http://localhost:8083 (Raft RPC on :3032)

Testing the File System

Basic Operations

1. Find the Leader

curl http://localhost:8081/status

Look for "is_leader": true.

2. Upload a File

echo "hello distributed world" > my-test-file.txt
curl -X POST --data-binary @my-test-file.txt http://localhost:8081/upload/my-first-file.txt

3. List Files (from any node)

curl http://localhost:8082/files

4. Download from a Follower

curl http://localhost:8083/upload/my-first-file.txt

5. Delete a File

curl -X DELETE http://localhost:8081/upload/my-first-file.txt

Fault Tolerance Monitoring

Check Recovery Status

curl http://localhost:8081/recovery/status

Shows total files, local files, and missing files that need recovery.

Check Health Status

curl http://localhost:8081/health/status

Displays health status of all cluster nodes with response times and failure counts.

Check Data Repair Status

curl http://localhost:8081/repair/status

Shows verified files, repairs performed, and corruptions detected.

View Performance Metrics

curl http://localhost:8081/metrics

Displays operation counts, average durations, success rates, and replication statistics.

Analyze Trade-offs

curl http://localhost:8081/metrics/tradeoffs

Provides analysis of reliability vs. performance trade-offs in the current configuration.

Manual File Verification

curl -X POST -H "Content-Type: application/json" \
  -d '{"file_path":"my-first-file.txt"}' \
  http://localhost:8081/repair/verify

Trigger Manual Sync

curl -X POST http://localhost:8081/recovery/sync

API Endpoints

File Operations

  • POST /upload/{filename} - Upload a file (leader only)
  • GET /upload/{filename} - Download a file (any node)
  • DELETE /upload/{filename} - Delete a file (leader only)
  • GET /files - List all files (any node)

System Status

  • GET /status - Node status and leader information

Fault Tolerance

  • GET /recovery/status - Recovery manager status
  • POST /recovery/sync - Trigger manual recovery sync
  • GET /health/status - Cluster health monitoring
  • GET /repair/status - Data repair and integrity status
  • POST /repair/verify - Manually verify a specific file
  • GET /metrics - Performance metrics
  • GET /metrics/tradeoffs - Trade-off analysis

Internal (used by cluster)

  • POST /replicate/{filename} - Receive replicated file content
  • POST /delete-content/{filename} - Delete local file content

Fault Tolerance Details

Recovery Manager

  • Runs every 30 seconds
  • Compares local files with cluster metadata
  • Automatically fetches missing files from healthy nodes
  • Ensures eventual consistency of file content
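The heart of the recovery pass is a set difference between the cluster's committed metadata and the files present on local disk. A minimal sketch (illustrative names; the repository's recovery manager also handles the fetching and retries):

```go
package main

import "fmt"

// missingFiles returns the files listed in cluster metadata that are
// absent locally and therefore need to be fetched from healthy nodes.
func missingFiles(clusterFiles, localFiles []string) []string {
	local := make(map[string]bool, len(localFiles))
	for _, f := range localFiles {
		local[f] = true
	}
	var missing []string
	for _, f := range clusterFiles {
		if !local[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	cluster := []string{"a.txt", "b.txt", "c.txt"}
	local := []string{"a.txt", "c.txt"}
	fmt.Println("to recover:", missingFiles(cluster, local))
}
```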

Health Monitor

  • Performs health checks every 5 seconds
  • Marks nodes unhealthy after 3 consecutive failures
  • Provides real-time cluster health visibility
  • Helps identify network partitions or node failures
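The consecutive-failure threshold can be sketched as a tiny state machine per node: a successful probe resets the counter, and the node is marked unhealthy once failures reach the threshold (3 in this system). Type and method names below are illustrative.

```go
package main

import "fmt"

// nodeHealth tracks consecutive probe failures for one node.
type nodeHealth struct {
	consecutiveFailures int
	threshold           int
}

// record updates the counter from one probe result and reports
// whether the node is still considered healthy.
func (h *nodeHealth) record(probeOK bool) bool {
	if probeOK {
		h.consecutiveFailures = 0
	} else {
		h.consecutiveFailures++
	}
	return h.consecutiveFailures < h.threshold
}

func main() {
	h := &nodeHealth{threshold: 3}
	for i, ok := range []bool{true, false, false, false, true} {
		fmt.Printf("probe %d ok=%v healthy=%v\n", i, ok, h.record(ok))
	}
}
```

Requiring several consecutive failures, rather than one, avoids flapping on a single dropped request.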

Data Repair Manager

  • Calculates SHA-256 checksums for all files
  • Verifies file integrity every 60 seconds
  • Detects silent data corruption
  • Automatically repairs corrupted files from healthy replicas
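The integrity check reduces to comparing a stored SHA-256 digest against one recomputed from the file's current bytes. A minimal sketch using Go's standard library (function names are illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of file content.
func checksum(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// isCorrupted compares a file's current checksum against the digest
// recorded when the file was stored.
func isCorrupted(content []byte, recorded string) bool {
	return checksum(content) != recorded
}

func main() {
	original := []byte("hello distributed world")
	recorded := checksum(original)

	fmt.Println("intact file corrupted? ", isCorrupted(original, recorded))
	fmt.Println("flipped byte corrupted?", isCorrupted([]byte("hellO distributed world"), recorded))
}
```

Because SHA-256 changes completely on any bit flip, this catches silent corruption that a size or timestamp check would miss.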

Metrics Collector

  • Tracks all file operations (create, read, delete)
  • Measures latency, throughput, and success rates
  • Analyzes replication overhead
  • Provides insights on reliability vs. performance trade-offs
  • Logs summary every 5 minutes

Development Challenges Faced

Building this distributed file system involved overcoming several common but complex engineering challenges:

  1. Concurrency and State Management:

    • Challenge: Ensuring that shared data (like the file list or performance metrics) could be safely accessed by concurrent HTTP requests, background Raft processes, and timed fault-tolerance checks without causing race conditions.
    • Example: The initial implementation for tracking the success/failure of a file upload required a complex wrapper around Go's standard http.ResponseWriter just to capture the status code from a concurrent handler. This highlights the difficulty of managing state across asynchronous operations.
  2. Correctly Implementing Distributed Concepts:

    • Challenge: The theoretical concepts of distributed systems have subtle but critical details. The initial implementation of time synchronization is a prime example.
    • Example: The system first used a simple, leader-based time sync where followers would ask the leader for the time. The difficulty was realizing this approach was flawed—if the leader's clock was wrong, the entire cluster's time would be skewed. This led to its replacement with a more robust NTP-based solution, where each node consults an external, authoritative time source.
  3. Robust Configuration and Startup:

    • Challenge: Parsing command-line arguments manually is error-prone. The initial getConfig function had to carefully iterate through os.Args to find flags and their values.
    • Example: This manual approach is brittle; it can easily break if arguments are provided in an unexpected order. While functional, it represents the common difficulty of building robust configuration handling without relying on standard libraries like Go's flag package.

Project Structure

.
├── consensus/          # Raft consensus implementation
│   ├── raft.go        # Core Raft logic
│   ├── raft_test.go   # Persistence tests
│   └── util.go        # Utility functions
├── replication/       # File replication logic
│   └── replication.go # State machine and handlers
├── faulttolerance/    # Fault tolerance features
│   ├── recovery.go    # Recovery manager
│   ├── health.go      # Health monitoring
│   ├── datarepair.go  # Data integrity verification
│   ├── metrics.go     # Performance metrics
│   └── faulttolerance.go # Delete operations
├── timesync/          # Time synchronization
│   └── timesync.go    # Clock synchronization
├── main.go            # Entry point and HTTP server
├── start-cluster.ps1  # Cluster startup script
└── README.md          # This file
