A robust command-line tool for monitoring and verifying file integrity using cryptographic hashing. This security tool helps detect unauthorized modifications to files by maintaining and comparing SHA-256 hashes.
- Learn about cryptographic hashing and file integrity verification
- Implement a secure file monitoring system
- Understand CLI application development in Python
- Practice secure coding and data storage
This project is based on the File Integrity Checker from roadmap.sh's project collection.
- Python: Core programming language
- Click: CLI framework for command handling
- PyYAML: YAML-based hash storage
- hashlib: Cryptographic hash functions
- pathlib: File system operations
- File & Directory Support: Process individual files or entire directories
- SHA-256 Hashing: Industry-standard cryptographic hash function
- Secure Storage: YAML-based persistent hash storage
- Change Detection: Accurate identification of file modifications
- Clear Reporting: User-friendly status messages
- Manual Updates: Support for legitimate file changes
flowchart TD
subgraph User["User Interface"]
A[User Input] --> B[CLI Interface]
B --> C{Command Router}
end
subgraph Commands["Command Processing"]
C -->|"init"| D[Hash Calculator]
C -->|"check"| E[Hash Verifier]
C -->|"update"| F[Hash Updater]
end
subgraph Storage["Data Management"]
G[(Hash Storage)]
H[YAML File<br/>.file_hashes.yml]
end
subgraph FileOps["File Operations"]
I[File Reader] --> J[Binary Processing]
J --> K[Chunk Handler<br/>4KB blocks]
end
D --> I
E --> I
F --> I
K --> D
K --> E
K --> F
D --> G
E --> G
F --> G
G <--> H
style User fill:#f9f,stroke:#333,stroke-width:2px
style Commands fill:#bbf,stroke:#333,stroke-width:2px
style Storage fill:#bfb,stroke:#333,stroke-width:2px
style FileOps fill:#fbb,stroke:#333,stroke-width:2px
Key Components:
- User Interface Layer:
  - Handles user input through the CLI
  - Parses commands using the Click framework
  - Routes commands to the appropriate handlers
- Command Processing Layer:
  - Init: Creates initial file hashes
  - Check: Verifies file integrity
  - Update: Refreshes stored hashes
- File Operations Layer:
  - Reads files in binary mode
  - Processes in 4KB chunks
  - Handles large files efficiently
- Data Management Layer:
  - Stores hashes in YAML format
  - Manages hash retrieval
  - Ensures data persistence
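The chunked reading in the File Operations Layer can be illustrated with a minimal sketch. It uses only the standard library; the helper name `hash_in_chunks` and the temporary demo file are hypothetical and chosen for illustration. Because SHA-256 is computed incrementally, feeding a file to the hasher in 4 KB blocks should yield the same digest as hashing the whole content at once, while only one block is held in memory at a time.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 4096  # 4 KB, matching the block size described above


def hash_in_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Hypothetical helper: SHA-256 of a file, read one block at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:  # empty bytes means end of file
                break
            digest.update(block)
    return digest.hexdigest()


def demo() -> bool:
    """Compare chunked hashing with one-shot hashing of the same bytes."""
    data = b"integrity" * 1200  # 10,800 bytes, spans several 4 KB blocks
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return hash_in_chunks(path) == hashlib.sha256(data).hexdigest()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The block size affects memory use and the number of reads, not the resulting digest, so it can be tuned without invalidating stored hashes.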
Data Flow:
- User enters command through CLI
- Command router directs to appropriate handler
- File operations read and process files
- Hash calculations performed on file chunks
- Results stored in or compared with YAML storage
- Status reported back to user
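The data flow above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual code: an in-memory dictionary stands in for the YAML file, the function names are hypothetical, and the status strings mirror the tool's documented output except for "Status: Not tracked", which is invented here for the case of a file with no stored hash.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def sha256_of(path: str) -> str:
    """SHA-256 hex digest of a file, read in 4 KB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(4096), b""):
            digest.update(block)
    return digest.hexdigest()


def init(path: str, store: Dict[str, str]) -> str:
    """Record the current hash of path in the store."""
    store[path] = sha256_of(path)
    return "Hashes stored successfully."


def check(path: str, store: Dict[str, str]) -> str:
    """Compare the current hash of path with the stored one."""
    if path not in store:
        return "Status: Not tracked"  # hypothetical message, not from the tool
    if sha256_of(path) == store[path]:
        return "Status: Unmodified"
    return "Status: Modified (Hash mismatch)"


def update(path: str, store: Dict[str, str]) -> str:
    """Overwrite the stored hash after a legitimate change."""
    store[path] = sha256_of(path)
    return "Hash updated successfully."


def run_demo() -> List[str]:
    """Walk through init, check, modify, check, update, check."""
    store: Dict[str, str] = {}  # stands in for .file_hashes.yml
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("original")
        messages = [init(path, store), check(path, store)]
        with open(path, "w") as f:
            f.write("tampered")
        messages += [check(path, store), update(path, store), check(path, store)]
        return messages
    finally:
        os.remove(path)


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Keeping the store as a plain mapping from path to digest means the persistence format can be swapped (YAML here) without touching the comparison logic.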
- Python 3.8 or higher
- pip (Python package manager)
- Git (optional, for cloning)
- Clone the repository:
git clone https://github.com/kaalpanikh/file-integrity-checker.git
cd file-integrity-checker
- Install dependencies:
pip install -r requirements.txt
- Make the script executable (Unix-like systems):
chmod +x integrity-check

# Initialize file/directory hashes
./integrity-check init <path>
# Check file/directory integrity
./integrity-check check <path>
# Update file hash after legitimate changes
./integrity-check update <path>

# Initialize a directory
./integrity-check init /var/log
> Hashes stored successfully.
# Check a specific file
./integrity-check check /var/log/syslog
> Status: Modified (Hash mismatch)
# Update after legitimate changes
./integrity-check update /var/log/syslog
> Hash updated successfully.

file-integrity-checker/
├── integrity_checker.py # Main implementation
├── integrity-check # Executable script
├── requirements.txt # Project dependencies
├── README.md # Project documentation
├── DEVELOPMENT_GUIDE.md # Detailed development guide
└── .file_hashes.yml # Hash storage (created on first run)
- Cryptographic Security
  - SHA-256 hashing algorithm
  - Collision resistance
  - Pre-image resistance
- Safe File Operations
  - Binary mode file reading
  - Chunk-based processing
  - Path validation
- Secure Storage
  - Local YAML storage
  - No network transmission
  - Clear error reporting
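Path validation combined with file and directory support might look like the following sketch. The helper name `collect_files`, the error messages, and the choice to recurse into subdirectories are assumptions made for illustration; the project's own validation rules may differ.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_files(target: str) -> List[Path]:
    """Hypothetical helper: validate target and list the regular files in it."""
    path = Path(target)
    if not path.exists():
        raise FileNotFoundError(f"Path does not exist: {target}")
    if path.is_file():
        return [path]
    if path.is_dir():
        # rglob("*") walks the tree recursively; keep regular files only
        return sorted(p for p in path.rglob("*") if p.is_file())
    raise ValueError(f"Not a regular file or directory: {target}")


def demo() -> List[str]:
    """Build a small temporary tree and list it relative to its root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("a")
        (root / "sub").mkdir()
        (root / "sub" / "b.txt").write_text("b")
        return [p.relative_to(root).as_posix() for p in collect_files(tmp)]


if __name__ == "__main__":
    print(demo())
```

Failing early on a missing path keeps the error report specific, instead of surfacing later as a failed `open` call inside the hashing loop.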
- Cryptography
  - Hash function properties
  - Data integrity verification
  - Security considerations
- Python Development
  - CLI application creation
  - File system operations
  - Type hints and documentation
- Best Practices
  - Modular code organization
  - Error handling
  - Security-first thinking
- Setup Environment
  - Install Python 3.8+
  - Install dependencies
  - Configure development tools
- Implementation
  - Core hash functions
  - CLI commands
  - Storage management
- Testing
  - Functionality testing
  - Edge case handling
  - Security validation
Contributions are welcome! Please feel free to submit pull requests.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Nikhil Mishra (@kaalpanikh)
- roadmap.sh for the project idea and requirements
- Python community for excellent libraries and tools
- Contributors and users of this project
# Create a test file
$ echo "This is a test file to demonstrate the file integrity checker." > test.txt
# Initialize the hash
$ python integrity_checker.py init test.txt
> Hashes stored successfully.
# Verify the file (unmodified)
$ python integrity_checker.py check test.txt
> Status: Unmodified

Let's examine our .file_hashes.yml file after initialization:
# Content of .file_hashes.yml
test.txt: "d5579c46dfcc7f18207013e65b44e4cb4e2c2298f4ac457ba8f82743f31e930b"

This shows:
- Relative path storage for portability
- Full SHA-256 hash (64 characters)
- YAML format for readability
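The two properties of a stored entry noted above, a relative path as the key and a 64-character hex digest as the value, can be checked with a short sketch. Both helper names are hypothetical, and the base directory that keys are made relative to is an assumption; the tool may resolve paths differently.

```python
import hashlib
import os

HEX_CHARS = set("0123456789abcdef")  # hexdigest() emits lowercase hex


def storage_key(file_path: str, base_dir: str) -> str:
    """Hypothetical helper: path relative to base_dir, with forward slashes."""
    rel = os.path.relpath(file_path, start=base_dir)
    return rel.replace(os.sep, "/")


def is_sha256_hex(value: str) -> bool:
    """True if value has the shape of a full SHA-256 hex digest."""
    # 256 bits = 32 bytes = 64 hexadecimal characters
    return len(value) == 64 and all(c in HEX_CHARS for c in value)


if __name__ == "__main__":
    digest = hashlib.sha256(b"example content").hexdigest()
    print(len(digest), is_sha256_hex(digest))
```

A shape check like `is_sha256_hex` can flag a truncated or hand-edited entry before it is compared against a freshly computed hash.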
file-integrity-checker/
├── integrity_checker.py # Main implementation
├── integrity-check # Executable script
├── requirements.txt # Dependencies
├── README.md # Documentation
├── DEVELOPMENT_GUIDE.md # Development details
├── FIRST_PRINCIPLES.md # Beginner's guide
├── LICENSE # MIT License
└── .file_hashes.yml # Hash storage

- Dependencies
# From requirements.txt
click==8.1.7 # CLI framework
pathlib==1.0.1 # File operations
PyYAML==6.0.1 # Hash storage

- Core Hash Function
import hashlib

def calculate_file_hash(file_path: str) -> str:
    """Return the SHA-256 hex digest of the file at file_path."""
    sha256_hash = hashlib.sha256()
    # Binary mode so the hash covers the exact bytes on disk
    with open(file_path, "rb") as f:
        # Read in 4 KB blocks so large files are never loaded whole
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

- Cryptographic Security
  - SHA-256 hashing algorithm
  - 256-bit (32-byte) hash length
  - Collision-resistant design
- File Handling
  - Binary mode file reading
  - Chunk-based processing (4KB blocks)
  - Safe file path handling
- Data Storage
  - Human-readable YAML format
  - Local file system storage
  - Path-based organization
- ✓ File Operations
  - Create and read files
  - Process binary content
  - Handle various file sizes
- ✓ Hash Management
  - Generate SHA-256 hashes
  - Store hashes securely
  - Compare hash values
- ✓ User Interface
  - Clear command structure
  - Informative status messages
  - Error handling
- ✓ Documentation
  - Comprehensive README
  - Development guide
  - First principles explanation
These test results demonstrate that our implementation successfully meets all project requirements from roadmap.sh while maintaining security and usability.