From-scratch implementation of the MD5 algorithm in 5 different languages:
- Python
- C
- C++
- Rust
- Go
MD5 (Message-Digest Algorithm 5) is a hashing algorithm. It takes an input of arbitrary size and produces a 128-bit / 16-byte hash. It works in 4 steps:
-
Padding: The input data is extended so its total length is almost a multiple of 512 bits. A
1is added at the end, followed by0s, and the original length of the message is recorded in the last 64 bits -
Chunking: The padded data is split into 512-bit chunks
-
Processing: Each chunk goes through 64 rounds of calculations using simple math operations like addition and bit shifts. Four values (A-D) are updated in each round
-
Finalizing: The values of A, B, C, and D are combined to form the 128-bit hash result
MD5 should no longer be used in cryptographic applications due to its susceptibility to collision attacks, allowing malicious actors to craft different inputs resulting in identical hash values. More secure hash functions like SHA-256 have become dominant.
Since the md5 algorithm is a pure function, all implementations can be easily verified against each other. If one implementation produces a different result than the others, it can be assumed that its not correct.
Run the test script to verify this:
bash test.bash