DataForge is a modern C++20 header-only library for building declarative, composable data transformation pipelines.
It provides both push (output) and pull (input) iterator-based interfaces for applying arbitrary chains of conversions, including encoding, decoding, compression, encryption, hashing, and Unicode operations.
Transformations are described using quarks — small, composable objects that can be chained together with the | operator.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

#include "dataforge/quark_push_iterator.hpp"
#include "dataforge/quark_pull_iterator.hpp"
#include "dataforge/base_xx/base64.hpp"

using namespace dataforge;
std::string input = "Hello, World!";
std::string base64_result;
// Create a pipeline: input bytes → Base64 encoding → output
auto push_it = quark_push_iterator(int8 | base64, std::back_inserter(base64_result));
*push_it = input;
push_it.finish();
std::cout << "Encoded: " << base64_result << std::endl; // Output: SGVsbG8sIFdvcmxkIQ==
// Reverse the process: Base64 → decoded bytes
std::string decoded_result;
auto pull_it = quark_pull_iterator(base64 | int8, base64_result);
for (auto span = *pull_it; !span.empty(); span = *++pull_it) {
    std::copy(span.begin(), span.end(), std::back_inserter(decoded_result));
}
std::cout << "Decoded: " << decoded_result << std::endl; // Output: Hello, World!

More complex pipelines can chain multiple transformations:
// Example: text → UTF-8 → compression → encryption → Base64
auto pipeline = utf8 | deflated() | aes(128, key) | base64;

📁 See the examples/ folder for complete working examples, including MD5 hashing, AES encryption, and more advanced use cases.
🧪 For comprehensive algorithm coverage and advanced pipeline patterns, explore the tests/ directory — it contains hundreds of real-world examples demonstrating every supported algorithm, from basic CRC checksums to complex multi-stage encryption pipelines.
DataForge combines multiple types of data transformations in one consistent framework, unlike other libraries that cover only subsets of functionality.
| Feature / Capability | DataForge | Crypto++ | Boost | ICU | range-v3 |
|---|---|---|---|---|---|
| Integer ↔ Bytes + Endian | ✅ | ❌ | ✅ (Boost.Endian) | ❌ | ❌ |
| base16/32/58/64/ascii85/z85 | ✅ | ✅ | ❌ | ❌ | ❌ |
| Custom Base 1 < N < 256 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Checksums (crc, adler, bsd) | ✅ | ✅ (CRC32, Adler32) | ✅ (Boost.CRC) | ❌ | ❌ |
| Hashes (MD, SHA, Blake, etc) | ✅ | ✅ | ❌ | ❌ | ❌ |
| Encryption/Decryption | ✅ | ✅ | ❌ | ❌ | ❌ |
| Compression / Decompression | ✅ | ✅ (Deflate, Gzip, Zlib) | ✅ (Iostreams filters) | ❌ | ❌ |
| Unicode Conversions (UTF) | ✅ | ❌ | ❌ | ✅ | ❌ |
| ICU Charset Conversions | ✅ | ❌ | ❌ | ✅ | ❌ |
| Grapheme Breaking | ✅ | ❌ | ❌ | ✅ | ❌ |
| Header-only | ✅ | ❌ | ✅ | ❌ | ✅ |
| Push/Pull iterator pipelines | ✅ | ❌ | ✅ (filters) | ❌ | ✅ |
Key point: DataForge allows chaining transformations like integer → endian → compression → encryption → base encoding in one declarative pipeline.
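To illustrate how stages can be chained with the | operator, here is a minimal, self-contained sketch. It is not DataForge's implementation or API: Stage, big_endian, and hex_lower are hypothetical names invented for this example, and the stages work on whole buffers rather than on push/pull iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper (not a DataForge type): marks a callable as a pipeline stage.
template <typename F>
struct Stage {
    F fn;

    template <typename In>
    auto operator()(In&& in) const {
        return fn(std::forward<In>(in));
    }
};

// Deduction guide so that Stage{callable} works without spelling the type.
template <typename F>
Stage(F) -> Stage<F>;

// a | b builds a new stage that feeds the output of a into b.
template <typename A, typename B>
auto operator|(Stage<A> a, Stage<B> b) {
    return Stage{[a = std::move(a), b = std::move(b)](auto&& in) {
        return b(a(std::forward<decltype(in)>(in)));
    }};
}

// Illustrative stage: integers -> big-endian bytes, `width` bytes per value.
inline auto big_endian(int width) {
    assert(width >= 1 && width <= 8);
    return Stage{[width](const std::vector<std::uint64_t>& values) {
        std::vector<std::uint8_t> bytes;
        bytes.reserve(values.size() * static_cast<std::size_t>(width));
        for (std::uint64_t v : values) {
            // Emit the most significant byte first.
            for (int i = width - 1; i >= 0; --i) {
                bytes.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
            }
        }
        return bytes;
    }};
}

// Illustrative stage: bytes -> lowercase Base16 text.
inline auto hex_lower() {
    return Stage{[](const std::vector<std::uint8_t>& bytes) {
        const char* digits = "0123456789abcdef";
        std::string out;
        out.reserve(bytes.size() * 2);
        for (std::uint8_t b : bytes) {
            out.push_back(digits[b >> 4]);
            out.push_back(digits[b & 0x0F]);
        }
        return out;
    }};
}
```

With these definitions, big_endian(4) | hex_lower() applied to the values 0x01020304 and 0xDEADBEEF yields the text 01020304deadbeef. Composing by value keeps the sketch simple; a streaming design like the one described above would pass data through incrementally instead.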
- Convert sequences of integers of various sizes to/from byte sequences.
- Configurable little-endian or big-endian representation.
- Base16, Base32, Base58, Base64, ASCII85, Z85.
- Arbitrary base conversion with 1 < N < 256 and a custom alphabet, effectively a positional numeral system transformation.
- BSD checksum
- Adler32
- CRC8, CRC16, CRC32, CRC64
- MD2, MD4, MD5, MD6
- RIPEMD, Tiger
- SHA1, SHA2, SHA3
- Belt, GOST, Streebog, Whirlpool, Blake
- RC2, RC4, RC5, RC6
- DES, AES, Blowfish
- Belt, Magma
- Deflate
- Bzip2
- LZ4
- LZMA, LZMA2
(requires corresponding external libraries)
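The custom-base entry in the list above describes the conversion as a positional numeral system transformation. As a rough illustration of that idea, the sketch below treats a byte sequence as one big-endian number and rewrites it in base N by repeated long division. The helper to_base_n is hypothetical and is not part of DataForge; how the library itself handles leading zero bytes or streaming input is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a DataForge function): interprets `bytes` as one
// big-endian unsigned integer and writes it in base N, where N is the size
// of `alphabet`. Leading zero bytes carry no value and are not preserved.
std::string to_base_n(std::vector<std::uint8_t> bytes, const std::string& alphabet) {
    const unsigned base = static_cast<unsigned>(alphabet.size());
    assert(base > 1 && base < 256);

    std::string digits;
    std::size_t start = 0;  // first byte of the remaining quotient
    while (start < bytes.size()) {
        // One round of long division of the base-256 number by `base`.
        unsigned remainder = 0;
        for (std::size_t i = start; i < bytes.size(); ++i) {
            const unsigned current = remainder * 256u + bytes[i];
            bytes[i] = static_cast<std::uint8_t>(current / base);
            remainder = current % base;
        }
        digits.push_back(alphabet[remainder]);  // least significant digit first
        // Skip the bytes of the quotient that have become zero.
        while (start < bytes.size() && bytes[start] == 0) {
            ++start;
        }
    }
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

For example, the two bytes 0x01 0x00 (the number 256) with the alphabet "0123456789" produce "256", and the single byte 0xFF with the alphabet "01" produces "11111111".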
- UTF-7, UTF-8, UTF-16, UTF-32
- Any encoding supported by the ICU library
(requires ICU library)
- Splits a Unicode string into graphemes according to the Unicode Standard.
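The UTF conversions listed above re-encode the same Unicode code points in different byte layouts. As a standalone illustration of one direction (UTF-32 to UTF-8), here is a small sketch using only the standard library. The names append_utf8 and utf32_to_utf8 are hypothetical and do not come from DataForge; invalid input is rejected with an assertion rather than handled gracefully.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a DataForge function): appends the UTF-8 encoding
// of one Unicode scalar value to `out`.
void append_utf8(std::string& out, char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts a whole UTF-32 string by encoding each code point in turn.
std::string utf32_to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

For instance, U+20AC (the euro sign) becomes the three bytes E2 82 AC, and U+1F600 becomes the four bytes F0 9F 98 80.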
The library itself is header-only — nothing needs to be built for use in your projects.
However, the test suite depends on external libraries (zlib, icu, bzip2, lz4, liblzma, gtest), which are managed via vcpkg.
- Install vcpkg anywhere on your system (if not already installed).
- Set the environment variable VCPKG_ROOT to the location of your vcpkg installation.
  - Example (Windows PowerShell):
    setx VCPKG_ROOT "C:\dev\vcpkg"
- Open the Visual Studio solution for tests and build it.
  - On the first build, the project will automatically:
    - Check that VCPKG_ROOT is set.
    - Run $(VCPKG_ROOT)\vcpkg.exe install, installing all required dependencies from vcpkg.json into a local vcpkg_installed folder.
    - Configure INCLUDE and LIB paths to use these locally installed dependencies.
- Run the tests from Visual Studio.
No global vcpkg integration (vcpkg integrate install) is required — everything is local to the repository.
Distributed under the Boost Software License, Version 1.0.
The DataForge library is used in my iOS application on the App Store: PotoHEX HEX File Viewer & Editor.
This application is designed to view and edit files at the byte or character level; calculate different hashes, encode/decode, and compress/decompress desired byte regions.
You can support my open-source development by trying the App.
Feedback is welcome!