Encrypting Reverse Proxy for Google Cloud Storage
This project provides an encrypting reverse proxy for Google Cloud Storage (GCS). It intercepts HTTP/HTTPS traffic for GCS operations, encrypting data before upload and decrypting it after download. This adds a layer of security on top of GCS's server-side encryption offerings, which is especially useful for organizations with strict security and privacy requirements, such as those that want to prevent even Google from having access to their data.
- Transparent Encryption/Decryption: Ensures that data uploaded to GCS is automatically encrypted using Google Cloud KMS and Tink, while downloaded data is seamlessly decrypted.
- Man-in-the-Middle (MITM) Proxy: Employs an MITM proxy to intercept and modify HTTP/HTTPS traffic for GCS operations.
- Tink Library: Leverages the Tink library for robust cryptographic operations and secure key management (see the envelope-encryption sketch after this list).
- Easy to Use: Works out of the box with `gsutil` and `gcloud` commands and the `axlearn` and `tensorflow` libraries. Requires no complex configuration.
- Key Management:
- Uses GCP KMS for key management.
- Allows for specifying an encryption key per bucket using key-value pairs.
- Compliance:
- Employs only approved algorithms (SHA, AES, RSA, ECDSA) with appropriate bit sizes (SHA-256, RSA-2048, ECDSA-256).
- Scalability: Designed to be scalable and work behind a load balancer.
- Deployment: Can be deployed as a sidecar.
- Logging:
- Safe logging practices prevent leaks of keys or data.
- Configurable logging levels (debug, error, warning, info, etc.).
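
As context for the Tink and KMS items above, envelope encryption with a Cloud KMS key in Go looks roughly like the sketch below. This is a minimal illustration assuming the `tink-go` v2 packages, not the proxy's exact code; the key URI is a placeholder built from `GCP_KMS_RESOURCE_NAME`:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tink-crypto/tink-go-gcpkms/v2/integration/gcpkms"
	"github.com/tink-crypto/tink-go/v2/aead"
)

func main() {
	ctx := context.Background()
	// KMS key URI built from GCP_KMS_RESOURCE_NAME (placeholder values).
	const keyURI = "gcp-kms://projects/YOUR_PROJECT_ID/locations/global/keyRings/YOUR_KEYRING/cryptoKeys/YOUR_CRYPTO_KEY"

	client, err := gcpkms.NewClientWithOptions(ctx, keyURI)
	if err != nil {
		log.Fatal(err)
	}
	kek, err := client.GetAEAD(keyURI) // key-encryption key held in Cloud KMS
	if err != nil {
		log.Fatal(err)
	}
	// Envelope AEAD: a fresh AES-256-GCM data key per object, wrapped by the KMS key.
	env := aead.NewKMSEnvelopeAEAD2(aead.AES256GCMKeyTemplate(), kek)

	ciphertext, err := env.Encrypt([]byte("object bytes"), nil)
	if err != nil {
		log.Fatal(err)
	}
	plaintext, err := env.Decrypt(ciphertext, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("round-tripped %d bytes\n", len(plaintext))
}
```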
- If you are targeting AxLearn or TensorFlow, follow the instructions to patch Go.
- Build the `go-gcsproxy` binary:

```
make
```
- Configure the proxy's behavior through environment variables. For a comprehensive list of available options, refer to the `Makefile`.
- Run the proxy:

```
./go-gcsproxy -debug=1 \
  -kms_resource_name=projects/YOUR_PROJECT_ID/locations/global/keyRings/YOUR_KEYRING/cryptoKeys/YOUR_CRYPTO_KEY \
  -cert_path=/your/path/to/certs  # mitmproxy-ca.pem is automatically generated on the first run of the proxy
```

- (Optional) Configure environment variables for `GCP_KMS_RESOURCE_NAME`, `PROXY_CERT_PATH`, `SSL_INSECURE`, `DEBUG_LEVEL`, and `GCP_KMS_BUCKET_KEY_MAPPING`.
Use the following Docker command to build the Docker image:
```
docker build --platform=linux/amd64 -f ./Dockerfile.go123patch -t go-gcsproxy .
```
To run the Docker image:

1. Create Google Application Default Credentials:

```
gcloud auth application-default login
```

2. Create an env file like the following:

```
GCP_KMS_RESOURCE_NAME=projects/<your-project>/locations/global/keyRings/<your-key-ring>/cryptoKeys/<your-key>
PROXY_CERT_PATH=<your-path-to-cert>
DEBUG_LEVEL=1
GOOGLE_APPLICATION_CREDENTIALS=<your-path-to-adc>
```

3. Run the container:

```
docker run -it -v ${HOME}/.config/gcloud:<your-path-to-adc-from-env-file> -v ${HOME}/<path-to-cert>:<your-path-to-cert-from-env-file> --env-file <your-env-file-from-step2> -p 9080:9080 go-gcsproxy
```
To use `gsutil` or `gcloud` with the go-gcsproxy, you need to configure them to use the proxy and trust the proxy's CA certificate.
You can also use functional testing to test the proxy.
Set the following environment variables to direct `gcloud` traffic through the proxy:

```
export https_proxy=http://127.0.0.1:9080
export http_proxy=http://127.0.0.1:9080
export HTTPS_PROXY=http://127.0.0.1:9080
export REQUESTS_CA_BUNDLE=/your/path/to/certs/mitmproxy-ca.pem
gcloud config set custom_ca_certs_file $REQUESTS_CA_BUNDLE
```

By default, every request to GCS will be encrypted, including requests to public datasets.
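
Programmatic clients that do not honor these environment variables need the same two settings applied explicitly: route requests through the proxy and trust its CA. Below is a minimal Go sketch; the certificate path, proxy address, and bucket name are placeholders to adjust for your setup:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"
)

func main() {
	// Trust the proxy's CA (generated on the proxy's first run).
	caPEM, err := os.ReadFile("/your/path/to/certs/mitmproxy-ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Fatal("failed to parse proxy CA certificate")
	}

	// Route all requests through go-gcsproxy.
	proxyURL, err := url.Parse("http://127.0.0.1:9080")
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{Transport: &http.Transport{
		Proxy:           http.ProxyURL(proxyURL),
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}

	// GCS traffic now passes through the encrypting proxy (auth omitted here).
	resp, err := client.Get("https://storage.googleapis.com/storage/v1/b/YOUR_BUCKET/o")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```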
This optional feature allows for more granular control over encryption keys by enabling the specification of different KMS keys for different GCS paths (buckets or sub-paths within buckets).
The `GCP_KMS_BUCKET_KEY_MAPPING` environment variable (or the `-gcp_kms_bucket_key_mappings` command-line flag) accepts a comma-separated string of key-value pairs mapping GCS paths to KMS keys.
Buckets not listed in `GCP_KMS_BUCKET_KEY_MAPPING` will pass through to GCS unencrypted.
Example:

```
GCP_KMS_BUCKET_KEY_MAPPING="bucket1:projects/project1/locations/global/keyRings/keyring1/cryptoKeys/key1,bucket2/path/to/data:projects/project2/locations/global/keyRings/keyring2/cryptoKeys/key2"
```

This example maps `bucket1` to `key1` and `bucket2/path/to/data` to `key2`.
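
The mapping semantics can be illustrated with a short Go sketch. The helper names are hypothetical, and longest-prefix matching is an assumption about how sub-path entries are resolved, not a description of the actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// parseMapping splits "path:key,path:key" pairs into a lookup table.
func parseMapping(s string) map[string]string {
	m := map[string]string{}
	for _, pair := range strings.Split(s, ",") {
		if path, key, ok := strings.Cut(pair, ":"); ok {
			m[path] = key
		}
	}
	return m
}

// keyFor returns the KMS key for a GCS object path, preferring the
// longest matching prefix; "" means pass through unencrypted.
func keyFor(m map[string]string, objectPath string) string {
	best, bestLen := "", -1
	for prefix, key := range m {
		if strings.HasPrefix(objectPath, prefix) && len(prefix) > bestLen {
			best, bestLen = key, len(prefix)
		}
	}
	return best
}

func main() {
	m := parseMapping("bucket1:projects/project1/locations/global/keyRings/keyring1/cryptoKeys/key1," +
		"bucket2/path/to/data:projects/project2/locations/global/keyRings/keyring2/cryptoKeys/key2")
	fmt.Println(keyFor(m, "bucket2/path/to/data/file.bin")) // key2
	fmt.Println(keyFor(m, "bucket3/file.bin"))              // "" -> unencrypted pass-through
}
```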
- Functional Testing -- A set of tests for various GCS clients (e.g., `tf.io`) besides `gcloud` and `gsutil`.
- Performance Testing -- Benchmarking with various profiles based on CPU/memory, load, and file size.
- P0 (MVP):
  - Meet core requirements including basic encryption/decryption, key management, and compatibility with common tools like `gsutil` and `gcloud`.
  - Internal Google testing.
- P1:
- Enhanced deployment options (sidecar in GKE, controller-managed annotation).
- Support for JSON and gRPC APIs.
- Dynamic configuration updates.
- P2:
- Integration with GCSFUSE.
- Potential performance optimizations for TPUs.
- Terraform deployment templates for non-GKE deployments.
- Dependencies:
- Tink library
- Risks and Mitigations:
- Potential slow adoption due to organizational factors.
- Support and Tools:
- Best-effort support.
- Uploads are currently limited to 100 MB in `gcloud`.
- Streaming uploads are not fully supported.
- Resumable uploads are not fully supported.
These limitations will be addressed by the upcoming feature request for streaming uploads.
This section outlines the proposed algorithm for supporting streaming uploads to
GCS via the go-gcsproxy, enhancing its functionality and compatibility with
various data transfer scenarios.
GCS handles streaming uploads through a series of distinct requests:
- Initiate Upload: A POST request to the bucket path initiates the upload and returns a unique upload ID.
- Upload Chunks: Subsequent PUT requests send individual chunks of data. Each request includes the upload ID, the byte range of the chunk being uploaded, and the chunk data itself. GCS appends each chunk to the composite object in the order received.
- Finalize Upload: (Implicit) Once all chunks are uploaded, GCS automatically finalizes the composite object.
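
To make the flow concrete, the sketch below drives the same three-step sequence against the JSON API in Go. Bucket and object names are placeholders, authentication headers are omitted, and intermediate chunks must be multiples of 256 KiB:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	chunks := [][]byte{make([]byte, 256*1024), []byte("tail")} // sample payload

	// 1. Initiate Upload: POST returns a session URI in the Location header.
	initURL := "https://storage.googleapis.com/upload/storage/v1/b/YOUR_BUCKET/o?uploadType=resumable&name=obj"
	resp, err := http.Post(initURL, "application/octet-stream", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	session := resp.Header.Get("Location")

	total := 0
	for _, c := range chunks {
		total += len(c)
	}

	// 2. Upload Chunks: PUT with Content-Range; total size stays "*" until the last chunk.
	offset := 0
	for i, c := range chunks {
		req, _ := http.NewRequest(http.MethodPut, session, bytes.NewReader(c))
		end := offset + len(c) - 1
		if i == len(chunks)-1 {
			// 3. Finalize Upload: the final chunk declares the total size.
			req.Header.Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", offset, end, total))
		} else {
			req.Header.Set("Content-Range", fmt.Sprintf("bytes %d-%d/*", offset, end))
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
		// GCS replies 308 (Resume Incomplete) for intermediate chunks, 200/201 for the last.
		offset += len(c)
	}
}
```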
Currently, the proxy handles the initial POST and the first PUT request. This enhancement focuses on enabling the proxy to handle the subsequent PUT requests for the remaining chunks, providing comprehensive support for streaming uploads.
- Each chunk, corresponding to a single PUT request from the client, is encrypted independently by the proxy before upload.
- The encrypted length of each chunk is stored as custom metadata in the final composite object to facilitate accurate decryption during download.
- Metadata format (see the sketch after this list):

```
x-chunk-len-1: <length of chunk 1>
x-chunk-len-2: <length of chunk 2>
x-chunk-len-3: <length of chunk 3>
...
x-unencrypted-length: ...
x-md5-hash: ...
```
- Persistent Cache Update: The proxy's local cache, storing bucket information by upload ID, is extended to store encrypted offsets of each chunk associated with that ID, enabling the proxy to track the composite object's structure.
- Metadata Update Trigger: The crucial metadata update, containing the `x-chunk-len-*` entries, occurs only after the final chunk is processed and the composite object exists. The proxy is modified to detect the final chunk upload response from GCS, triggering the metadata update.
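
A Go sketch of these pieces follows. The type and field names are hypothetical rather than the proxy's actual internals, and error handling is minimal:

```go
package proxy

import (
	"fmt"
	"sync"

	"github.com/tink-crypto/tink-go/v2/tink"
)

// uploadState tracks one in-flight streaming upload (hypothetical layout).
type uploadState struct {
	Bucket    string
	ChunkLens []int // encrypted length of each chunk, in upload order
	PlainLen  int   // running unencrypted length
}

// uploadCache maps GCS upload IDs to their state across PUT requests.
type uploadCache struct {
	mu sync.Mutex
	m  map[string]*uploadState
}

// encryptChunk encrypts one client chunk independently and records its
// encrypted length for the metadata update that follows the final chunk.
func (c *uploadCache) encryptChunk(uploadID string, a tink.AEAD, plain []byte) ([]byte, error) {
	ct, err := a.Encrypt(plain, nil) // no associated data in this sketch
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	st, ok := c.m[uploadID]
	if !ok {
		st = &uploadState{}
		c.m[uploadID] = st
	}
	st.ChunkLens = append(st.ChunkLens, len(ct))
	st.PlainLen += len(plain)
	return ct, nil
}

// metadataFor builds the x-chunk-len-* entries; it is called only after the
// proxy sees GCS's response to the final chunk, when the composite object exists.
func (c *uploadCache) metadataFor(uploadID string) map[string]string {
	c.mu.Lock()
	defer c.mu.Unlock()
	st := c.m[uploadID]
	md := map[string]string{"x-unencrypted-length": fmt.Sprint(st.PlainLen)}
	for i, n := range st.ChunkLens {
		md[fmt.Sprintf("x-chunk-len-%d", i+1)] = fmt.Sprint(n)
	}
	return md
}
```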
- Download and Buffer: The entire object is downloaded and buffered.
- Chunk Identification: Custom metadata is parsed to determine the starting offset and length of each encrypted chunk.
- Decryption and Concatenation: Each chunk is decrypted independently, and the decrypted chunks are concatenated to reconstruct the original file.
- Data Return: The final, decrypted object is returned to the client.
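
The download path can be sketched the same way. `decryptObject` below is a hypothetical helper operating on the buffered object and its parsed metadata, using the same Tink AEAD primitive as the upload path:

```go
package proxy

import (
	"fmt"
	"strconv"

	"github.com/tink-crypto/tink-go/v2/tink"
)

// decryptObject reverses the per-chunk encryption: it walks the buffered
// object using the x-chunk-len-* metadata, decrypts each chunk independently,
// and concatenates the plaintexts into the original file.
func decryptObject(a tink.AEAD, buf []byte, metadata map[string]string) ([]byte, error) {
	var out []byte
	offset := 0
	for i := 1; ; i++ { // chunk lengths are numbered from 1
		v, ok := metadata[fmt.Sprintf("x-chunk-len-%d", i)]
		if !ok {
			break // no more chunks recorded
		}
		n, err := strconv.Atoi(v)
		if err != nil || offset+n > len(buf) {
			return nil, fmt.Errorf("invalid length for chunk %d", i)
		}
		plain, err := a.Decrypt(buf[offset:offset+n], nil)
		if err != nil {
			return nil, fmt.Errorf("decrypt chunk %d: %w", i, err)
		}
		out = append(out, plain...)
		offset += n
	}
	return out, nil
}
```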