Implementation of batch processing #48
Open: tgvashworth wants to merge 22 commits into main from impl/batch-processing-2
22 commits
c568142 Initial commit of IMPLEMENTATION.md (tgvashworth)
7ee244a Initial commit of code structure (tgvashworth)
1ba385e Image downloads working (tgvashworth)
aa06bd5 Add imagemagick with build, run & test in container (tgvashworth)
d6aaec5 Add make develop for developing in Docker (tgvashworth)
7f94598 Small bash history hack (tgvashworth)
5a0d78f Convert images to grayscale (tgvashworth)
5cdbd0e Add a note about developing locally (tgvashworth)
281ee4d Initial commit of uploading to S3 (tgvashworth)
d4d08b2 Test and fix bad HTTP status codes (tgvashworth)
bc89b11 Small implementation note tweaks (tgvashworth)
4b2141c Write output to CSV (tgvashworth)
137e059 Handle partial failure (tgvashworth)
37c05c3 Fix run target entrypoint (tgvashworth)
71a8f79 Basic tests for loading rows (tgvashworth)
418b0bd Add outputs to scripts and mount output dir (tgvashworth)
ee37aac Small fixes (tgvashworth)
2bc88ed URN -> ARN (tgvashworth)
6e10f3e Use -out suffix on output files (tgvashworth)
72b2b7f Drop the upload context (tgvashworth)
022f41a Extract core processing loop to function (tgvashworth)
e205c2e Clean up dockerfile (tgvashworth)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| name: batch-processing tests | ||
| on: [push] | ||
| defaults: | ||
| run: | ||
| working-directory: batch-processing | ||
| jobs: | ||
| test: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v3 | ||
| - name: Set up Go | ||
| uses: actions/setup-go@v3 | ||
| with: | ||
| go-version-file: "batch-processing/go.mod" | ||
| cache: false | ||
| - name: Test | ||
| run: make test |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| go run . --input /inputs/example.csv --output /outputs/example-result.csv |
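
The command above passes `--input` and `--output` paths to the program. As a minimal sketch of how those two flags could be parsed with Go's standard `flag` package (the `config` type and `parseArgs` helper are hypothetical names for illustration, not taken from this PR):

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// config holds the two paths passed on the command line.
// Hypothetical type, used only in this sketch.
type config struct {
	input  string
	output string
}

// parseArgs parses --input and --output from args and rejects a missing
// value. Hypothetical helper; the PR's own parsing may differ.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("batch-processing", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of stderr in this sketch

	var c config
	fs.StringVar(&c.input, "input", "", "path to the input CSV")
	fs.StringVar(&c.output, "output", "", "path to the output CSV")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if c.input == "" || c.output == "" {
		return config{}, errors.New("both --input and --output are required")
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("reading %s, writing %s\n", c.input, c.output)
}
```

The `flag` package accepts both `-input` and `--input`, so the double-dash form used in the command works unchanged.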
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| docker_env |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| # syntax=docker/dockerfile:1 | ||
| ## | ||
| ## BASE | ||
| ## | ||
| FROM golang:1.19-bullseye as base | ||
|
|
||
| # Ignore APT warnings about not having a TTY | ||
| ENV DEBIAN_FRONTEND noninteractive | ||
|
|
||
| # install build essentials | ||
| RUN apt-get update && \ | ||
| apt-get install -y wget build-essential pkg-config --no-install-recommends | ||
|
|
||
| # Install ImageMagick deps | ||
| RUN apt-get -q -y install libjpeg-dev libpng-dev libtiff-dev \ | ||
| libgif-dev libx11-dev --no-install-recommends | ||
|
|
||
| ENV IMAGEMAGICK_VERSION=6.9.10-11 | ||
|
|
||
| # Install ImageMagick | ||
| RUN cd && \ | ||
| wget https://github.com/ImageMagick/ImageMagick6/archive/${IMAGEMAGICK_VERSION}.tar.gz && \ | ||
| tar xvzf ${IMAGEMAGICK_VERSION}.tar.gz && \ | ||
| cd ImageMagick* && \ | ||
| ./configure \ | ||
| --without-magick-plus-plus \ | ||
| --without-perl \ | ||
| --disable-openmp \ | ||
| --with-gvc=no \ | ||
| --disable-docs && \ | ||
| make -j$(nproc) && make install && \ | ||
| ldconfig /usr/local/lib | ||
|
|
||
| # Build the app | ||
| WORKDIR /app | ||
|
|
||
| COPY go.mod ./ | ||
| COPY go.sum ./ | ||
|
|
||
| RUN go mod download | ||
|
|
||
| COPY *.go ./ | ||
| RUN mkdir -p /inputs /outputs | ||
|
|
||
| # This is required for test and run, but for develop it ensures we have a build cache | ||
| RUN go build -o /out | ||
|
|
||
| # Set up environment | ||
| ENV AWS_REGION "" | ||
| ENV AWS_ROLE_ARN "" | ||
| ENV S3_BUCKET "" | ||
|
|
||
| ## | ||
| ## TEST | ||
| ## | ||
| FROM base as test | ||
|
|
||
| ENTRYPOINT [ "go", "test", "-v" ] | ||
|
|
||
| ## | ||
| ## DEVELOP | ||
| ## | ||
| FROM base as develop | ||
|
|
||
| RUN mkdir -p /root/.aws /root/.cache/go-build | ||
| COPY .bash_history /root/.bash_history | ||
| ENTRYPOINT [ "/bin/bash" ] | ||
|
|
||
| ## | ||
| ## RUN | ||
| ## | ||
| FROM base as run | ||
|
|
||
| WORKDIR / | ||
| ENTRYPOINT ["/out"] | ||
| CMD ["--input", "/inputs/example.csv", "--output", "/outputs/example-result.csv"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,176 @@ | ||
| # batch-processing | ||
|
|
||
| See https://github.com/CodeYourFuture/immersive-go-course/issues/26 for context. | ||
|
|
||
| ## Plan | ||
|
|
||
| The planned architecture: | ||
|
|
||
| 1. Read the CSV | ||
| 2. Download the images to a location (`/tmp`) | ||
| 3. Use ImageMagick to convert them to grayscale | ||
| 4. Upload them to S3 | ||
| 5. Return the URL | ||
|
|
||
| I tried [an initial implementation](https://github.com/CodeYourFuture/immersive-go-course/pull/46) of this that went a long way, but that I didn't like in the end. | ||
|
|
||
| The first step will be to build this linearly, and to write tests as we go. Because the code downloads and writes real files, we will run integration tests in Docker: | ||
|
|
||
| 1. Mock the `jpg` get | ||
| 2. Write a real file | ||
| 3. Mock S3 methods using [s3iface](https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3iface/) | ||
|
|
||
| Then use goroutines to run it in parallel, likely by wrapping the output in a mutex and locking/unlocking as each goroutine completes: https://pkg.go.dev/sync#Mutex | ||
|
|
||
| A possible last extension would be to use channels: https://go.dev/blog/pipelines | ||
|
|
||
| ## Downloads | ||
|
|
||
| The download is simple: create a file in a temporary location, then `http.Get` the URL and `io.Copy` the response body into the file. | ||
|
|
||
| ## `imagemagick` | ||
|
|
||
| To run ImageMagick (and this whole thing) in a repeatable way, we will do it all in a Docker container based on `dpokidov/imagemagick:latest-bullseye` using a multi-stage build. This will give us the `magick` command. | ||
|
|
||
| To be able to run the tests and the app, we end up with multiple targets: | ||
|
|
||
| ```Dockerfile | ||
| FROM golang:1.19-bullseye as base | ||
|
|
||
| # ... install dependencies & build ... | ||
|
|
||
| FROM base as test | ||
|
|
||
| # ... run tests ... | ||
|
|
||
| FROM base as run | ||
|
|
||
| # ... run app ... | ||
| ``` | ||
|
|
||
| Which can then be built by specifying the `--target`: | ||
|
|
||
| ```console | ||
| > docker build --target test -t test . | ||
| ``` | ||
|
|
||
| ## Grayscale | ||
|
|
||
| `convert`, accessed via `ConvertImageCommand`, with `-set colorspace Gray -separate -average` seems to work well. | ||
|
|
||
| ## Developing | ||
|
|
||
| We can run locally. A few things are needed. | ||
|
|
||
| In VS Code settings, if using the Go extension: | ||
|
|
||
| ```json | ||
| "gopls": { | ||
| "build.env": { | ||
| "CGO_CFLAGS_ALLOW": "-Xpreprocessor" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| On the CLI: | ||
|
|
||
| ```console | ||
| export PKG_CONFIG_PATH="/usr/local/opt/imagemagick@6/lib/pkgconfig" | ||
| ``` | ||
|
|
||
| ### Developing in Docker | ||
|
|
||
| To develop the app with Docker, we need a slightly fancier command: | ||
|
|
||
| ```Makefile | ||
| develop: | ||
| mkdir -p mount | ||
| docker build --target develop -t develop . | ||
| docker run -it --mount type=bind,source="$$(pwd)",target=/app --mount type=bind,source="/tmp",target=/tmp --rm develop | ||
|
||
| rm -rf ./mount | ||
| ``` | ||
|
|
||
| ## Upload to S3 | ||
|
|
||
| - Get credentials set up: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html | ||
| - `brew install awscli` | ||
| - `aws configure` | ||
|
|
||
| Follow the upload example here: `https://github.com/aws/aws-sdk-go` | ||
|
|
||
| We need to mount credentials from the host: `--mount type=bind,source="$$(echo $$HOME)/.aws",target=/root/.aws` | ||
|
|
||
| Create an `S3ReadWriteGoCourse` policy for the IAM role: | ||
|
|
||
| ```json | ||
| { | ||
| "Version": "2012-10-17", | ||
| "Statement": [ | ||
| { | ||
| "Sid": "ListObjectsInBucket", | ||
| "Effect": "Allow", | ||
| "Action": ["s3:ListBucket"], | ||
| "Resource": ["arn:aws:s3:::[ID]"] | ||
| }, | ||
| { | ||
| "Sid": "AllObjectActions", | ||
| "Effect": "Allow", | ||
| "Action": "s3:*Object", | ||
| "Resource": ["arn:aws:s3:::[ID]/*"] | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| Create a `GoCourseLambdaUserReadWriteS3` role that allows the account and Lambda to read and write, with this trust policy: | ||
|
|
||
| ```json | ||
| { | ||
| "Version": "2012-10-17", | ||
| "Statement": [ | ||
| { | ||
| "Effect": "Allow", | ||
| "Principal": { | ||
| "AWS": "arn:aws:iam::[ID]:root" | ||
| }, | ||
| "Action": "sts:AssumeRole" | ||
| }, | ||
| { | ||
| "Effect": "Allow", | ||
| "Principal": { | ||
| "Service": "lambda.amazonaws.com" | ||
| }, | ||
| "Action": "sts:AssumeRole" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| We can then assume the role using the ARN passed via the environment: | ||
|
|
||
| ```go | ||
| // Set up S3 session | ||
| // All clients require a Session. The Session provides the client with | ||
| // shared configuration such as region, endpoint, and credentials. | ||
| sess := session.Must(session.NewSession()) | ||
|
|
||
| // Create the credentials from AssumeRoleProvider to assume the role | ||
| // referenced by the ARN. | ||
| creds := stscreds.NewCredentials(sess, awsRoleArn) | ||
|
|
||
| // Create service client value configured for credentials | ||
| // from assumed role. | ||
| svc := s3.New(sess, &aws.Config{Credentials: creds}) | ||
| ``` | ||
|
|
||
| We need to create a `docker_env` file with config: | ||
|
|
||
| ```env | ||
| AWS_REGION=eu-west-1 | ||
| AWS_ROLE_ARN=arn:aws:iam::[ID]:role/GoCourseLambdaUserReadWriteS3 | ||
| S3_BUCKET=[ID] | ||
| ``` | ||
|
|
||
| ## Output | ||
|
|
||
| Write CSV with the input and output together. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| SHELL=/bin/bash | ||
| .PHONY: run test develop | ||
|
|
||
| outputs: | ||
| mkdir -p outputs | ||
|
|
||
| run: outputs | ||
| docker build --target run -t run . | ||
| docker run \ | ||
| --env-file docker_env \ | ||
| --mount type=bind,source="$$(echo $$HOME)/.aws",target=/root/.aws \ | ||
| --mount type=bind,source="$$(pwd)/inputs",target=/inputs \ | ||
| --mount type=bind,source="$$(pwd)/outputs",target=/outputs \ | ||
| --rm run | ||
|
|
||
| test: outputs | ||
| docker build --target test -t test . | ||
| docker run \ | ||
| --rm test | ||
|
|
||
| develop: outputs | ||
| docker build --target develop -t develop . | ||
| docker run -it \ | ||
| --env-file docker_env \ | ||
| --mount type=bind,source="$$(go env GOCACHE)",target=/root/.cache/go-build \ | ||
| --mount type=bind,source="$$(echo $$HOME)/.aws",target=/root/.aws \ | ||
| --mount type=bind,source="$$(pwd)",target=/app \ | ||
| --mount type=bind,source="$$(pwd)/inputs",target=/inputs \ | ||
| --mount type=bind,source="$$(pwd)/outputs",target=/outputs \ | ||
| --mount type=bind,source="/tmp",target=/tmp \ | ||
| --rm develop |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| module github.com/CodeYourFuture/immersive-go-course/batch-processing | ||
|
|
||
| go 1.19 | ||
|
|
||
| require ( | ||
| github.com/aws/aws-sdk-go v1.44.109 // indirect | ||
| github.com/jmespath/go-jmespath v0.4.0 // indirect | ||
| gopkg.in/gographics/imagick.v2 v2.6.2 // indirect | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| github.com/aws/aws-sdk-go v1.44.109 h1:+Na5JPeS0kiEHoBp5Umcuuf+IDqXqD0lXnM920E31YI= | ||
| github.com/aws/aws-sdk-go v1.44.109/go.mod h1:y4AeaBuwd2Lk+GepC1E9v0qOiTws0MIWAX4oIKwKHZo= | ||
| github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= | ||
| github.com/jmespath/go-jmespath v0.4.0 h1:BEgLn5cpjn8UN1mAw4NjwDrS35OdebyEtFe+9YPoQUg= | ||
| github.com/jmespath/go-jmespath v0.4.0/go.mod h1:T8mJZnbsbmF+m6zOOFylbeCJqk5+pHWvzYPziyZiYoo= | ||
| github.com/jmespath/go-jmespath/internal/testify v1.5.1/go.mod h1:L3OGu8Wl2/fWfCI6z80xFu9LTZmf1ZRjMHUOPmWr69U= | ||
| github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= | ||
| github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= | ||
| github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= | ||
| golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk= | ||
| golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= | ||
| golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= | ||
| golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= | ||
| golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ= | ||
| golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= | ||
| gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= | ||
| gopkg.in/gographics/imagick.v2 v2.6.2 h1:8ILTJzDKQKSYSfav+9GZs9H8zOOR2UtZVTWkUdFoiZ8= | ||
| gopkg.in/gographics/imagick.v2 v2.6.2/go.mod h1:/QVPLV/iKdNttRKthmDkeeGg+vdHurVEPc8zkU0XgBk= | ||
| gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| url | ||
| https://images.unsplash.com/photo-does-not-exist |