Commit 0820278

feature: make tesseract version 5.2.0 default for amazonlinux-2 builds
1 parent 226fb20 commit 0820278

7 files changed (+17 -16 lines)


Dockerfile.al2

Lines changed: 2 additions & 2 deletions
```diff
@@ -2,7 +2,7 @@
 FROM lambci/lambda-base-2:build
 
 ARG LEPTONICA_VERSION=1.82.0
-ARG TESSERACT_VERSION=4.1.3
+ARG TESSERACT_VERSION=5.2.0
 ARG AUTOCONF_ARCHIVE_VERSION=2017.09.28
 ARG TMP_BUILD=/tmp
 ARG TESSERACT=/opt/tesseract
@@ -40,7 +40,7 @@ RUN curl -L https://github.com/tesseract-ocr/tesseract/archive/${TESSERACT_VERSI
 WORKDIR /opt
 RUN mkdir -p ${DIST}/lib && mkdir -p ${DIST}/bin && \
 cp ${TESSERACT}/bin/tesseract ${DIST}/bin/ && \
-cp ${TESSERACT}/lib/libtesseract.so.4 ${DIST}/lib/ && \
+cp ${TESSERACT}/lib/libtesseract.so.5 ${DIST}/lib/ && \
 cp ${LEPTONICA}/lib/liblept.so.5 ${DIST}/lib/liblept.so.5 && \
 cp /usr/lib64/libgomp.so.1 ${DIST}/lib/ && \
 cp /usr/lib64/libwebp.so.4 ${DIST}/lib/ && \
```
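The second hunk swaps the copied library from `libtesseract.so.4` to `libtesseract.so.5`, so the suffix follows the Tesseract major version (4.1.3 → `.so.4`, 5.2.0 → `.so.5`). A minimal sketch, assuming that pattern also holds for other releases, of deriving the filename from `TESSERACT_VERSION` with POSIX parameter expansion; the helper `libtesseract_soname` is hypothetical and not part of `Dockerfile.al2`.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not in Dockerfile.al2): map a Tesseract release such as
# 5.2.0 to the shared library name copied into ${DIST}/lib.
# Assumption: the suffix equals the major version, as for 4.1.3 (.so.4) and
# 5.2.0 (.so.5) in this commit.
libtesseract_soname() {
  local version="$1"
  # ${version%%.*} removes the longest suffix starting at a dot: 5.2.0 -> 5
  local major="${version%%.*}"
  printf 'libtesseract.so.%s\n' "$major"
}

# Default AL2 version after this commit, unless overridden in the environment.
TESSERACT_VERSION="${TESSERACT_VERSION:-5.2.0}"
libtesseract_soname "$TESSERACT_VERSION"
```

Because build `ARG`s are visible as environment variables to `RUN` commands, and shell-form `RUN` is executed by `/bin/sh`, the same expansion could presumably be used inline, e.g. `cp ${TESSERACT}/lib/libtesseract.so.${TESSERACT_VERSION%%.*} ${DIST}/lib/`, so that overriding `TESSERACT_VERSION` would not also require editing the `cp` line.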

README.md

Lines changed: 11 additions & 10 deletions
```diff
@@ -1,7 +1,7 @@
 Tesseract OCR Lambda Layer
 ===
 
-![Tesseract](https://img.shields.io/badge/Tesseract-4.1.3-green?style=flat-square)
+![Tesseract](https://img.shields.io/badge/Tesseract-5.2.0-green?style=flat-square)
 ![Leptonica](https://img.shields.io/badge/Leptonica-1.82.0-green?style=flat-square)
 
 ![Examples available for Runtimes](https://img.shields.io/badge/Examples_(Lambda_runtimes)-Python_3.6(AL1),Python_3.8(AL2)-informational?style=flat-square)
@@ -14,19 +14,20 @@ Tesseract OCR Lambda Layer
 
 <!-- TOC -->
 
+- [Tesseract OCR Lambda Layer](#tesseract-ocr-lambda-layer)
 - [Quickstart](#quickstart)
 - [Ready-to-use binaries](#ready-to-use-binaries)
-- [Use with Serverless Framework](#use-with-serverless-framework)
-- [Use with AWS CDK](#use-with-aws-cdk)
+- [Use with Serverless Framework](#use-with-serverless-framework)
+- [Use with AWS CDK](#use-with-aws-cdk)
 - [Build tesseract layer from source using Docker](#build-tesseract-layer-from-source-using-docker)
-- [available `Dockerfile`s](#available-dockerfiles)
-- [Building a different tesseract version and/or language](#building-a-different-tesseract-version-andor-language)
-- [Deployment size optimization](#deployment-size-optimization)
+- [available `Dockerfile`s](#available-dockerfiles)
+- [Building a different tesseract version and/or language](#building-a-different-tesseract-version-andor-language)
+- [Deployment size optimization](#deployment-size-optimization)
 - [Building the layer binaries directly using CDK](#building-the-layer-binaries-directly-using-cdk)
-- [Layer contents](#layer-contents)
+- [Layer contents](#layer-contents)
 - [Known Issues](#known-issues)
-- [Avoiding Pillow library issues](#avoiding-pillow-library-issues)
-- [Unable to import module 'handler': cannot import name '_imaging'](#unable-to-import-module-handler-cannot-import-name-_imaging)
+- [Avoiding Pillow library issues](#avoiding-pillow-library-issues)
+- [Unable to import module 'handler': cannot import name '_imaging'](#unable-to-import-module-handler-cannot-import-name-_imaging)
 - [Contributors :heart:](#contributors-heart)
 
 <!-- /TOC -->
@@ -149,7 +150,7 @@ unset CONTAINER
 
 ## Building a different tesseract version and/or language
 
-Per default the build generated the [tesseract 4.1.3](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.3) OCR libraries with the _fast_ german, english and osd (orientation and script detection) [data files](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files) included.
+Per default the build generates the [tesseract 4.1.3](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.3) (amazonlinux-1) or [5.2.0](https://github.com/tesseract-ocr/tesseract/releases/tag/5.2.0) (amazonlinux-2) OCR libraries with the _fast_ german, english and osd (orientation and script detection) [data files](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files) included.
 
 The build process can be modified using different build time arguments (defined as `ARG` in `Dockerfile.al[1|2]`), using the `--build-arg` option of `docker build`.
 
```
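The README paragraph in this hunk describes overriding the `ARG`s with the `--build-arg` option of `docker build`. A minimal sketch for the AL2 image, assuming the build runs from the repository root with `Dockerfile.al2`; the wrapper `build_al2_layer`, its `DRY_RUN` switch, and the tag `tesseract-lambda-layer:al2` are hypothetical. `TESSERACT_VERSION` and `LEPTONICA_VERSION` are the `ARG` names shown in the `Dockerfile.al2` hunk.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper around `docker build` for the AL2 layer.
# Assumptions: run from the repository root; the image tag is illustrative.
# TESSERACT_VERSION and LEPTONICA_VERSION are ARGs declared in Dockerfile.al2.
build_al2_layer() {
  local tesseract_version="${1:-5.2.0}"
  local leptonica_version="${2:-1.82.0}"
  # Reuse the positional parameters as a safely quoted argument list.
  set -- docker build \
    -f Dockerfile.al2 \
    --build-arg "TESSERACT_VERSION=${tesseract_version}" \
    --build-arg "LEPTONICA_VERSION=${leptonica_version}" \
    -t tesseract-lambda-layer:al2 \
    .
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the default AL2 build command without invoking Docker.
DRY_RUN=1
build_al2_layer
```

After this commit the AL2 `cp` step names `libtesseract.so.5` explicitly, so passing a 4.x `TESSERACT_VERSION` to the AL2 build would presumably fail at that step unless the filename is changed as well.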

continous-integration/README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -14,11 +14,11 @@ Commands to reproduce:
 
 ```bash
 npm ci
-npx cdk synth
+npx cdk --app 'npx ts-node index-al[1|2].ts' synth
 ## run integration test using AL1 & Python 3.6
-npm npm run test:integration:al1
+npx npm run test:integration:al1
 ## run integration test using AL2 & Python 3.8
-npm npm run test:integration:al2
+npx npm run test:integration:al2
 ```
 
 ## Bundling
````

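The new `synth` line writes the CDK entry point as `index-al[1|2].ts`, which reads as a placeholder for either `index-al1.ts` or `index-al2.ts`. A minimal sketch that expands it per Amazon Linux variant; the function `synth_variant` and its `DRY_RUN` switch are hypothetical, while `cdk --app` and `npx ts-node` are used exactly as in the README line.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: expand the index-al[1|2].ts placeholder for one variant.
# With DRY_RUN=1 it prints the command; otherwise it runs `cdk synth`.
synth_variant() {
  local variant="$1"   # expected: al1 or al2
  case "$variant" in
    al1|al2) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  local app="npx ts-node index-${variant}.ts"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf "npx cdk --app '%s' synth\n" "$app"
  else
    npx cdk --app "$app" synth
  fi
}

# Print the synth command for both variants without running CDK.
DRY_RUN=1
for v in al1 al2; do
  synth_variant "$v"
done
```

Without `DRY_RUN=1` the function calls `npx cdk` directly, which presumably needs the dependencies from `npm ci` to be installed first.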
Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
 LEPTONICA_VERSION=1.82.0
-TESSERACT_VERSION=4.1.3
+TESSERACT_VERSION=5.2.0
 TESSERACT_DATA_FILES=tessdata_fast/4.1.0
 TESSERACT_DATA_LANGUAGES=osd,eng,deu
```

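In this file `TESSERACT_DATA_LANGUAGES` is a comma-separated list of Tesseract language codes and `TESSERACT_DATA_FILES` names the data set together with a version; only `TESSERACT_VERSION` is bumped, while the data files stay at `tessdata_fast/4.1.0`. Tesseract loads each language from a file named `<code>.traineddata`, so a minimal sketch of expanding the list into those filenames could look like this. The helper `traineddata_files` is hypothetical and not necessarily how the build consumes these variables.

```bash
#!/usr/bin/env bash
set -eu

# Defaults mirror the values in this file after the commit.
TESSERACT_DATA_FILES="${TESSERACT_DATA_FILES:-tessdata_fast/4.1.0}"
TESSERACT_DATA_LANGUAGES="${TESSERACT_DATA_LANGUAGES:-osd,eng,deu}"

# Hypothetical helper: print one <code>.traineddata filename per language code.
traineddata_files() {
  # Turn commas into newlines, then append the suffix to each code.
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r lang; do
    printf '%s.traineddata\n' "$lang"
  done
}

# Split "tessdata_fast/4.1.0" at the first slash (assumed: data set / version).
printf 'data set: %s, version: %s\n' \
  "${TESSERACT_DATA_FILES%%/*}" "${TESSERACT_DATA_FILES#*/}"
traineddata_files "$TESSERACT_DATA_LANGUAGES"
```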
6.13 KB
Binary file not shown.
-3.4 MB
Binary file not shown.
3.13 MB
Binary file not shown.

0 commit comments
