Commit 6715b4a

Updates for text_extraction_benchmarks (#546)
1 parent 9299d4c commit 6715b4a

2 files changed: 3 additions and 1 deletion
Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ ENV RESOURCES_PATH "/dedoc_root/resources"
 COPY requirements.txt .
 RUN pip3 install --no-cache-dir -r requirements.txt
 RUN apt-get update && apt-get install -y --fix-missing --no-install-recommends fontforge
+RUN apt install -y libutf8proc-dev
+RUN ln -s /usr/lib/x86_64-linux-gnu/libutf8proc.so /usr/lib/libutf8proc.so.1

 RUN mkdir /dedoc_root
 RUN mkdir /dedoc_root/dedoc

dedoc/extensions.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@
 eml_like_format={".eml"},
 mhtml_like_format={".mhtml", ".mht", ".mhtml.gz", ".mht.gz"},
 archive_like_format={".zip", ".tar", ".tar.gz", ".rar", ".7z"},
-image_like_format={".png"},
+image_like_format={".png", ".jpg", ".jpeg", ".tiff", ".tif"},
 pdf_like_format={".pdf"},
 csv_like_format={".csv", ".tsv"},
 txt_like_format={".txt", ".txt.gz"},
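
For illustration only, here is a minimal sketch of how a widened extension set like the one added above might be checked against a filename. The constant `IMAGE_LIKE_FORMAT` and the helper `is_image_like` are hypothetical names, not part of dedoc, and the case-insensitive comparison is an assumption rather than something this commit shows.

```python
from pathlib import Path

# Extension set copied from the added line of the dedoc/extensions.py diff.
# The constant name is hypothetical; dedoc passes this set as a keyword argument.
IMAGE_LIKE_FORMAT = {".png", ".jpg", ".jpeg", ".tiff", ".tif"}


def is_image_like(filename: str) -> bool:
    """Hypothetical helper: True if the file's final suffix is in the set.

    Lowercasing the suffix is an assumption made for this sketch.
    """
    return Path(filename).suffix.lower() in IMAGE_LIKE_FORMAT
```

Under these assumptions, `is_image_like("scan.JPG")` and `is_image_like("page.tiff")` are true, while `is_image_like("report.pdf")` is false. Before this commit only `.png` would have matched.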
