schema/pdfbox2 fails to extract text as well as pdftotext

for a particular pdf

   sha:3acd68c1cb7effbc9c2cf50fda6decd96d555d64

the first line of the first page fails to extract the title correctly

   sha:c64e0721c2d5ccdf48992d9a78dbe7d179bbf471

in particular, the venerable pdftotext appears to recognize the newline that
separates the title from the author name.  here is the extracted pdftotext blob

```
sha:c64e0721c2d5ccdf48992d9a78dbe7d179bbf471
```
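a minimal sketch for comparing the first lines of the two extractions, assuming `pdftotext` (poppler/xpdf) is on the PATH. the helper names and the pdf path are hypothetical, and the pdfbox2 output is assumed to already be available as a string from whatever produced the blob above.

```python
import subprocess


def pdftotext_first_page(pdf_path):
    """Return the text of page 1 as extracted by the pdftotext CLI."""
    # -f/-l select the first and last page; "-" writes the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def leading_lines(text, count=2):
    """Return the first `count` non-blank lines, stripped of whitespace."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[:count]


def title_line_differs(text_a, text_b):
    """True when the first non-blank line of the two extractions differs."""
    return leading_lines(text_a, 1) != leading_lines(text_b, 1)
```

if the title and author name are merged into one line by one extractor but not the other, `title_line_differs` returns true, which narrows the problem to the newline handling on that first line.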

why?


schema/pdfbox2 fails to extract text as well as pdftotext #20
