for a particular pdf
sha:3acd68c1cb7effbc9c2cf50fda6decd96d555d64
the title on the first line of the first page fails to be extracted correctly
sha:c64e0721c2d5ccdf48992d9a78dbe7d179bbf471
in particular, the venerable pdftotext appears to recognize the newline that
separates the title from the author name. here is the text blob extracted by pdftotext:
sha:c64e0721c2d5ccdf48992d9a78dbe7d179bbf471
why?
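
for reference, here is a minimal sketch of how the pdftotext output can be reproduced and split at that newline. it assumes pdftotext (from poppler or xpdf) is on the PATH, and that the title sits on the first non-empty line with the author on the next one, which may not hold for titles that wrap. the helper names `first_page_text` and `split_title_author` and the file name `paper.pdf` are hypothetical, not part of any library.

```python
import subprocess


def first_page_text(pdf_path):
    """Return the text of page 1 as produced by the pdftotext CLI."""
    # -f 1 -l 1 limits extraction to the first page; "-" sends text to stdout.
    result = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_title_author(text):
    """Assumption: first non-empty line is the title, second is the author."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else None
    author = lines[1] if len(lines) > 1 else None
    return title, author


if __name__ == "__main__":
    # "paper.pdf" is a placeholder path for the pdf in question.
    print(split_title_author(first_page_text("paper.pdf")))
```

comparing this output against what the failing extractor returns for the same page should show whether the newline between title and author is lost during extraction or during later processing.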