In the transformation step, use the available document-type metadata to filter documents by type. For instance, it is better to run GROBID on some types first, then try the others.
This was already done for the HAL metadata: https://github.com/anHALytics/anhalytics-core/blob/master/anhalytics-harvest/src/main/java/fr/inria/anhalytics/harvest/grobid/GrobidProcess.java#L32
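
A rough sketch of what type-based ordering could look like, in case it helps the discussion. Everything here is hypothetical: `DocTypeFilter`, `Doc`, `PREFERRED_TYPES` and the type codes are placeholders for illustration, not the existing anHALytics classes or the logic in `GrobidProcess`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and type codes are illustrative only.
final class DocTypeFilter {

    // Minimal stand-in for a harvested record and its metadata type.
    static final class Doc {
        final String id;
        final String type;

        Doc(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // Assumed list of types worth sending to GROBID first.
    static final List<String> PREFERRED_TYPES = Arrays.asList("ART", "COMM");

    // Returns document ids with preferred types first, then the rest,
    // keeping the input order inside each group.
    static List<String> processingOrder(List<Doc> docs) {
        List<String> first = new ArrayList<>();
        List<String> later = new ArrayList<>();
        for (Doc d : docs) {
            if (PREFERRED_TYPES.contains(d.type)) {
                first.add(d.id);
            } else {
                later.add(d.id);
            }
        }
        first.addAll(later);
        return first;
    }
}
```

The preferred types are kept in one list so the priority can be changed (or moved to configuration) without touching the processing loop.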