-
Notifications
You must be signed in to change notification settings - Fork 2.9k
NIFI-13997 Load the requested python processor only #9568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-13997 Load the requested python processor only #9568
Conversation
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this improvement @fgerlits.
On initial review, I'm concerned about building a test case based on the Milvus and Unstructured libraries. These libraries have large transitive dependency trees, which can change often. This also takes more time to download and run. I recommend building the tests using much more narrowly-scoped dependencies, preferably avoiding transitive dependencies as much as possible.
Before this change: when multiple python processors were present in a package, we only installed the dependencies from requirements.txt plus those from ProcessorDetails.dependencies of the requested processor. Then we tried to load all the python files in the package, including other processors which may have other dependencies. The result was that the requested processor could not be loaded. After the change: we only load the requested processor and any non-processor helper files which are present in the package.
There are 26 tests (+1 added now) in PythonControllerInteractionIT.java, which is in nifi-py4j-integration-tests, but these were not run in any of the CI jobs. I have added them to system-tests, because integration-tests explicitly skips nifi-py4j-integration-tests with a comment that it is run as part of system-tests.
c20e05b to
19f2f34
Compare
fixed in 19f2f34 |
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @fgerlits. I noted one additional test-related concern regarding external network requests.
One general question, however, is that I'm not sure about the general strategy of supporting different sets of dependencies in this way. The general purpose of a shared module is to have shared dependencies. That is the case of Java modules, and that should generally be the expectation for Python modules. The fact that there are ways to support both inline and external dependencies with Python Processors is a special feature, but I'm not sure whether this is something we should promote in general. That being said, the runtime code changes that simply filter out loading other Processors in the same module same module seem reasonable, so I'm willing to go forward with these changes.
...fi-python-test-extensions/src/main/resources/extensions/multi-processor/CreateHttpRequest.py
Show resolved
Hide resolved
Yes, shared dependencies per package is the way to go in most cases. We did, however, run into a case of a package of "parser" processors, and while most of their dependencies were common, some of the format-specific libs were quite large. We would like to avoid loading all format-specific libs if only one format is used. |
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for working through the feedback @fgerlits! +1 merging
Summary
NIFI-13997
Load the requested python processor only
Before this change: when multiple python processors were present in a package,
we only installed the dependencies from requirements.txt plus those from
ProcessorDetails.dependencies of the requested processor. Then we tried to
load all the python files in the package, including other processors which may
have other dependencies. The result was that the requested processor could not
be loaded.
After the change: we only load the requested processor and any non-processor
helper files which are present in the package.
Enable py4j integration tests in system-tests
There are 26 tests (+1 added now) in PythonControllerInteractionIT.java, which
is in nifi-py4j-integration-tests, but these were not run in any of the CI
jobs. I have added them to system-tests, because integration-tests explicitly
skips nifi-py4j-integration-tests with a comment that it is run as part of
system-tests.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation