Plugin-498: Added content-type to GCS sink #462
}

if (contentType == null) {
  contentType = DEFAULT_CONTENT_TYPE;
The validate method shouldn't modify a field. Instead, introduce a "getter" method and do the defaulting logic there. In the validate method, call the getter for validation if needed.
Please check the updated PR.
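The reviewer's suggestion can be illustrated with a minimal sketch. The class name, the `validate()` return type, and the value of `DEFAULT_CONTENT_TYPE` are assumptions made for illustration; only `contentType`, `DEFAULT_CONTENT_TYPE`, and the idea of a defaulting getter come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sink config; not the PR's actual class.
class SinkConfigSketch {
  // Assumed value; the PR's real DEFAULT_CONTENT_TYPE is not shown in the thread.
  static final String DEFAULT_CONTENT_TYPE = "application/octet-stream";

  private final String contentType; // may be null when unset

  SinkConfigSketch(String contentType) {
    this.contentType = contentType;
  }

  // Defaulting lives in the getter, so the field is never reassigned.
  String getContentType() {
    return contentType == null ? DEFAULT_CONTENT_TYPE : contentType;
  }

  // validate() reads state only through the getter and reports problems.
  List<String> validate() {
    List<String> failures = new ArrayList<>();
    if (getContentType().trim().isEmpty()) {
      failures.add("Content type must not be blank.");
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println(new SinkConfigSketch(null).getContentType());
    System.out.println(new SinkConfigSketch("text/plain").getContentType());
  }
}
```

Keeping the field immutable means that calling `validate()` any number of times cannot change what the sink later writes.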
private static final String RECORDS_UPDATED_METRIC = "records.updated";
public static final String AVRO_NAMED_OUTPUT = "avro.mo.config.namedOutput";
public static final String COMMON_NAMED_OUTPUT = "mapreduce.output.basename";
public static final String CONTENT_TYPE = "Content-Type";
Prefix the FS configuration key. Also, use "." notation. E.g. "io.cdap.gcs.batch.sink.content.type".
This was changed to public static final String CONTENT_TYPE = "io.cdap.gcs.batch.sink.content.type"; in the updated PR.
@flakrimjusufi If multiple GCS sinks with different content types are used in a single pipeline, what is the behavior? Are there any conflicts for the content type?
Please make sure the changes have been tested for these cases:
- Macros
- Backwards compatibility
Also, please add unit tests for these and modify the integration tests to support this fix.
/gcbrun
We ran some tests with multiple GCS sinks using different content types, and the behavior was as expected. There were no conflicts for the content type, and the pipelines ran successfully.
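One way to picture why two sinks need not conflict is the sketch below. It assumes that each sink supplies its own configuration map to the framework, which the thread does not state; the helper `outputConfig` and the content type values are hypothetical. Only the key string is taken from the review.

```java
import java.util.HashMap;
import java.util.Map;

class ContentTypeKeySketch {
  // Key from the review thread: prefixed and dot-separated.
  static final String CONTENT_TYPE = "io.cdap.gcs.batch.sink.content.type";

  // Hypothetical helper: builds the output configuration for one sink.
  static Map<String, String> outputConfig(String contentType) {
    Map<String, String> conf = new HashMap<>();
    conf.put(CONTENT_TYPE, contentType);
    return conf;
  }

  public static void main(String[] args) {
    // Each sink holds its own map, so the same key can carry different values.
    Map<String, String> avroSink = outputConfig("application/avro");
    Map<String, String> jsonSink = outputConfig("application/json");
    System.out.println(avroSink.get(CONTENT_TYPE));
    System.out.println(jsonSink.get(CONTENT_TYPE));
  }
}
```

If the sinks instead shared a single job-wide configuration, the last writer would win, which is the scenario the reviewer's question is probing.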
public String getContentType() {
  if (!Strings.isNullOrEmpty(contentType)) {
    if (contentType.equals(CONTENT_TYPE_OTHER)) {
      if (Strings.isNullOrEmpty(customContentType)) {
Does this mean that if customContentType is a macro, we use the default content type?
The same applies here: if the value of customContentType is a macro, the value specified by the user is used.
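The resolution order being discussed might look roughly like the sketch below. The diff shows only the three nested conditions, so the fallback branches, the value of `CONTENT_TYPE_OTHER`, and the default are all assumptions. The sketch also assumes that a macro-valued field is null until the macro is evaluated at runtime.

```java
class ContentTypeResolutionSketch {
  static final String CONTENT_TYPE_OTHER = "other"; // assumed value
  static final String DEFAULT_CONTENT_TYPE = "application/octet-stream"; // assumed value

  // Local replacement for Guava's Strings.isNullOrEmpty, to stay self-contained.
  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  // Hypothetical resolver mirroring the nested checks visible in the diff.
  static String resolve(String contentType, String customContentType) {
    if (isNullOrEmpty(contentType)) {
      return DEFAULT_CONTENT_TYPE;
    }
    if (contentType.equals(CONTENT_TYPE_OTHER)) {
      // Unset, or a macro not yet evaluated: fall back to the default for now.
      return isNullOrEmpty(customContentType) ? DEFAULT_CONTENT_TYPE : customContentType;
    }
    return contentType;
  }

  public static void main(String[] args) {
    System.out.println(resolve("other", null));
    System.out.println(resolve("other", "application/x-ndjson"));
    System.out.println(resolve("application/json", null));
  }
}
```

Under these assumptions the default appears only while the macro is unresolved; once the macro is evaluated, the user's value is returned, which matches the author's reply.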
src/main/java/io/cdap/plugin/gcp/gcs/sink/GCSOutputFormatProvider.java (outdated, resolved)
Integration tests: cdapio/cdap-integration-tests#1083
src/main/java/io/cdap/plugin/gcp/gcs/sink/RecordFilterOutputFormat.java (outdated, resolved)
  }
}

if (containsMacro(NAME_CONTENT_TYPE)) {
There is no need for this check. If the value is a macro or is not provided at configure time, it will be null by default.
PR Updated.
failureCollector.addFailure(String.format(
  "Valid content types for json are %s, %s", CONTENT_TYPE_APPLICATION_JSON,
  CONTENT_TYPE_TEXT_PLAIN),
  null
nit: move the remainder up to the line above
PR Updated.
src/main/java/io/cdap/plugin/gcp/gcs/sink/RecordFilterOutputFormat.java (three outdated, resolved threads)
switch (format) {
  case FORMAT_AVRO:
    if (!contentType.equalsIgnoreCase(CONTENT_TYPE_APPLICATION_AVRO)) {
      failureCollector.addFailure(String.format("Valid content type for avro is %s",
Please add a period to the end of all the validation error messages.
PR Updated.
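A sketch of per-format validation that incorporates the review feedback so far: messages end with a period, and a null content type (unset, or a macro at configure time) is skipped without a separate macro check. A plain list stands in for the PR's `failureCollector`, and the constant values are guesses based on the constant names in the diff.

```java
import java.util.ArrayList;
import java.util.List;

class ContentTypeValidationSketch {
  // Values assumed from the constant names shown in the diff.
  static final String FORMAT_AVRO = "avro";
  static final String FORMAT_JSON = "json";
  static final String CONTENT_TYPE_APPLICATION_AVRO = "application/avro";
  static final String CONTENT_TYPE_APPLICATION_JSON = "application/json";
  static final String CONTENT_TYPE_TEXT_PLAIN = "text/plain";

  // Returns failure messages; a List replaces the real failure collector.
  static List<String> validate(String format, String contentType) {
    List<String> failures = new ArrayList<>();
    if (format == null || contentType == null) {
      return failures; // nothing to check yet (unset or macro)
    }
    switch (format) {
      case FORMAT_AVRO:
        if (!contentType.equalsIgnoreCase(CONTENT_TYPE_APPLICATION_AVRO)) {
          failures.add(String.format("Valid content type for avro is %s.",
                                     CONTENT_TYPE_APPLICATION_AVRO));
        }
        break;
      case FORMAT_JSON:
        if (!contentType.equalsIgnoreCase(CONTENT_TYPE_APPLICATION_JSON)
            && !contentType.equalsIgnoreCase(CONTENT_TYPE_TEXT_PLAIN)) {
          failures.add(String.format("Valid content types for json are %s, %s.",
                                     CONTENT_TYPE_APPLICATION_JSON, CONTENT_TYPE_TEXT_PLAIN));
        }
        break;
      default:
        break; // other formats are not covered by this sketch
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println(validate(FORMAT_AVRO, "text/plain"));
    System.out.println(validate(FORMAT_JSON, "text/plain"));
  }
}
```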
src/main/java/io/cdap/plugin/gcp/gcs/sink/GCSOutputCommitter.java (outdated, resolved)
A few comments. Please squash the commits after fixing them.
Also, please file a JIRA for the GCS Multi Sink plugin.
src/test/java/io/cdap/plugin/gcp/gcs/sink/GCSBatchSinkTest.java (outdated, resolved)
}

@Nullable
@Description("Gets the value of content type")
This should be a Javadoc comment, not an annotation. The Javadoc should include the mapping of format -> content type.
PR updated.
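The form the reviewer asks for might look like the sketch below: a Javadoc comment on the getter in place of `@Description`, listing the format-to-content-type mapping. Only the avro and json rows can be inferred from the diff fragments in this thread, and their literal values are assumed from the constant names; the class name is hypothetical.

```java
class DocumentedGetterSketch {
  private final String contentType; // null when not configured

  DocumentedGetterSketch(String contentType) {
    this.contentType = contentType;
  }

  /**
   * Returns the content type configured for the sink, or {@code null} if none is set.
   *
   * <p>Valid content types per format (only the formats visible in this review):
   * <ul>
   *   <li>avro: application/avro</li>
   *   <li>json: application/json, text/plain</li>
   * </ul>
   */
  String getContentType() {
    return contentType;
  }

  public static void main(String[] args) {
    System.out.println(new DocumentedGetterSketch("application/avro").getContentType());
  }
}
```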
39041a1 to 484b73f
The PR is updated according to the latest review, and the commits are squashed. Here is the JIRA ticket for the GCS Multi Sink plugin:
Add content-type to GCS sink
JIRA Ticket: https://cdap.atlassian.net/browse/PLUGIN-498