DRILL-7978: Fixed Width Format Plugin #2282
Draft
MFoss19 wants to merge 46 commits into apache:master from MFoss19:format-fixedwidth
Commits
8fd4018
Start of fixed width format plugin
5d1ea8b
Work in Progress. Producing Rows. Currently complains about buffer no…
7ef5cd3
First working version
6f4a2e7
Added more data types, refactored code
f59d4e4
Checkstyle fixes
9f2648c
Removed println statement from Batch Reader, Simplified logic
8c3f6eb
Modified format, fixed maxRecords in next(), modified Exception handl…
7c3b5a2
Addressing Review Comments.
57d49db
Added Serialization/Deserialization test, added blank row test file, …
1a4818e
Fixed Serialization/Deserialization test
1f1051e
Added another constructor to enable user to not have to enter dateTim…
cb1b932
Added method to validate field name input and verify there are no dup…
8419862
Added two getters to FixedwidthFormatConfig to prep for offset verifi…
31e1549
Added a check for overlapping fields
estherbuchwalter 07edbde
Updated check for overlapping fields
estherbuchwalter fa47a14
Added field validation for data types, indices, width. Includes creat…
523366a
Modified validation for field width and field index. Added comments t…
1a91592
Added to field validation for field names. Checks for valid length an…
estherbuchwalter 4b221b5
WIP converting to EVF v2. Pushing to repo for troubleshooting purposes.
4875367
Start of fixed width format plugin
ef0bc82
Work in Progress. Producing Rows. Currently complains about buffer no…
be14e25
First working version
c9014d2
Added more data types, refactored code
6d7a2a5
Checkstyle fixes
056df13
Removed println statement from Batch Reader, Simplified logic
d2097b3
Modified format, fixed maxRecords in next(), modified Exception handl…
f2920a0
Addressing Review Comments.
7a68da5
Added Serialization/Deserialization test, added blank row test file, …
e0110da
Fixed Serialization/Deserialization test
f01f1aa
Added another constructor to enable user to not have to enter dateTim…
5da7a77
Added method to validate field name input and verify there are no dup…
22ccbcb
Added two getters to FixedwidthFormatConfig to prep for offset verifi…
ecf6fb8
Added a check for overlapping fields
estherbuchwalter 1978e14
Updated check for overlapping fields
estherbuchwalter a79f8a5
Added field validation for data types, indices, width. Includes creat…
32e5312
Modified validation for field width and field index. Added comments t…
4d534df
Added to field validation for field names. Checks for valid length an…
estherbuchwalter aa74ec5
WIP converting to EVF v2. Pushing to repo for troubleshooting purposes.
1972fb9
Updating pom.xml with new drill snapshot version
f498123
Merge branch 'apache:master' into format-fixedwidth
MFoss19 1e75757
Renamed classes
tswagger a134619
Merge branch 'master' into format-fixedwidth
tswagger dfe894e
Merge remote-tracking branch 'megan/format-fixedwidth' into format-fi…
tswagger 28df7b2
Merge branch 'format-fixedwidth' of github.com:MFoss19/drill into for…
b68c54d
Merge branch 'format-fixedwidth' of github.com:MFoss19/drill into for…
bf6a16c
Updated pom.xml
tswagger
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| <?xml version="1.0"?> | ||
| <!-- | ||
|
|
||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
|
|
||
| --> | ||
| <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
| <modelVersion>4.0.0</modelVersion> | ||
| <parent> | ||
| <artifactId>drill-contrib-parent</artifactId> | ||
| <groupId>org.apache.drill.contrib</groupId> | ||
| <version>2.0.0-SNAPSHOT</version> | ||
| </parent> | ||
| <artifactId>drill-format-fixedwidth</artifactId> | ||
| <name>Drill : Contrib : Format : FixedWidth</name> | ||
|
|
||
| <dependencies> | ||
| <dependency> | ||
| <groupId>org.apache.drill.exec</groupId> | ||
| <artifactId>drill-java-exec</artifactId> | ||
| <version>${project.version}</version> | ||
| </dependency> | ||
| <!-- <dependency>--> | ||
| <!-- <groupId>com.epam</groupId>--> | ||
| <!-- <artifactId>parso</artifactId>--> | ||
| <!-- <version>2.0.14</version>--> | ||
| <!-- </dependency>--> | ||
|
|
||
| <!-- Test dependencies --> | ||
| <dependency> | ||
| <groupId>org.apache.drill.exec</groupId> | ||
| <artifactId>drill-java-exec</artifactId> | ||
| <classifier>tests</classifier> | ||
| <version>${project.version}</version> | ||
| <scope>test</scope> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.drill</groupId> | ||
| <artifactId>drill-common</artifactId> | ||
| <classifier>tests</classifier> | ||
| <version>${project.version}</version> | ||
| <scope>test</scope> | ||
| </dependency> | ||
| </dependencies> | ||
| <build> | ||
| <plugins> | ||
| <plugin> | ||
| <artifactId>maven-resources-plugin</artifactId> | ||
| <executions> | ||
| <execution> | ||
| <id>copy-java-sources</id> | ||
| <phase>process-sources</phase> | ||
| <goals> | ||
| <goal>copy-resources</goal> | ||
| </goals> | ||
| <configuration> | ||
| <outputDirectory>${basedir}/target/classes/org/apache/drill/exec/store/fixedwidth</outputDirectory> | ||
| <resources> | ||
| <resource> | ||
| <directory>src/main/java/org/apache/drill/exec/store/fixedwidth</directory> | ||
| <filtering>true</filtering> | ||
| </resource> | ||
| </resources> | ||
| </configuration> | ||
| </execution> | ||
| </executions> | ||
| </plugin> | ||
| </plugins> | ||
| </build> | ||
|
|
||
| </project> | ||
99 changes: 99 additions & 0 deletions
...ixedwidth/src/main/java/org/apache/drill/exec/store/fixedwidth/FixedWidthBatchReader.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| package org.apache.drill.exec.store.fixedwidth; | ||
|
|
||
| import org.apache.drill.common.AutoCloseables; | ||
| import org.apache.drill.common.exceptions.CustomErrorContext; | ||
| import org.apache.drill.common.exceptions.UserException; | ||
| import org.apache.drill.common.types.TypeProtos; | ||
| import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader; | ||
| import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator; | ||
| import org.apache.drill.exec.physical.resultSet.ResultSetLoader; | ||
| import org.apache.drill.exec.record.metadata.SchemaBuilder; | ||
| import org.apache.drill.exec.record.metadata.TupleMetadata; | ||
| import org.apache.drill.shaded.guava.com.google.common.base.Charsets; | ||
| import org.apache.hadoop.mapred.FileSplit; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| import java.io.BufferedReader; | ||
| //import java.io.IOException; | ||
| import java.io.InputStream; | ||
| import java.io.InputStreamReader; | ||
|
|
||
| public class FixedWidthBatchReader implements ManagedReader { | ||
|
|
||
| private final int maxRecords; // Do we need this? | ||
| private final FixedWidthFormatConfig config; | ||
| private InputStream fsStream; | ||
| private ResultSetLoader loader; | ||
| private FileSplit split; | ||
| private CustomErrorContext errorContext; | ||
| private static final Logger logger = LoggerFactory.getLogger(FixedWidthBatchReader.class); | ||
| private BufferedReader reader; | ||
|
|
||
| public FixedWidthBatchReader(FileSchemaNegotiator negotiator, FixedWidthFormatConfig config, int maxRecords) { | ||
| // Assign config before calling open(): buildSchema() reads config.getFields(). | ||
| this.config = config; | ||
| this.maxRecords = maxRecords; | ||
| this.loader = open(negotiator); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean next() { | ||
| // WIP: no rows are read yet. Returning true unconditionally never signals end of data. | ||
| return true; | ||
| } | ||
|
|
||
| @Override | ||
| public void close() { | ||
| if (fsStream != null){ | ||
| AutoCloseables.closeSilently(fsStream); | ||
| fsStream = null; | ||
| } | ||
| } | ||
|
|
||
| private ResultSetLoader open(FileSchemaNegotiator negotiator) { | ||
| // this.split = (FileSplit) negotiator.split(); | ||
| this.errorContext = negotiator.parentErrorContext(); | ||
| // openFile(negotiator); | ||
|
|
||
| try { | ||
| negotiator.tableSchema(buildSchema(), true); | ||
| this.loader = negotiator.build(); | ||
| } catch (Exception e) { | ||
| throw UserException | ||
| .dataReadError(e) | ||
| .message("Failed to open input file: %s", this.split == null ? "(unknown)" : this.split.getPath()) | ||
| .addContext(this.errorContext) | ||
| .addContext(e.getMessage()) | ||
| .build(FixedWidthBatchReader.logger); | ||
| } | ||
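| // WIP: fsStream is still null here because openFile() is commented out, so this line throws a NullPointerException until the stream is opened. | ||
| this.reader = new BufferedReader(new InputStreamReader(this.fsStream, Charsets.UTF_8)); | ||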
| return this.loader; | ||
| } | ||
|
|
||
| // private void openFile(FileSchemaNegotiator negotiator) { | ||
| // try { | ||
| // this.fsStream = negotiator.file().fileSystem().openPossiblyCompressedStream(this.split.getPath()); | ||
| // sasFileReader = new SasFileReaderImpl(this.fsStream); | ||
| // firstRow = sasFileReader.readNext(); | ||
| // } catch (IOException e) { | ||
| // throw UserException | ||
| // .dataReadError(e) | ||
| // .message("Unable to open Fixed Width File %s", this.split.getPath()) | ||
| // .addContext(e.getMessage()) | ||
| // .addContext(this.errorContext) | ||
| // .build(FixedWidthBatchReader.logger); | ||
| // } | ||
| // } | ||
|
|
||
| private TupleMetadata buildSchema() { | ||
| SchemaBuilder builder = new SchemaBuilder(); | ||
| for (FixedWidthFieldConfig field : config.getFields()) { | ||
| if (field.getType() == TypeProtos.MinorType.VARDECIMAL){ | ||
| builder.addNullable(field.getName(), TypeProtos.MinorType.VARDECIMAL,38,4); | ||
| //revisit this | ||
| } else { | ||
| builder.addNullable(field.getName(), field.getType()); | ||
| } | ||
| } | ||
| return builder.buildSchema(); | ||
| } | ||
| } |
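
The reader above does not yet produce rows (`next()` is a stub). As a rough illustration of the step it would eventually need, the sketch below cuts one line of text into values using each field's start position and width. It is not the PR's implementation: `FixedWidthLineSlicer`, its nested `Field` class, and the assumption that `index` is a 1-based column position are all hypothetical, and the sketch returns plain strings instead of writing to Drill's `ResultSetLoader`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the plugin's per-field configuration.
final class FixedWidthLineSlicer {

  static final class Field {
    final String name;
    final int index; // ASSUMPTION: 1-based start column, already validated to be >= 1
    final int width; // number of characters the field occupies

    Field(String name, int index, int width) {
      this.name = name;
      this.index = index;
      this.width = width;
    }
  }

  // Cuts one line into trimmed string values keyed by field name.
  // A field that starts past the end of the line yields null;
  // a field that runs past the end of the line is truncated.
  static Map<String, String> slice(String line, List<Field> fields) {
    Map<String, String> row = new LinkedHashMap<>();
    for (Field f : fields) {
      int start = f.index - 1;
      int end = Math.min(start + f.width, line.length());
      row.put(f.name, start >= line.length() ? null : line.substring(start, end).trim());
    }
    return row;
  }

  public static void main(String[] args) {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field("id", 1, 5));
    fields.add(new Field("name", 6, 10));
    fields.add(new Field("amount", 16, 8));
    // Built by concatenation so each column's width is easy to see.
    String line = "00042" + "Alice     " + "   12.50";
    System.out.println(slice(line, fields));
  }
}
```

In the actual plugin each sliced value would presumably be converted according to the field's `type` (and `dateTimeFormat` for temporal types) before being written to the row; that conversion is left out here.
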
111 changes: 111 additions & 0 deletions
...ixedwidth/src/main/java/org/apache/drill/exec/store/fixedwidth/FixedWidthFieldConfig.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.drill.exec.store.fixedwidth; | ||
|
|
||
| import com.fasterxml.jackson.annotation.JsonCreator; | ||
| import com.fasterxml.jackson.annotation.JsonInclude; | ||
| import com.fasterxml.jackson.annotation.JsonProperty; | ||
| import com.fasterxml.jackson.annotation.JsonTypeName; | ||
| import org.apache.drill.common.PlanStringBuilder; | ||
| import org.apache.drill.common.types.TypeProtos; | ||
|
|
||
| import java.util.Objects; | ||
|
|
||
|
|
||
| @JsonTypeName("fixedwidthReaderFieldDescription") | ||
| @JsonInclude(JsonInclude.Include.NON_DEFAULT) | ||
| public class FixedWidthFieldConfig implements Comparable<FixedWidthFieldConfig> { | ||
|
|
||
| private final String name; | ||
| private final int index; | ||
| private int width; | ||
| private TypeProtos.MinorType type; | ||
| private final String dateTimeFormat; | ||
|
|
||
| public FixedWidthFieldConfig(@JsonProperty("name") String name, | ||
| @JsonProperty("index") int index, | ||
| @JsonProperty("width") int width, | ||
| @JsonProperty("type") TypeProtos.MinorType type) { | ||
| this(name, index, width, type, null); | ||
| } | ||
|
|
||
| @JsonCreator | ||
| public FixedWidthFieldConfig(@JsonProperty("name") String name, | ||
| @JsonProperty("index") int index, | ||
| @JsonProperty("width") int width, | ||
| @JsonProperty("type") TypeProtos.MinorType type, | ||
| @JsonProperty("dateTimeFormat") String dateTimeFormat) { | ||
| this.name = name; | ||
| this.index = index; | ||
| this.width = width; | ||
| this.type = type; | ||
| this.dateTimeFormat = dateTimeFormat; | ||
| } | ||
|
|
||
| public String getName() {return name;} | ||
|
|
||
| public int getIndex() {return index;} | ||
|
|
||
| public int getWidth() {return width;} | ||
|
|
||
| public TypeProtos.MinorType getType() {return type;} | ||
|
|
||
| public void setType() { | ||
| this.type = TypeProtos.MinorType.VARCHAR; | ||
| } | ||
|
|
||
| public String getDateTimeFormat() {return dateTimeFormat;} | ||
|
|
||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(name, index, width, type, dateTimeFormat); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean equals(Object obj) { | ||
| if (this == obj) { | ||
| return true; | ||
| } | ||
| if (obj == null || getClass() != obj.getClass()) { | ||
| return false; | ||
| } | ||
| FixedWidthFieldConfig other = (FixedWidthFieldConfig) obj; | ||
| return Objects.equals(name, other.name) | ||
| && index == other.index | ||
| && width == other.width | ||
| && Objects.equals(type, other.type) | ||
| && Objects.equals(dateTimeFormat, other.dateTimeFormat); | ||
| } | ||
|
|
||
| @Override | ||
| public String toString() { | ||
| return new PlanStringBuilder(this) | ||
| .field("name", name) | ||
| .field("index", index) | ||
| .field("width", width) | ||
| .field("type", type) | ||
| .field("dateTimeFormat", dateTimeFormat) | ||
| .toString(); | ||
| } | ||
|
|
||
| @Override | ||
| public int compareTo(FixedWidthFieldConfig o) { | ||
| return Integer.compare(this.getIndex(), o.getIndex()); | ||
| } | ||
| } |
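
The commit list mentions a check for overlapping fields, and `compareTo` above orders fields by `index`, which is the natural first step for such a check. The sketch below shows one way it could work; it is not the code from the PR. `FieldOverlapCheck` and its nested `Field` class are hypothetical, and the sketch assumes a field occupies the columns from `index` up to, but not including, `index + width`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for FixedWidthFieldConfig.
final class FieldOverlapCheck {

  static final class Field {
    final String name;
    final int index; // start column
    final int width; // number of columns occupied

    Field(String name, int index, int width) {
      this.name = name;
      this.index = index;
      this.width = width;
    }
  }

  // Returns a short description of an overlapping pair, or null when no fields overlap.
  // After sorting by start column it is enough to compare neighbours: if any two
  // fields overlap, the earlier one also overlaps the field that follows it directly.
  static String firstOverlap(List<Field> fields) {
    List<Field> sorted = new ArrayList<>(fields);
    sorted.sort(Comparator.comparingInt((Field f) -> f.index));
    for (int i = 1; i < sorted.size(); i++) {
      Field prev = sorted.get(i - 1);
      Field cur = sorted.get(i);
      if (cur.index < prev.index + prev.width) {
        return prev.name + " overlaps " + cur.name;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field("name", 6, 10));
    fields.add(new Field("id", 1, 5));
    fields.add(new Field("amount", 12, 8));
    System.out.println(firstOverlap(fields));
  }
}
```

Sorting a copy leaves the caller's list in its configured order, which matters if the column order in query results is meant to follow the configuration.
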