Reading CSV > Avoid redefining column types #1751
Replies: 2 comments 4 replies
Hi! Thanks :)
Another approach could be to "disable" automatic CSV type parsing for all columns by giving every column a default type, like this:
DataFrame.readCsv(
    inputStream = ByteArrayInputStream(data),
    colTypes = mapOf(ColType.DEFAULT to ColType.String),
)
and then, [...]
One downside of this approach is indeed that if you have a column of, say, type [...]
With the solution of @koperagen, this is not the case, as supplying [...]
We might actually have a need for a [...]
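For illustration, here is a self-contained sketch of the call above. The sample CSV bytes, the `amount` column, and the helper name `readAllAsStrings` are made up for this example. It assumes the Kotlin DataFrame CSV support is on the classpath and that `readCsv` and `ColType` are importable from `org.jetbrains.kotlinx.dataframe.io`; adjust the imports if your version differs.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.ColType
import org.jetbrains.kotlinx.dataframe.io.readCsv
import java.io.ByteArrayInputStream

// Hypothetical helper: reads CSV bytes with type inference switched off,
// so every column (including ID) is kept as String.
fun readAllAsStrings(data: ByteArray): DataFrame<*> =
    DataFrame.readCsv(
        inputStream = ByteArrayInputStream(data),
        // ColType.DEFAULT applies to every column without an explicit entry.
        colTypes = mapOf(ColType.DEFAULT to ColType.String),
    )

fun main() {
    // Made-up sample data with an ID that looks like an integer.
    val data = "ID,amount\n0123,10\n0456,20\n".toByteArray()
    val df = readAllAsStrings(data)
    println(df)
}
```

With everything read as String, any non-String column (such as `amount` here) has to be converted in a later step, which is the trade-off this approach makes.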
Hi team
Thanks for the library! Big fan 😃
We're facing a pain point when reading CSVs: column data types have to be defined explicitly more than once, otherwise the columns risk being parsed incorrectly.
Take for instance the following:
Notice how we define ID as type String twice.
We came across the need for this because when a CSV contains ID values that look like integers, e.g. 0123, they are parsed as integers, thereby omitting the leading 0's.
Is there a way to avoid redefining the colTypes? We would like columns and their types to be defined once in the DataSchema file. This would avoid code duplication and the possibility of devs forgetting to add the secondary definitions.
Thanks again!
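To make the leading-zero problem concrete, here is a minimal plain-Kotlin sketch that does not use the library at all. The helper name `survivesIntRoundTrip` is made up for this example; it only mimics what happens when an ID cell is inferred as an integer and later written back out as text.

```kotlin
// Hypothetical helper: true if the ID is unchanged after being parsed as an
// Int and turned back into text, false if information was lost on the way.
fun survivesIntRoundTrip(id: String): Boolean =
    id.toInt().toString() == id

fun main() {
    val raw = "0123"                    // ID exactly as it appears in the CSV cell
    println(raw.toInt())                // prints 123, the leading zero is gone
    println(survivesIntRoundTrip(raw))  // prints false
    println(survivesIntRoundTrip("123")) // prints true
}
```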