Description
Problem
This issue is related to #199.
It only shows up in the very particular case described below, but I assume it will apply to several other string operations too.
Although we used #191 to fix the dtype serialization issue, this error still shows up when the DataFrame has a Series of ListType amongst non-list Series.
For example
rdf = RemoteDataFrame({'0': ["Hello world"], '1': ["People "]})
# When split is applied to column `1`, the expected resulting DataFrame looks like this:
rdf.split(" ", col='1')
|str|list[str]|
|...|["People"]|
The error appears to come from #191 combined with the fact that polars cannot create a DataFrame from an "unbalanced" set of Series.
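Unbalanced as in: the Series built from the header do not all have the same length. The patch below creates an empty Series for plain dtypes but a one-element Series, [[]], for list dtypes.
If the list[str] Series were not part of the result, FetchableLazyFrame would have been able to construct the symbolic DataFrame.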
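To make the length mismatch concrete, here is a rough stdlib-only model of what the #191 patch does, with plain Python lists standing in for polars Series. The helper name `placeholder_values` and the header contents are assumptions for illustration, not BastionLab code.

```python
# Plain lists stand in for polars Series; this is a model, not polars code.
def placeholder_values(dtype):
    """Mimic the #191 patch: plain dtypes get no rows, nested dtypes get one."""
    if isinstance(dtype, str):
        return []      # like pl.Series(name, dtype=...): length 0
    return [[]]        # like pl.Series(name, values=[[]], ...): length 1

# Hypothetical header for the frame produced by rdf.split(" ", col='1')
header = {"0": "Utf8", "1": {"List": "Utf8"}}

lengths = {name: len(placeholder_values(dtype)) for name, dtype in header.items()}
print(lengths)  # {'0': 0, '1': 1}
```

The two columns end up with different lengths, which is the "unbalanced" situation described above.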
Patch from #191
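import json
import polars as pl

def _from_reference(client: BastionLabPolars, ref: ReferenceResponse) -> LDF:
    header = json.loads(ref.header)["inner"]

    def get_dtype(v):
        # A plain string names a polars dtype, e.g. "Utf8" -> pl.Utf8().
        if isinstance(v, str):
            return getattr(pl, v)()
        else:
            # A nested dtype is a single-entry dict, e.g. {"List": "Utf8"}.
            k, v = list(v.items())[0]
            v = get_dtype(v)
            return getattr(pl, k)(v)

    def get_series(name, dtype):
        if isinstance(dtype, str):
            # Plain dtype: empty Series (length 0).
            return pl.Series(name, dtype=get_dtype(dtype))
        else:
            # Nested dtype: one empty list as the only value (length 1).
            return pl.Series(name, values=[[]], dtype=get_dtype(dtype))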
    df
Solution
The solution would be to make sure that, after creating the list of Series, we normalize the placeholder Series so that every one of them holds the same (empty) set of values.
For ListTypes, we use [].
We would have to find the corresponding empty value for every datatype, so that we do not end up adding values.
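As a minimal sketch of that normalization, the following stdlib-only model gives every column zero rows regardless of its dtype and then checks that the columns are balanced. The header string and the helpers `describe`, `empty_columns` and `is_balanced` are hypothetical; in the real fix the empty values would be passed to `pl.Series` instead of being stored in a dict.

```python
import json

# Hypothetical header, shaped like the `ref.header` JSON the patch parses.
RAW_HEADER = '{"inner": {"0": "Utf8", "1": {"List": "Utf8"}}}'

def describe(dtype):
    """Render a dtype descriptor recursively, e.g. {"List": "Utf8"} -> "List[Utf8]"."""
    if isinstance(dtype, str):
        return dtype
    (outer, inner), = dtype.items()  # nested dtypes are single-entry dicts
    return f"{outer}[{describe(inner)}]"

def empty_columns(header):
    """Give every column zero rows, whatever its dtype."""
    return {
        name: {"dtype": describe(dtype), "values": []}
        for name, dtype in header.items()
    }

def is_balanced(columns):
    """True when all columns have the same number of rows."""
    return len({len(col["values"]) for col in columns.values()}) <= 1

columns = empty_columns(json.loads(RAW_HEADER)["inner"])
print(columns["1"]["dtype"], is_balanced(columns))  # List[Utf8] True
```

Keeping the dtype separate from the values is the point of the design: the type information survives even though no column carries any rows.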