
FetchableLazy Symbolic DataFrame with odd behavior. #204

@kbamponsem


Problem

This issue is related to #199.

It only shows up in the very particular case described below, but I assume it applies to several other string operations too.

Although we used #191 to fix the dtypes serialization issue, this error shows up when the DataFrame has a Series of ListType alongside non-list Series.

For example

rdf = RemoteDataFrame({'0': ["Hello world"], '1': ["People "]})

# When split is applied to the `1` column, then the expected resulting DataFrame would be like this
rdf.split(" ", col='1')

|str|list[str]|
|...|["People"]|

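#191 does not account for the fact that polars cannot create a DataFrame from an "unbalanced" set of series.

Unbalanced as in: with the patch from #191, a list series is created with one placeholder value (`[[]]`) while a non-list series is created with no values, so the series end up with different lengths.
If the list[str] series weren't part of the result, FetchableLazyFrame would have been able to construct the symbolic DataFrame.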

Patch from #191

    def _from_reference(client: BastionLabPolars, ref: ReferenceResponse) -> LDF:
        header = json.loads(ref.header)["inner"]

        def get_dtype(v):
            if isinstance(v, str):
                return getattr(pl, v)()
            else:
                k, v = list(v.items())[0]
                v = get_dtype(v)
                return getattr(pl, k)(v)

        def get_series(name, dtype):
            if isinstance(dtype, str):
                return pl.Series(name, dtype=get_dtype(dtype))
            else:
                return pl.Series(name, values=[[]], dtype=get_dtype(dtype))

        df
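To make the recursion in `get_dtype` easier to follow, here is a small sketch that walks the same kind of dtype entry without needing polars. It assumes, as the patch does, that an entry is either a plain string or a single-key dict wrapping an inner dtype. The `describe_dtype` helper and the sample `header` dict are hypothetical and only illustrate the shape; they are not part of BastionLab.

```python
def describe_dtype(v):
    """Render a header dtype entry as a readable string."""
    if isinstance(v, str):
        # Leaf dtype, e.g. "Utf8".
        return v
    # Nested dtype: a single-key dict such as {"List": "Utf8"}.
    (outer, inner), = v.items()
    return f"{outer}[{describe_dtype(inner)}]"


# Hypothetical header: one plain column and one list column.
header = {"0": "Utf8", "1": {"List": "Utf8"}}

for name, entry in header.items():
    nested = not isinstance(entry, str)
    print(name, describe_dtype(entry), "nested" if nested else "plain")
```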

Solution

The solution would be to normalize the placeholder series after creating the list of series, so that every series holds the same number of empty values.

For ListTypes, we use [].

We would have to find the corresponding empty value for every other datatype, so that we don't end up adding actual values.

@cchudant @dhalf
