Open
Labels: type: bug (Something isn't working)
Description
Describe the bug
I use Anaconda and installed the dataprep package with the following command:
conda install -c conda-forge dataprep
Then I tried the example code from the website:
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
create_report(df).show()
It produced the following output and error:
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
create_report(df).show()
Computing series-max-agg-6f34ce939adc72d34b6b5a81d3b66957: 0%| | 0/1420 [00:00<?, ?it/s]C:\ProgramData\anaconda3\Lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in divide
return func(*(_execute_task(a, cache) for a in args))
error happended in column:Survived
Traceback (most recent call last):
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653 in get_loc
values are attempted to be sorted, but any TypeError from
File pandas\_libs\index.pyx:147 in pandas._libs.index.IndexEngine.get_loc
File pandas\_libs\index.pyx:176 in pandas._libs.index.IndexEngine.get_loc
File pandas\_libs\hashtable_class_helper.pxi:7080 in pandas._libs.hashtable.PyObjectHashTable.get_item
File pandas\_libs\hashtable_class_helper.pxi:7088 in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Survived'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Cell In[123], line 1
create_report(df).show()
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\__init__.py:68 in create_report
"components": format_report(df, cfg, mode, progress),
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:78 in format_report
comps = format_basic(edaframe, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:291 in format_basic
res_variables = _format_variables(df, cfg, data)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:120 in _format_variables
rndrd = render(itmdt, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:2473 in render
visual_elem = render_cat(itmdt, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:1573 in render_cat
fig = bar_viz(
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:223 in bar_viz
df["pct"] = df[col] / nrows * 100
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\frame.py:3761 in __getitem__
key = com.apply_if_callable(key, self)
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655 in get_loc
KeyError: 'Survived'
My numpy version is 1.25.2
My pandas version is 2.0.3
My Python version is 3.11.4
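In case it helps with reproducing this, here is a small sketch that prints the installed versions in one go. It only uses the standard library. The helper name `collect_versions` is made up for illustration, and dask is included only because it appears in the traceback above; I have not confirmed that it is related to the error.

```python
import sys
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    # Start with the interpreter version, e.g. "3.11.4".
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Package list is an assumption: the ones named in this report.
    for name, version in collect_versions(
        ["numpy", "pandas", "dask", "dataprep"]
    ).items():
        print(f"{name}: {version}")
```

Running it in the same conda environment as the failing code should show whether the versions listed above are the ones actually being imported.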
I would like to know why this error happens and how to fix it.
Is there anything else that needs to be installed or added?
Thank you so much!