Description
What's your use case?
I would like to implement a function that replaces low values with mean/10 and high values with the 99.5th percentile.
What's your proposed solution?
I'm implementing the code proposed by Naghetini and Silveira (2021). It is available here.
My code based on Naghetini and Silveira (2021) is:
# Get libraries
import Orange
import numpy as np
from Orange.data import Domain, Table
import pandas as pd
# How Orange passes data to widget
df = in_data.copy()
#radiometric data
radio = df[['K', 'eU', 'eTh']]
#function
#https://github.com/fnaghetini/Mapa-Preditivo/blob/main/functions/Custom_Cleaning.py
def truncateVar(data = None, col = None):
    lower = data[col].mean()/10
    upper = data[col].quantile(0.995)
    var_trunc = []
    for v in data[col]:
        if v <= lower:
            v = lower
            var_trunc.append(v)
        elif v >= upper:
            v = upper
            var_trunc.append(v)
        else:
            var_trunc.append(v)
    return pd.Series(var_trunc)
#applying function
for r in radio:
    df[r] = truncateVar(data = radio, col = r)
# get table of data (X) and class variables (y)
X, y = df.values, df.y
# Get the target and feature variables
d = Domain(df.domain.attributes, df.domain.class_vars)
# Create a new Orange Table object with the appropriate headers
# This is how Orange passes the data on to the next widget
out_data = Orange.data.Table(d, X, y)
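For comparison, the same truncation rule can be written without the explicit loop by using `Series.clip`. This is only a minimal sketch and assumes the data is already a plain pandas DataFrame, which is not the case for `in_data` inside Orange. The helper name `truncate_columns` and the sample values are hypothetical.

```python
import pandas as pd


def truncate_columns(frame, columns):
    """Clip each listed column to the range [mean / 10, 99.5th percentile]."""
    result = frame.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        lower = result[col].mean() / 10
        upper = result[col].quantile(0.995)
        # clip() replaces values below `lower` or above `upper` with the bound
        result[col] = result[col].clip(lower=lower, upper=upper)
    return result


# Made-up sample values, for illustration only
sample = pd.DataFrame({
    "K": [0.01, 1.0, 2.0, 3.0, 100.0],
    "eU": [0.5, 1.5, 2.5, 3.5, 4.5],
    "eTh": [1.0, 2.0, 3.0, 4.0, 5.0],
})
truncated = truncate_columns(sample, ["K", "eU", "eTh"])
```

Unlike returning `pd.Series(var_trunc)`, which creates a fresh 0..n-1 index, `clip` keeps the original row index, so the assignment back into the DataFrame stays aligned.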
But I'm getting this error:
Running script:
C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py:861: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  self.X = source.X[row_indices]
Traceback (most recent call last):
  File "", line 1, in
  File "", line 11, in
  File "C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py", line 1151, in __getitem__
    return self.from_table_rows(self, key)
  File "C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py", line 861, in from_table_rows
    self.X = source.X[row_indices]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
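Reading the traceback, the list of column names seems to reach `source.X[row_indices]`, so it is being used as row indices on the underlying array. The sketch below reproduces that kind of failure with plain NumPy. It is an assumption-laden stand-in, not Orange itself: `X` here is a hypothetical array playing the role of `source.X`, and the exact error message depends on the NumPy version.

```python
import numpy as np

# Hypothetical stand-in for the table's numeric matrix (4 rows, 3 columns)
X = np.zeros((4, 3))

try:
    # A list of strings is not a valid index for a plain numeric array
    X[["K", "eU", "eTh"]]
    raised = False
except IndexError as exc:
    raised = True
    message = str(exc)
    print("IndexError:", message)
```

This suggests that selecting columns by name with `df[['K', 'eU', 'eTh']]` only works once the data is a pandas DataFrame rather than the table object the script receives.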
Data:
data.csv
Suggestions?
Thanks