Help with function implementation #5811

@dutraluiz

What's your use case?
I would like to implement a function that replaces low values with mean/10 and high values with the 99.5th percentile.

What's your proposed solution?
I'm implementing the code proposed by Naghetini and Silveira (2021). It is available here.

My code based on Naghetini and Silveira (2021) is:

# Get libraries
import Orange
import numpy as np
from Orange.data import Domain, Table
import pandas as pd

# How Orange passes data to widget
df = in_data.copy()

#radiometric data
radio = df[['K', 'eU', 'eTh']]

#function
#https://github.com/fnaghetini/Mapa-Preditivo/blob/main/functions/Custom_Cleaning.py
def truncateVar(data = None, col = None):
    lower = data[col].mean()/10
    upper = data[col].quantile(0.995)
    var_trunc = []
    for v in data[col]:
        if v <= lower:
            v = lower
            var_trunc.append(v)
        elif v >= upper:
            v = upper
            var_trunc.append(v)
        else:
            var_trunc.append(v)
    return pd.Series(var_trunc)

#applying function
for r in radio:
    df[r] = truncateVar(data = radio, col = r)

# get table of data (X) and class variables (y)
X, y = df.values, df.y

# Get the target and feature variables
d = Domain(df.domain.attributes, df.domain.class_vars)

# Create a new Orange Table object with the appropriate headers
# This is how Orange passes the data on to the next widget
out_data = Orange.data.Table(d, X, y)

But I'm getting this error:

Running script:
C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py:861: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
self.X = source.X[row_indices]
Traceback (most recent call last):
File "", line 1, in
File "", line 11, in
File "C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py", line 1151, in __getitem__
return self.from_table_rows(self, key)
File "C:\Users\User\AppData\Local\Programs\Orange\lib\site-packages\Orange\data\table.py", line 861, in from_table_rows
self.X = source.X[row_indices]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
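
A possible reading of the traceback: in the Python Script widget, `in_data` is an Orange `Table` rather than a pandas `DataFrame`, so `df[['K', 'eU', 'eTh']]` (line 11 of the script) is treated as row indexing and fails in `from_table_rows`. Below is a minimal sketch of one workaround, assuming the table is converted to a DataFrame first. The names `truncate_series`, `truncate_columns`, and `truncate_table` are hypothetical. `Series.clip` stands in for the explicit loop in `truncateVar`. The Orange part relies on `table_to_frame` and `table_from_frame` from `Orange.data.pandas_compat`; the round trip may not preserve variable roles such as the class variable, so treat that step as an untested assumption.

```python
import pandas as pd


def truncate_series(s):
    """Clip a numeric Series to [mean/10, 99.5th percentile]."""
    lower = s.mean() / 10        # same lower bound as truncateVar
    upper = s.quantile(0.995)    # same upper bound as truncateVar
    return s.clip(lower=lower, upper=upper)


def truncate_columns(frame, cols):
    """Return a copy of frame with the listed columns truncated."""
    out = frame.copy()
    for c in cols:
        out[c] = truncate_series(out[c])
    return out


def truncate_table(table, cols):
    """Hypothetical helper: Orange Table -> DataFrame -> truncate -> Table.

    Orange is imported lazily so the pandas helpers above can be used
    without it. Assumption: variable roles (e.g. the class variable)
    may need to be restored on the returned table.
    """
    from Orange.data.pandas_compat import table_from_frame, table_to_frame

    frame = table_to_frame(table, include_metas=True)
    return table_from_frame(truncate_columns(frame, cols))
```

Inside the widget this might be called as `out_data = truncate_table(in_data, ['K', 'eU', 'eTh'])`. Because `clip` keeps the original index, the result aligns with the DataFrame row by row, which the `pd.Series(var_trunc)` returned by the loop version only does when the DataFrame has a default integer index.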

Data:
data.csv

Suggestions?
Thanks
