Data Leakage in pre-processing

In reference to this snippet from the preprocessing section of [model_notebook.ipynb](https://github.com/apache/fineract-credit-scorecard/blob/develop/ml/model_notebook.ipynb):
```
for col in data.columns:
        if(col not in categorical):
            data[col] = (data[col].astype('float') - np.mean(data[col].astype('float')))/np.std(data[col].astype('float'))
```
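The normalization is applied to the whole dataset, i.e. before the train/test split occurs. The mean and standard deviation are therefore computed from rows that later end up in the test set, which means that information from the test set is used to scale the training set. Evaluation on that test set can then look more optimistic than it would on truly unseen data.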
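
One possible fix is to split first, compute the statistics on the training rows only, and reuse them for the test rows. The following is a minimal sketch, not the notebook's actual code: the helper name `standardize_train_test`, the toy DataFrame, and the 4/2 positional split are all hypothetical and only illustrate the idea. It keeps the population standard deviation (`ddof=0`) that `np.std` uses in the original snippet, and it does not handle columns with zero variance.

```python
import pandas as pd


def standardize_train_test(train, test, categorical):
    """Z-score non-categorical columns using training-set statistics only.

    Hypothetical helper for illustration; not part of the notebook.
    """
    train = train.copy()
    test = test.copy()
    for col in train.columns:
        if col in categorical:
            continue
        train_values = train[col].astype("float")
        mean = train_values.mean()
        std = train_values.std(ddof=0)  # population std, matching np.std
        # The same training mean/std are applied to both splits,
        # so no test-set information enters the scaling.
        train[col] = (train_values - mean) / std
        test[col] = (test[col].astype("float") - mean) / std
    return train, test


# Toy data, purely illustrative (not taken from the project's dataset).
data = pd.DataFrame({
    "age": [20, 30, 40, 50, 60, 70],
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gender": ["m", "f", "m", "f", "m", "f"],
})
categorical = ["gender"]

# Split first (an arbitrary 4/2 split here), then scale.
train_raw, test_raw = data.iloc[:4], data.iloc[4:]
train, test = standardize_train_test(train_raw, test_raw, categorical)
```

With this ordering the scaled test columns will generally not have zero mean or unit variance, which is expected. If the project already depends on scikit-learn, fitting `StandardScaler` on the training split and calling `transform` on the test split should achieve the same effect.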

Data Leakage in pre-processing #20
