Open
Description
I dumped the adult_dataset mentioned in your README to a CSV file, trained a RandomForestClassifier with almost the same settings, and computed SHAP values with the Python shap package (the library published by the authors of the SHAP paper). I then compared these results with the symmetric counterpart from your library.
- The two sets of global Shapley values do not match, not even approximately.
- I also don't understand why, to calculate the global Shapley value, you take the mean of the per-instance Shapley values, while the SHAP paper suggests taking the mean of their absolute values.
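To make the second point concrete, here is a minimal sketch of how the two aggregations can disagree. The numbers are made up for illustration and do not come from either library; `shap_matrix` is a hypothetical array of per-instance Shapley values.

```python
import numpy as np

# Hypothetical per-instance Shapley values: 4 instances x 2 features.
# Feature 0 pushes predictions up and down by the same amount,
# feature 1 has a small but consistent positive effect.
shap_matrix = np.array([
    [0.5, 0.1],
    [-0.5, 0.1],
    [0.5, 0.1],
    [-0.5, 0.1],
])

# Signed mean: positive and negative contributions cancel each other.
mean_signed = shap_matrix.mean(axis=0)

# Mean of absolute values: measures the magnitude of each feature's effect.
mean_abs = np.abs(shap_matrix).mean(axis=0)

print("mean of values:         ", mean_signed)
print("mean of absolute values:", mean_abs)
```

With the signed mean, feature 0 looks irrelevant even though it moves every single prediction, so the two definitions can rank features differently. Results computed under one definition should not be expected to match results computed under the other.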
Pseudo Code:
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv('adult_dataset.csv')
# encode categorical variables and get features and labels in X, y
X, y = preprocess(df)
model = RandomForestClassifier(max_depth=6, random_state=0, n_estimators=300)
model.fit(X, y)
shap.initjs()
explainer = shap.TreeExplainer(model, data=X)
shap_values = explainer.shap_values(X)
# Global shapley values
gsv = np.mean(np.abs(shap_values[1]), axis=0)
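One caveat about `shap_values[1]` above: as far as I know, the return type of `shap_values` for a classifier depends on the installed shap version. Older releases return a list with one `(n_samples, n_features)` array per class, while newer ones (0.45 and later, to my knowledge) return a single array of shape `(n_samples, n_features, n_classes)`. The sketch below handles both layouts. `class_shap_values` is a hypothetical helper written for this issue, not part of shap, and the arrays in the usage lines are random stand-ins.

```python
import numpy as np

def class_shap_values(shap_values, class_index=1):
    """Return an (n_samples, n_features) array for one class.

    Hypothetical helper, not part of the shap API.
    """
    if isinstance(shap_values, list):
        # Older layout: one (n_samples, n_features) array per class.
        return np.asarray(shap_values[class_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        # Newer layout: (n_samples, n_features, n_classes).
        return arr[:, :, class_index]
    # Single-output model: already (n_samples, n_features).
    return arr

# Stand-in data with 5 instances, 3 features, 2 classes.
rng = np.random.default_rng(0)
as_array = rng.normal(size=(5, 3, 2))
as_list = [as_array[:, :, 0], as_array[:, :, 1]]

# Both layouts give the same global values, one per feature.
gsv_from_list = np.mean(np.abs(class_shap_values(as_list)), axis=0)
gsv_from_array = np.mean(np.abs(class_shap_values(as_array)), axis=0)
```

With the 3D layout, plain `shap_values[1]` would select the second instance rather than the positive class, so the resulting `gsv` would have one entry per class instead of one per feature.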
As a side note, TreeExplainer computes exact SHAP values, but your results do not match even the approximate values from KernelExplainer.
Thanks in advance.