Result from symmetric shap does NOT match with the SHAP package #12

@gtmdotme

I dumped the adult dataset mentioned in the README to a CSV file, trained a RandomForestClassifier with almost the same settings, and calculated SHAP values with the Python SHAP package (the library by the author of the SHAP paper). I then compared these results with the symmetric Shapley values from your library.

  1. The two sets of global Shapley values do not match, even approximately.
  2. I also don't understand why, to calculate the global Shapley value, you take the mean of the Shapley values over all instances, whereas the SHAP paper suggests taking the mean of their absolute values.
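To illustrate why point 2 matters, here is a minimal sketch with made-up numbers (the `phi` array is hypothetical, not output from either library). When a feature pushes predictions up for some instances and down for others, the signed mean can cancel to roughly zero while the mean of absolute values stays clearly positive, so the two aggregations are not expected to agree.

```python
import numpy as np

# Hypothetical per-instance Shapley values: rows are instances, columns are features.
phi = np.array([
    [ 0.4, -0.1],
    [-0.4,  0.2],
    [ 0.3, -0.1],
    [-0.3,  0.0],
])

# Signed mean: positive and negative contributions cancel out.
signed_mean = phi.mean(axis=0)

# Mean of absolute values: measures magnitude regardless of direction.
abs_mean = np.abs(phi).mean(axis=0)

print("signed mean:", signed_mean)
print("mean |phi|: ", abs_mean)
```

In this toy case the first feature has a signed mean of about 0 but a mean absolute value of 0.35, which is the kind of gap that would make the two rankings look unrelated.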

Pseudo Code:

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv('adult_dataset.csv')
# preprocess is a placeholder: encode categorical variables and return features and labels as X, y
X, y = preprocess(df)
model = RandomForestClassifier(max_depth=6, random_state=0, n_estimators=300)
model.fit(X, y)

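shap.initjs()
explainer = shap.TreeExplainer(model, data=X)
shap_values = explainer.shap_values(X)

# Global Shapley values: mean absolute SHAP value per feature for the positive class.
# Assumes shap_values is a list with one array per class (older shap releases).
gsv = np.mean(np.abs(shap_values[1]), axis=0)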
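One caveat about the indexing in the last line: as far as I know, the return type of `shap_values` for classifiers differs between shap releases (a list with one array per class in older versions, a single array of shape `(n_samples, n_features, n_classes)` in newer ones). A small helper like the following sketch, where `mean_abs_shap` is a name introduced here for illustration and not part of shap, would cover both layouts:

```python
import numpy as np

def mean_abs_shap(shap_values, class_index=1):
    """Mean absolute SHAP value per feature for one class.

    Accepts either a list of (n_samples, n_features) arrays, one per class,
    or a single array of shape (n_samples, n_features, n_classes).
    A plain 2-D array is treated as already belonging to a single output.
    """
    if isinstance(shap_values, list):
        per_class = np.asarray(shap_values[class_index])
    else:
        arr = np.asarray(shap_values)
        per_class = arr[:, :, class_index] if arr.ndim == 3 else arr
    return np.abs(per_class).mean(axis=0)
```

With this, `gsv = mean_abs_shap(shap_values)` would replace the direct `shap_values[1]` indexing.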

As a side note, TreeExplainer computes exact SHAP values, but your results don't match even those from KernelExplainer.

Thanks in advance.
