VCNet post-hoc examples (with sklearn classifier)

On this page, we illustrate how to use PHVCNet to generate counterfactuals for a shallow classifier. We take an sklearn classifier as the example.

Warning

The classifier has to be probabilistic.
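In sklearn terms, "probabilistic" most likely means the estimator exposes a predict_proba method (this is our interpretation of the requirement, not a statement from the library). A quick way to check a candidate classifier:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# RandomForestClassifier is probabilistic: it exposes predict_proba
print(hasattr(RandomForestClassifier(), "predict_proba"))  # True

# LinearSVC is not: it only provides decision_function
print(hasattr(LinearSVC(), "predict_proba"))  # False
```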

We take the same dataset as the VCNet example: the Adult dataset.

This time, we will need to import two classes of our library:

from vcnet import PHVCNet
from vcnet import sklearnClassifier

PHVCNet is the implementation of the counterfactual generator. sklearnClassifier is a wrapper for sklearn classifiers. It is set up with a configuration dictionary that specifies the classifier to use.

Preparing the dataset

The data preparation is exactly the same as for VCNet. Please refer to the previous example.

In short, you create a DataCatalog from the settings and prepare the data.

dataset = DataCatalog(dataset_settings)
dataset.prepare_data(df)

Prepare and train the classification model

Let us now look at how to define the classification model and how to train it through the sklearnClassifier class.

You can use the following code as template:

hp = {
    "dataset": dataset_settings,
    "classifier_params" : {
        "skname":  "RandomForestClassifier",
        "kwargs": {
            "n_estimators" : 50,
        }
    }
}

# data loaders (train_loader is used later to train the generator)
train_loader = dataset.train_dataloader()
test_loader = dataset.test_dataloader()

classifier = sklearnClassifier(hp)
classifier.fit(dataset.df_train)

In this code, the hyperparameters of the classifier must be described in classifier_params: it takes the name of the sklearn classifier class to use (skname) and its constructor keyword arguments (kwargs). In this example, we use a random forest with 50 classification trees.
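For intuition only (this is an illustrative sketch, not vcnet's actual implementation), a wrapper like this can resolve the skname string into an sklearn class with getattr and instantiate it with the given kwargs:

```python
# Illustrative sketch, NOT the library's code. For simplicity we only look
# up classes in sklearn.ensemble; a real wrapper would cover more modules.
import sklearn.ensemble

def build_estimator(skname, kwargs):
    cls = getattr(sklearn.ensemble, skname)  # look the class up by name
    return cls(**kwargs)                     # pass kwargs to its constructor

clf = build_estimator("RandomForestClassifier", {"n_estimators": 50})
print(type(clf).__name__)  # RandomForestClassifier
```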

Once the classifier has been defined through the hyper-parameters given to sklearnClassifier, the fit() function trains it on the training dataset.

At this stage, you only have the classifier, but not yet the counterfactual generator.

Training the counterfactual generator model

The final step is to train the counterfactual generator model post-hoc. This is very similar to the VCNet example; the main difference is that creating an instance of PHVCNet requires a trained classifier as a parameter.

import lightning as L

hp = {
    "dataset": hp['dataset'],
    "classifier_params" : hp['classifier_params'],
    "vcnet_params" : {
        "lr":  1e-2,
        "epochs" : 10,
        "lambda_KLD": 0.5,
        "lambda_BCE": 1,
        "latent_size" : 16,
        "latent_size_share" :  64,
        "mid_reduce_size" : 32
    }
}
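As a rough intuition for lambda_BCE and lambda_KLD (a sketch of a VAE-style weighted objective; the library's exact loss may include further terms), the training loss combines a reconstruction term (BCE) and a latent regularization term (KLD), each scaled by its lambda:

```python
def weighted_loss(bce, kld, lambda_BCE=1.0, lambda_KLD=0.5):
    """Sketch of a VAE-style objective: weighted reconstruction + KL terms."""
    return lambda_BCE * bce + lambda_KLD * kld

# With the hyperparameters above (lambda_BCE=1, lambda_KLD=0.5):
print(weighted_loss(2.0, 1.0))  # 2.5
```

Raising lambda_KLD pushes the latent code toward the prior (smoother latent space); lowering it favors reconstruction accuracy.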

# now you define the post-hoc VCNet model
vcnet = PHVCNet(hp, classifier)

# finally, you fit it with a Lightning Trainer
trainer = L.Trainer(max_epochs=hp['vcnet_params']['epochs'])
trainer.fit(model=vcnet, train_dataloaders=train_loader)

Once your CF generation model has been trained, it can be used in the same way as the other VCNet models (see the other examples).