VCNet post-hoc examples (with sklearn classifier)
In this page, we illustrate how to use PHVCNet
to generate counterfactuals for a shallow classifier.
We take an sklearn classifier as our example.
Warning
The classifier has to be probabilistic.
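In sklearn terms, "probabilistic" means the classifier exposes a predict_proba method returning class probabilities rather than only hard labels. A quick way to check a candidate estimator (this is a generic sklearn check, not part of this library):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# RandomForestClassifier can return class probabilities: suitable.
print(hasattr(RandomForestClassifier(), "predict_proba"))  # True
# LinearSVC only outputs hard labels: not suitable here.
print(hasattr(LinearSVC(), "predict_proba"))  # False
```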
We take the same dataset as the VCNet example: the Adult dataset.
This time, we will need to import two classes of our library:
from vcnet import PHVCNet
from vcnet import sklearnClassifier
PHVCNet
is the implementation of the counterfactual generator.
sklearnClassifier
is a wrapper for sklearn classifiers. It is set up with a configuration dictionary that specifies the classifier to use.
Preparing the dataset
The data preparation is exactly the same as for VCNet. Please refer to the previous example.
In principle, you create a DataCatalog
from the settings and you prepare the data.
dataset = DataCatalog(dataset_settings)
dataset.prepare_data(df)
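The df passed to prepare_data is presumably a plain pandas DataFrame holding the raw dataset. As a minimal illustration, here is a tiny toy frame standing in for the Adult data (toy values only, not the real dataset):

```python
import pandas as pd

# A small stand-in for the Adult dataset: mixed numeric and
# categorical columns plus a binary target (toy values only).
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "Self-emp", "Private", "Private"],
    "hours-per-week": [40, 13, 40, 45],
    "income": ["<=50K", "<=50K", "<=50K", ">50K"],
})
print(df.shape)  # (4, 4)
```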
Prepare and train the classification model
Let us now look at how to define the classification model and how to train it through the sklearnClassifier
class.
You can use the following code as template:
hp = {
    "dataset": dataset_settings,
    "classifier_params": {
        "skname": "RandomForestClassifier",
        "kwargs": {
            "n_estimators": 50,
        },
    },
}
train_loader = dataset.train_dataloader()
test_loader = dataset.test_dataloader()
classifier = sklearnClassifier(hp)
classifier.fit(dataset.df_train)
In this code, the hyperparameters of the classifier must be described in classifier_params: it takes skname, the name of the class of the sklearn classifier to use, and its kwargs. In this example, we use a random forest with 50 classification trees.
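To see how such a configuration maps to an actual sklearn estimator, here is a rough sketch of the kind of name lookup a wrapper could perform (an illustration only, not the actual sklearnClassifier implementation):

```python
import sklearn.ensemble

classifier_params = {
    "skname": "RandomForestClassifier",
    "kwargs": {"n_estimators": 50},
}

# Resolve the class by name inside sklearn.ensemble and instantiate
# it with the given keyword arguments (hypothetical resolution logic).
cls = getattr(sklearn.ensemble, classifier_params["skname"])
clf = cls(**classifier_params["kwargs"])
print(type(clf).__name__, clf.n_estimators)  # RandomForestClassifier 50
```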
Once the classifier has been defined through the hyperparameters given to the sklearnClassifier
constructor, the fit()
function trains it on the training dataset.
At this stage, you only have the classifier, but not yet the counterfactual generator.
Training the counterfactual generator model
The final step is to train the counterfactual generator model post-hoc.
This is very similar to the VCNet example; the main difference is that constructing an instance of PHVCNet
requires a trained classifier as a parameter.
import lightning as L
hp = {
    "dataset": hp['dataset'],
    "classifier_params": hp['classifier_params'],
    "vcnet_params": {
        "lr": 1e-2,           # learning rate
        "epochs": 10,         # number of training epochs
        "lambda_KLD": 0.5,    # weight of the KL-divergence term of the loss
        "lambda_BCE": 1,      # weight of the reconstruction (BCE) term of the loss
        "latent_size": 16,    # dimension of the latent space
        "latent_size_share": 64,
        "mid_reduce_size": 32
    }
}
# now you define the post-hoc VCNet model
vcnet = PHVCNet(hp, classifier)
# finally, you fit it with a Lightning Trainer
trainer = L.Trainer(max_epochs=hp['vcnet_params']['epochs'])
trainer.fit(model=vcnet, train_dataloaders=train_loader)
Once your CF generation model has been trained, it can be used in the same way as the other VCNet models (see the other examples).