VCNet examples
This example illustrates how to deal with a three-class dataset. When there are only two classes to predict, a counterfactual for an instance of class 0 is necessarily of class 1. In a classification problem with more than two classes, however, the user has a choice of target class for the counterfactuals.
In VCNet, the choice of the target class is managed by a class change strategy.
The strategy is configured with the parameters class_change and class_change_norm in the settings.
We implemented two different strategies: reverse and second_max (see the documentation of VCNetBase.__change_class()).
Let us assume the class probabilities vector is [0.3, 0.6, 0.1]. With the reverse strategy, the resulting vector is [0.7, 0.4, 0.9]: it favors the class with the lowest predicted probability as the class of the generated counterfactuals. With the second_max strategy, the resulting vector is [0.3, 0.0, 0.1]: the second most likely class is used to generate counterfactuals.
In practice, these vectors are not used as is, but are normalized, and there are three different ways to normalize them: “sum-norm” divides by the sum of the vector elements (with the reverse strategy above, we obtain the final vector [0.35, 0.2, 0.45]); “softmax” applies the softmax function to achieve a similar normalization; “absolute” yields the one-hot vector [0.0, 0.0, 1.0], forcing the counterfactual to be purely an example of the third class.
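To make the strategies and normalizations concrete, here is a minimal sketch of the transformation described above. It is a plain reimplementation for illustration only (the hypothetical change_class function below is not the library's VCNetBase.__change_class()):

import numpy as np

def change_class(p, strategy="reverse", norm="sum-norm"):
    # illustration of the class change logic described above
    if strategy == "reverse":
        q = 1.0 - p  # favor the least likely class
    else:  # "second_max"
        q = p.copy()
        q[np.argmax(p)] = 0.0  # discard the currently predicted class
    if norm == "sum-norm":
        return q / q.sum()
    if norm == "softmax":
        e = np.exp(q - q.max())
        return e / e.sum()
    # "absolute": one-hot vector on the remaining maximum
    out = np.zeros_like(q)
    out[np.argmax(q)] = 1.0
    return out

p = np.array([0.3, 0.6, 0.1])
print(change_class(p, "reverse", "sum-norm"))     # [0.35 0.2  0.45]
print(change_class(p, "reverse", "absolute"))     # [0. 0. 1.]
print(change_class(p, "second_max", "absolute"))  # [1. 0. 0.]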
In this example, we illustrate how to use these parameters on the well-known Iris dataset.
Simple use of a class change strategy
Let us start by loading the libraries that will be used in this example.
import torch
import lightning as L
import numpy as np
import pandas as pd
from vcnet import DataCatalog, VCNet
Let us now load the dataset and define the default hyperparameters for it. In this case, we choose the second_max strategy, meaning that the class of the counterfactual will be the second most likely class for the instance (the most likely one being the predicted class).
Note
This example uses a joint model; the same applies to a post-hoc model.
df = pd.read_csv("datasets/iris.data")
hp = {
"dataset": {
"target": "species",
"continuous": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
"categorical": [],
"immutables": [],
"scaling_method": "MinMax",
"encoding_method": "OneHot_drop_binary",
"activate_rounding": False,
},
"vcnet_params": {
"lr": 1e-3,
"epochs": 50,
"lambda_KLD": 0.5,
"lambda_CE": 1.0,
"lambda_BCE": 1.0,
"latent_size": 19,
"latent_size_share": 64,
"mid_reduce_size": 16,
# Setting of a default class change strategy
"class_change": "second_max",
"class_change_norm": "absolute",
},
}
We now continue with the actual training of the joint model. Note that this is exactly the same procedure as for two-class problems.
Note
The class change strategy is not used during the learning phase: VCNet relies on a class-disentanglement trick to generate counterfactuals, so during learning it simply models the space of examples conditionally on the classes, without needing to know the target class of the counterfactuals.
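For intuition, and reading from the hyperparameter names above rather than from the exact implementation, the training objective presumably combines a reconstruction term, a KL regularization of the latent space, and a classification term:

loss = lambda_BCE * reconstruction_loss + lambda_KLD * KL_divergence + lambda_CE * classification_loss

None of these terms depends on a counterfactual target class, which is why the class change strategy only matters at inference time.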
dataset = DataCatalog(hp["dataset"])
hp["dataset"] = dataset.prepare_data(df)
train_loader = dataset.train_dataloader()
vcnet = VCNet(hp)
# train the model
trainer = L.Trainer(max_epochs=hp["vcnet_params"]["epochs"])
trainer.fit(model=vcnet, train_dataloaders=train_loader)
We can now generate counterfactuals (on the test set). In the code snippet below, we generate the counterfactuals and gather them in a dataframe.
vcnet.eval()
data, labels = next(iter(dataset.test_dataloader()))
cl = vcnet.forward_pred(data)
cf, clcf = vcnet.counterfactuals(data)
cfdf = dataset.data_unloader(cf, clcf)
# Computation of accuracy and validity with more than 2 classes
acc = torch.sum(torch.argmax(cl, 1) == torch.argmax(labels, 1)) / len(data)
validity = torch.sum(torch.argmax(cl, 1) != torch.argmax(clcf, 1)) / len(data)
print(f"Accuracy: {acc}, validity:{validity}")
Comparing different strategies
The code snippet below illustrates how to compare different strategies. As we have seen earlier, the strategy can be changed at inference time; it does not impact the construction of the counterfactual generator.
It generates counterfactuals for only 3 instances.
# we take the first 3 examples
small_data = data[:3, :]
for strategy in ['reverse', 'second_max']:
    vcnet.class_change_strategy = strategy
    for normalization in ["softmax", "absolute", "sum-norm"]:
        vcnet.class_change_norm = normalization
        cf, clcf = vcnet.counterfactuals(small_data)
        print(f"Strategy {strategy}/{normalization}:")
        print(cl[:3])  # predicted classes of the 3 instances
        print(clcf)    # classes of their counterfactuals
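Since the loop leaves the model configured with the last tested strategy, it may be convenient to restore the settings chosen in the hyperparameters before going further:

# restore the strategy defined in the hyperparameters
vcnet.class_change_strategy = hp["vcnet_params"]["class_change"]
vcnet.class_change_norm = hp["vcnet_params"]["class_change_norm"]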
Generating counterfactuals with a given target class
The strategy can also be completely bypassed when a user wants to target a specific class for a counterfactual. In this case, they can simply provide the target class when the counterfactual is generated.
In the example below, we generate counterfactuals for the first 10 instances of the test set with the target class setosa, defined in the target_label variable.
# we take the first 10 examples
small_data = data[:10, :]
# we get the encoding of the target class
target_label = torch.tensor(
    dataset.class_encoding(np.repeat("setosa", 10)),
    dtype=torch.float32,
)
# finally, we apply the generation of counterfactuals for class 'setosa' as target
cf, clcf = vcnet.counterfactuals(small_data, target_label)
print(clcf)
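As a quick check, one can verify that the generated counterfactuals are indeed classified as the target class, reusing the validity computation from above:

# fraction of counterfactuals actually predicted as 'setosa'
hit = torch.sum(torch.argmax(clcf, 1) == torch.argmax(target_label, 1)) / len(small_data)
print(f"Fraction of counterfactuals reaching the target class: {hit}")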