VCNet examples
=======================

This example illustrates how to deal with a three-class dataset. When there are only two classes to predict, a counterfactual generated for an instance of class `0` will necessarily be of class `1`. In a classification problem with more than two classes, however, the user has to choose the target class of the counterfactuals.

In VCNet, the choice of the class to predict is managed by a *class change strategy*. The strategy can be set up with the parameters `class_change` and `class_change_norm` in the settings. We implemented two different strategies: `reverse` and `second_max` (see the documentation of :py:func:`VCNetBase.__change_class`).

Let us assume the class probability vector is `[0.3, 0.6, 0.1]`. With the `reverse` strategy, the resulting vector is `[0.7, 0.4, 0.9]`: it favors the class with the lowest predicted probability for generating counterfactuals. With the `second_max` strategy, it yields the vector `[0.3, 0.0, 0.1]`: the second most likely class is then used to generate counterfactuals.

In practice, these vectors are not used as-is but are normalized, and there are three different ways to normalize them: "sum-norm" normalizes by the sum of the vector elements (with the `reverse` strategy above, we obtain the final vector `[0.35, 0.2, 0.45]`); "softmax" applies the `softmax` function for a similar normalization; while "absolute" yields the one-hot vector `[0.0, 0.0, 1.0]`, forcing the counterfactual to be purely an example of the third class.

In this example, we illustrate how to use these parameters on the well-known *Iris* dataset.

Simple use of a class change strategy
------------------------------------------

Let us start by loading the libraries that will be used in this example.

.. code-block:: python

    import torch
    import lightning as L
    import numpy as np
    import pandas as pd

    from vcnet import DataCatalog, VCNet

Let us now load the dataset and define its default hyperparameters. In this case, we choose the `second_max` strategy: the class of the counterfactual will be the second most likely class for the instance (the most likely one being the predicted class).

.. note::
    This example uses a joint model; the same applies to a post-hoc model.

.. code-block:: python

    df = pd.read_csv("datasets/iris.data")

    hp = {
        "dataset": {
            "target": "species",
            "continuous": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
            "categorical": [],
            "immutables": [],
            "scaling_method": "MinMax",
            "encoding_method": "OneHot_drop_binary",
            "activate_rounding": False,
        },
        "vcnet_params": {
            "lr": 1e-3,
            "epochs": 50,
            "lambda_KLD": 0.5,
            "lambda_CE": 1.0,
            "lambda_BCE": 1.0,
            "latent_size": 19,
            "latent_size_share": 64,
            "mid_reduce_size": 16,
            # setting of a default class change strategy
            "class_change": "second_max",
            "class_change_norm": "absolute",
        },
    }

We now continue with the effective training of the joint model. One can notice that it is exactly the same procedure as for two-class problems.

.. note::
    The class change strategy is not used during the learning phase, due to the specificity of VCNet, which is based on the class-disentanglement trick to generate counterfactuals. During the learning phase, VCNet simply models the space of examples conditionally on the classes, but it does not need to know the target class of the counterfactuals.
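Before training, the strategy arithmetic described in the introduction can be reproduced in a few lines. The snippet below is a minimal illustrative sketch, not VCNet library code: the `probs` vector and all variable names are ours, chosen to match the numerical example above.

.. code-block:: python

    # illustrative class-probability vector from the introduction (not VCNet API)
    probs = torch.tensor([0.3, 0.6, 0.1])

    # "reverse": favor the least likely class
    reverse = 1.0 - probs                      # tensor([0.7, 0.4, 0.9])

    # "second_max": zero out the predicted (most likely) class
    second_max = probs.clone()
    second_max[torch.argmax(probs)] = 0.0      # tensor([0.3, 0.0, 0.1])

    # the three normalizations, applied here to the "reverse" vector
    sum_norm = reverse / reverse.sum()         # tensor([0.35, 0.20, 0.45])
    soft = torch.softmax(reverse, dim=0)       # smoother variant of the same idea
    absolute = torch.nn.functional.one_hot(
        torch.argmax(reverse), num_classes=3
    ).float()                                  # tensor([0., 0., 1.])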
.. code-block:: python

    dataset = DataCatalog(hp["dataset"])
    hp["dataset"] = dataset.prepare_data(df)
    train_loader = dataset.train_dataloader()

    vcnet = VCNet(hp)

    # train the model
    trainer = L.Trainer(max_epochs=hp["vcnet_params"]["epochs"])
    trainer.fit(model=vcnet, train_dataloaders=train_loader)

We can now generate counterfactuals (on the test set). In the code snippet below, we first gather all the new examples in a dataframe.

.. code-block:: python

    vcnet.eval()

    data, labels = next(dataset.test_dataloader()._get_iterator())
    cl = vcnet.forward_pred(data)
    cf, clcf = vcnet.counterfactuals(data)
    cfdf = dataset.data_unloader(cf, clcf)

    # computation of accuracy and validity with more than 2 classes
    acc = torch.sum(torch.argmax(cl, 1) == torch.argmax(labels, 1)) / len(data)
    validity = torch.sum(torch.argmax(cl, 1) != torch.argmax(clcf, 1)) / len(data)
    print(f"Accuracy: {acc}, validity: {validity}")

Compare different strategies
------------------------------------------------------------

The code snippet below illustrates how to compare the different strategies. As we have seen earlier, the strategy can be changed at inference time: it does not impact the construction of the counterfactual generator. Here, counterfactuals are generated for only 3 instances.

.. code-block:: python

    # we take the first 3 examples
    small_data = data[:3, :]

    for strategy in ["reverse", "second_max"]:
        vcnet.class_change_strategy = strategy
        for normalization in ["softmax", "absolute", "sum-norm"]:
            vcnet.class_change_norm = normalization
            cf, clcf = vcnet.counterfactuals(small_data)
            print(f"Strategy {strategy}/{normalization}:")
            print(cl[:3])  # predicted classes of the three instances
            print(clcf)    # classes of their counterfactuals

Generating counterfactuals with a given target class
------------------------------------------------------------

The strategy can also be completely bypassed when the user wants to target a specific class for a counterfactual. In this case, one can simply provide the target class when the counterfactuals are generated. In the example below, we generate counterfactuals for the first 10 instances of the test set with the target class `setosa`, defined in the `target_label` variable.

.. code-block:: python

    # we take the first 10 examples
    small_data = data[:10, :]

    # we get the encoding of the target class
    target_label = torch.tensor(
        dataset.class_encoding(np.repeat("setosa", 10)), dtype=torch.float32
    )

    # finally, we generate the counterfactuals with class 'setosa' as target
    cf, clcf = vcnet.counterfactuals(small_data, target_label)
    print(clcf)
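As a quick sanity check, the class of each counterfactual can be compared with the requested target. This is a sketch that only reuses the tensors computed above; it assumes `clcf` and `target_label` hold class probabilities or one-hot encodings, consistently with the `argmax`-based validity computation shown earlier.

.. code-block:: python

    # each row of clcf should point to the target class index (sketch)
    print(torch.argmax(clcf, dim=1))
    print(torch.argmax(target_label, dim=1))  # index of 'setosa' for every instance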