VCNet examples
=======================

On this page, we illustrate how to use :py:class:`VCNet`, the joint-learning model that combines a classifier and a counterfactual generator.

We start by importing the required classes from our library. You will need the dataset and model classes:

.. code-block:: python

    from vcnet import DataCatalog, VCNet

Preparing the dataset
-----------------------

In this example, we use the classical `Adult` dataset, which contains both numerical and categorical features. The code to prepare the dataset is as follows:

.. code-block:: python

    import pandas as pd

    df = pd.read_csv("datasets/adult.data")

    dataset_settings = {
        "target": "income",
        "continuous": ["age", "hours-per-week"],
        "categorical": ["workclass", "education", "marital-status", "occupation", "race", "gender"],
        "immutables": ["race", "gender"],
        "batch_size": 64,
        "scaling_method": "MinMax",
        "encoding_method": "OneHot_drop_binary",
        "activate_rounding": True,
    }

    dataset = DataCatalog(dataset_settings)

The dataset is first loaded into a pandas dataframe, which is then given to a :py:class:`DataCatalog` instance, named `dataset`. This object can be seen as a kind of Lightning dataset; to create it, we need to provide some information:

* `target`: the name of the attribute to be predicted by the classifier
* `batch_size`: the classical batch-size parameter of optimization-based machine learning methods
* `scaling_method`: an optional parameter to scale the numerical attributes
* `test_size` / `val_size` (in [0,1], default 0.33): the proportion of the dataset used for test/validation
* `stratify` (default `False`): whether the test/validation sets have to be sampled with class balance

.. warning::
    The `target` attribute has to be categorical (not numerical).

In addition, there are parameters that are specific to VCNet:

* `continuous` and `categorical` define the features to use (and their attribute types)
* `imputation_method` defines the method used to impute missing values (a class from :py:mod:`sklearn.impute`)
* `encoding_method`: the attributes declared as `categorical` will be encoded using this method (by default, a one-hot encoding)
* `immutables`: list here the attributes that must remain unchanged while generating counterfactuals
* `activate_rounding`: if `True` (default), activates the rounding of numerical attributes as a post-processing of counterfactual generation, which makes the counterfactuals more realistic

.. note::
    Only the features listed in `continuous` or `categorical` are handled by the model. Other attributes are ignored.

We now prepare the datasets for training:

.. code-block:: python

    dataset_settings = dataset.prepare_data(df)
    train_loader = dataset.train_dataloader()
    test_loader = dataset.test_dataloader()

The data preparation transforms the pandas dataframe into a dataset compatible with VCNet. The settings provided earlier are used at this stage, and are enriched through the different steps of the data preparation. This is why we get back an updated version of `dataset_settings`, to be used later by the classifier. Then, we collect the two datasets (for training and testing) into Lightning loaders. These loaders will be used by the VCNet module.
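Before moving on, you can sanity-check the preparation by peeking at a single batch from the train loader. This is a minimal sketch relying only on standard dataloader iteration (the same iteration pattern is used with `test_loader` later on this page); the variable names are illustrative, and the exact number of encoded columns depends on the scaling and encoding settings.

.. code-block:: python

    # Peek at one batch from the train loader (standard dataloader iteration).
    features, labels = next(iter(train_loader))
    print(features.shape)  # (batch_size, number of encoded features)
    print(labels.shape)    # the encoded `income` target of the batch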
Training the model
--------------------

Now that our dataset is prepared, it is time to define and fit the model.

Let us first define the model... and again, we have to set up a collection of hyperparameters. Note that part of the model's hyperparameters are directly those of the dataset. The other parameters are related to the architecture of VCNet. We invite you to have a look at the original VCNet article for more insight into them. In short:

* `lambda_...` are loss weights
* `latent_size`, `latent_size_share`, etc. are architecture hyperparameters (layer sizes)
* `epochs` and `lr` (learning rate) are the classical optimization parameters

.. code-block:: python

    hp = {
        "dataset": dataset_settings,
        "vcnet_params": {
            "lr": 2e-3,
            "epochs": 5,
            "lambda_KLD": 0.5,
            "lambda_CE": 0.93,
            "lambda_BCE": 1,
            "latent_size": 19,
            "latent_size_share": 304,
            "mid_reduce_size": 152,
        },
    }

    vcnet = VCNet(hp)

Let us now fit the model on the train dataset. With the Lightning trainer, it can be done in two lines:

.. code-block:: python

    import lightning as L

    trainer = L.Trainer(max_epochs=hp["vcnet_params"]["epochs"])
    trainer.fit(model=vcnet, train_dataloaders=train_loader)

That's it! You trained both the classifier and the counterfactual generator!

For more advanced training, we provide tools for a *monotonic annealing* strategy, which improves learning stability by gradually increasing the weight of a specific loss term over epochs:

.. code-block:: python

    import lightning as L

    from vcnet.monotonic_annealing import MonotonicAnnealing

    # Callbacks are passed to the Trainer, not to `fit`.
    trainer = L.Trainer(
        max_epochs=hp["vcnet_params"]["epochs"],
        callbacks=[MonotonicAnnealing(0.1, 2)],
    )
    trainer.fit(model=vcnet, train_dataloaders=train_loader)

The monotonic annealing parameters can also be included in the global settings for centralized configuration.

Use your model
--------------------

At this stage, you should evaluate the model's accuracy and the validity of the counterfactual generator. The following code generates counterfactuals from the test set and assesses their accuracy and validity. It extracts examples from `test_loader`, applies `forward_pred` to obtain probabilistic predictions, and generates counterfactuals using the `counterfactuals` method.

.. code-block:: python

    import torch

    vcnet.eval()
    for data, labels in test_loader:
        cl = vcnet.forward_pred(data)           # probabilistic predictions
        cl = (cl.squeeze() > 0.5).float()       # predicted classes
        cf, clcf = vcnet.counterfactuals(data)  # counterfactuals and their classes

        acc = torch.sum(cl == labels) / len(data)
        validity = torch.sum(cl != clcf) / len(data)
        print(f"Accuracy: {acc}, validity: {validity}")

It works? Great! Now you can apply the fitted model to new examples. The following code illustrates how to generate counterfactuals in practice. Again, we use the test set here, but it can be any dataset with the same `dataset_settings`:

.. code-block:: python

    for data, labels in test_loader:
        cf, clcf = vcnet.counterfactuals(data)
        cfdf = dataset.data_unloader(cf, clcf)
        print(cfdf)

The key difference from the previous example is the use of `data_unloader`, which post-processes VCNet's internal counterfactuals to make them user-readable and more realistic. This step reverses the preprocessing (scaling and one-hot encoding) and applies smart rounding (if activated). As a result, the generated counterfactuals `cfdf` look just like examples from the original dataset. More specifically, categorical features are reconstructed, and immutable attributes (`race` and `gender` in the example) remain unchanged in the counterfactuals.
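To close the loop, here is an illustrative sketch (not a library feature) that combines the two snippets above: it keeps only the counterfactuals of a batch whose predicted class actually flipped, and exports the readable result with pandas. It assumes that the returned tensors support standard boolean indexing, that `data_unloader` accepts any subset of a batch, and that it returns a pandas dataframe, as suggested above.

.. code-block:: python

    import torch

    vcnet.eval()
    with torch.no_grad():
        data, labels = next(iter(test_loader))
        # Predicted class of the factual examples.
        cl = (vcnet.forward_pred(data).squeeze() > 0.5).float()
        # Counterfactuals and their predicted class.
        cf, clcf = vcnet.counterfactuals(data)
        # Keep only the counterfactuals whose predicted class flipped.
        valid = cl != clcf
        cfdf = dataset.data_unloader(cf[valid], clcf[valid])
        # Export the user-readable counterfactuals (plain pandas).
        cfdf.to_csv("valid_counterfactuals.csv", index=False)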