Benchmarking Robustness
========================

Robustness is a critical aspect of evaluating network alignment (NA) algorithms, especially when working with **noisy real-world data**.
This tutorial shows how to benchmark the **robustness** of an NA algorithm in ``PlanetAlign`` against several types of noise that occur in practice:

- Edge noise (randomly added or removed edges)
- Attribute noise (corrupted node features)
- Supervision noise (incorrect anchor links)

Each type of noise is injected at a controlled rate into a different part of the dataset:

1. Graph structure (edge noise)
2. Node features (attribute noise)
3. Supervision (supervision noise)

``PlanetAlign.utils`` provides utility functions that apply these perturbations, so that you can evaluate model performance under varying levels of noise.
They work with the built-in datasets as well as custom datasets derived from :class:`BaseData`.

.. contents::
   :local:
   :depth: 2

Adding Edge Noise
-----------------

To simulate structural perturbation, you can add random edges to one or both graphs using:

.. autofunction:: PlanetAlign.utils.add_edge_noises

Here is an example that perturbs ``G1`` and ``G2`` with 10% edge noise:

.. code-block:: python

    from PlanetAlign.datasets import Douban
    from PlanetAlign.utils import add_edge_noises

    data = Douban()
    
    # Add edge noise to both graphs with noise_rate = 0.1
    data = add_edge_noises(data, noise_rate=0.1, gids=[0, 1], inplace=False)

    # Check the new number of edges
    print("G1 edges (after noise):", data.pyg_graphs[0].num_edges)
    print("G2 edges (after noise):", data.pyg_graphs[1].num_edges)

.. note::

    - ``noise_rate`` ∈ [0, 1] determines the fraction of new random edges added.
    - ``gids`` specifies which graphs (by index) to perturb (e.g., ``[0]``, ``[1]``, or ``[0, 1]``).
    - ``inplace`` defaults to ``False``, so the function returns a new dataset object with noise applied. Set ``inplace=True`` to modify the original dataset directly.
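
To make the effect of the noise rate concrete, here is a minimal standalone sketch of random edge addition on a plain edge list. It does not use or reproduce ``PlanetAlign``'s implementation: the helper name ``add_random_edges`` and the choice to add ``round(noise_rate * num_edges)`` new undirected edges are assumptions made for illustration only.

.. code-block:: python

    import random


    def add_random_edges(edges, num_nodes, noise_rate, seed=None):
        """Return a sorted copy of ``edges`` with extra random undirected edges.

        Hypothetical helper for illustration; not part of PlanetAlign.
        """
        rng = random.Random(seed)
        # Store each undirected edge once, as a (small, large) pair.
        existing = {tuple(sorted(e)) for e in edges}
        num_new = round(noise_rate * len(existing))
        # Never ask for more edges than a simple graph can hold.
        max_edges = num_nodes * (num_nodes - 1) // 2
        num_new = min(num_new, max_edges - len(existing))

        noisy = set(existing)
        while len(noisy) < len(existing) + num_new:
            u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes, so no self-loops
            noisy.add((min(u, v), max(u, v)))       # already-present edges are ignored
        return sorted(noisy)


    # A 10-node ring graph with 10 edges.
    edges = [(i, i + 1) for i in range(9)] + [(0, 9)]
    noisy_edges = add_random_edges(edges, num_nodes=10, noise_rate=0.1, seed=0)
    print(len(edges), "->", len(noisy_edges))  # prints: 10 -> 11

Under this assumed definition, a noise rate of 0.1 on a 10-edge graph adds exactly one new edge and keeps every original edge.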

Adding Attribute Noise
-----------------------

To test the model's robustness to noisy node features, you can use:

.. autofunction:: PlanetAlign.utils.add_attr_noises

This randomly corrupts a proportion of the node features; the ``mode`` argument selects the type of noise (e.g., bit flips for binary features or Gaussian noise for continuous ones).

.. code-block:: python

    from PlanetAlign.datasets import Douban
    from PlanetAlign.utils import add_attr_noises

    data = Douban()

    # Add 20% attribute noise to both graphs by flipping binary attributes
    data = add_attr_noises(data, mode='flip', noise_rate=0.2, gids=[0, 1], inplace=False)

    # Check that node features are still the same shape
    print("X1 shape:", data.X1.shape)
    print("X2 shape:", data.X2.shape)

.. note::

    - Only applies if ``X1`` and/or ``X2`` are present.
    - For binary features, use ``mode='flip'`` to randomly flip bits.
    - For continuous features, use ``mode='gaussian'`` to add Gaussian noise.
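
The two modes can be illustrated with a small standalone sketch on nested Python lists. This is not ``PlanetAlign``'s implementation: the helper names ``flip_binary`` and ``perturb_gaussian``, the ``std`` parameter, and the assumption that the noise rate is applied independently to each feature entry are all hypothetical.

.. code-block:: python

    import random


    def flip_binary(features, noise_rate, seed=None):
        """Flip each 0/1 entry independently with probability ``noise_rate``."""
        rng = random.Random(seed)
        return [[(1 - x) if rng.random() < noise_rate else x for x in row]
                for row in features]


    def perturb_gaussian(features, noise_rate, std=1.0, seed=None):
        """Add zero-mean Gaussian noise to each entry with probability ``noise_rate``."""
        rng = random.Random(seed)
        return [[(x + rng.gauss(0.0, std)) if rng.random() < noise_rate else x
                 for x in row]
                for row in features]


    binary = [[0, 1, 1, 0], [1, 0, 0, 1]]
    flipped = flip_binary(binary, noise_rate=0.2, seed=0)

    continuous = [[0.5, -1.2], [3.0, 0.0]]
    noisy = perturb_gaussian(continuous, noise_rate=0.2, std=0.1, seed=0)

    # The feature matrix keeps its shape; only individual values change.
    print(len(flipped), len(flipped[0]))  # prints: 2 4

In both helpers the inputs are left untouched and a new matrix is returned, mirroring the ``inplace=False`` behaviour described above.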

Adding Supervision Noise
-------------------------

To simulate incorrect alignment supervision (e.g., noisy anchors), use:

.. code-block:: python

    # Add supervision noise to a PlanetAlign dataset by injecting noisy anchors.
    add_sup_noises(dataset, noise_ratio, src_gid=0, dst_gid=1, seed=None, inplace=False)

**Parameters**

- **dataset** (*Dataset*) – The input dataset containing the graphs and ground-truth alignment.
- **noise_ratio** (*float*) – The ratio of supervision to perturb (value between 0 and 1).
- **src_gid** (*int*, *optional*) – The graph ID of the source graph. Default is ``0``.
- **dst_gid** (*int*, *optional*) – The graph ID of the destination graph. Default is ``1``.
- **seed** (*int*, *optional*) – Random seed for reproducibility.
- **inplace** (*bool*, *optional*) – If ``True``, modify the dataset in place. Otherwise, return a new dataset with supervision noise applied.

**Returns**

- **Dataset** – A PyG dataset with perturbed supervision (modified anchors).

**Return type**

- :class:`Dataset`

This randomly replaces a proportion of training anchor pairs with mismatched ones:

.. code-block:: python

    from PlanetAlign.datasets import Douban
    from PlanetAlign.utils import add_sup_noises

    data = Douban()

    # Add 30% supervision noise to the training anchor links.
    # The ratio is passed positionally (second parameter in the signature above).
    data = add_sup_noises(data, 0.3, inplace=False)

.. note::

    - The noise ratio (``noise_ratio`` in the signature above) controls the fraction of anchor pairs corrupted.
    - This helps test how models perform when supervision is imperfect or partially mislabeled.
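
As a rough illustration of what replacing anchor pairs with mismatched ones can look like, the following standalone sketch corrupts a list of ``(source, destination)`` pairs. It is not ``PlanetAlign``'s implementation: the helper name ``corrupt_anchors`` and the strategy of swapping the destination node for a uniformly chosen wrong one are assumptions.

.. code-block:: python

    import random


    def corrupt_anchors(anchors, num_dst_nodes, noise_ratio, seed=None):
        """Replace a fraction of (src, dst) anchor pairs with mismatched ones.

        Hypothetical helper for illustration; requires ``num_dst_nodes >= 2``.
        """
        rng = random.Random(seed)
        num_noisy = round(noise_ratio * len(anchors))
        noisy_idx = set(rng.sample(range(len(anchors)), num_noisy))

        corrupted = []
        for i, (src, dst) in enumerate(anchors):
            if i in noisy_idx:
                # Draw uniformly from all destination nodes except the true one.
                wrong = rng.randrange(num_dst_nodes - 1)
                if wrong >= dst:
                    wrong += 1
                corrupted.append((src, wrong))
            else:
                corrupted.append((src, dst))
        return corrupted


    anchors = [(i, i) for i in range(10)]
    noisy_anchors = corrupt_anchors(anchors, num_dst_nodes=10, noise_ratio=0.3, seed=0)
    changed = sum(a != b for a, b in zip(anchors, noisy_anchors))
    print(changed)  # prints: 3

Every selected pair is guaranteed to change, so the number of corrupted anchors equals ``round(noise_ratio * len(anchors))``.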

Best Practices
--------------

- Run multiple trials per noise level to reduce variance in evaluation.
- Plot metric degradation (e.g., MRR, Hits@K) versus noise rate to analyze robustness curves.
- Vary only one noise source at a time (e.g., edge vs. attribute) to isolate effects.
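
These practices can be combined into a small benchmarking loop, sketched below. The function ``robustness_curve`` and the callable ``run_trial(noise_rate, seed)`` are hypothetical names: in a real experiment, ``run_trial`` would perturb the dataset with one of the utilities above, train the model, and return a metric such as MRR. The ``toy_trial`` function is only a placeholder so that the sketch runs on its own; its numbers are not results of any model.

.. code-block:: python

    import random
    import statistics


    def robustness_curve(run_trial, noise_rates, num_trials=5):
        """Return {noise_rate: (mean, std)} of a metric over repeated trials.

        ``num_trials`` must be at least 2 so that a standard deviation exists.
        """
        curve = {}
        for rate in noise_rates:
            # One trial per seed, so that runs differ only in randomness.
            scores = [run_trial(rate, seed) for seed in range(num_trials)]
            curve[rate] = (statistics.mean(scores), statistics.stdev(scores))
        return curve


    def toy_trial(noise_rate, seed):
        # Placeholder only: NOT a real model and NOT real results.
        rng = random.Random(seed)
        return max(0.0, 1.0 - noise_rate + rng.uniform(-0.05, 0.05))


    # Vary a single noise source over several levels.
    for rate, (mean, std) in robustness_curve(toy_trial, [0.0, 0.1, 0.2, 0.3]).items():
        print(f"noise={rate:.1f}  mean={mean:.3f}  std={std:.3f}")

The resulting mean and standard deviation per noise level can then be plotted as a robustness curve with error bars.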

Summary
-------

In this tutorial, we showed how to:

- Inject **edge noise** using :func:`add_edge_noises`
- Add **attribute noise** using :func:`add_attr_noises`
- Simulate **supervision noise** using :func:`add_sup_noises`

These utilities allow you to evaluate how sensitive NA algorithms are to real-world imperfections. Next, consider benchmarking robustness across multiple datasets or noise regimes for more comprehensive analysis.