Metrics and Evaluation

Assuming ground truth labels are available (as they must be for supervised learning), several evaluation metrics can be used. They fall into three categories according to what they compare:

  • Image similarity

  • Displacement field

  • Segmentation

Evaluation metrics

All these metrics are implemented in atlalign.metrics. A subset of them is also available as drop-in losses for deep learning in atlalign.ml_utils.losses.

The following subsections list the metrics available in each of the three categories. They all share a common interface:

atlalign.metrics.<metric_name>(y_true, y_pred, **kwargs)

The parameters y_true and y_pred are pairs of images, displacement fields, or segmentation maps. Multiple pairs of images can be processed at once by stacking them along the first dimension, so that y_true and y_pred have the shape (n_images, ...).

Some metrics accept optional keyword arguments that differ from metric to metric; see the API reference for details.
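To illustrate the calling convention only, here is a minimal NumPy sketch of a metric that follows the (y_true, y_pred, **kwargs) pattern and accepts pairs stacked along the first dimension. The function mse_img_sketch is a hypothetical stand-in written for this example, not atlalign's implementation, and returning one value per pair is an assumption; the API reference states what each real metric returns.

```python
import numpy as np


def mse_img_sketch(y_true, y_pred):
    """Hypothetical stand-in: mean squared error for each stacked image pair."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Average over every axis except the first one (the pair index).
    axes = tuple(range(1, y_true.ndim))
    return ((y_true - y_pred) ** 2).mean(axis=axes)


rng = np.random.default_rng(0)
y_true = rng.random((3, 32, 48))  # (n_images, height, width)
y_pred = y_true.copy()
y_pred[1] += 0.5                  # perturb only the second pair

scores = mse_img_sketch(y_true, y_pred)
print(scores.shape)  # (3,)
```

Only the second pair differs, so only the second score is non-zero (0.5 squared, i.e. about 0.25).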

Image Similarity Metrics

Loss-like (the smaller the better):

  • mse_img – mean squared error

  • mae_img – mean absolute error

  • demons_img – ANTsPy’s demons metric

  • perceptual_loss_img – perceptual loss

Similarity-like (the higher the better):

  • psnr_img – peak signal to noise ratio (max = infinity)

  • cross_correlation_img – image cross correlation (max = 1)

  • ssmi_img – structural similarity (max = 1)

  • mi_img – mutual information (max = mutual information with itself)
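The stated maxima of the similarity-like metrics can be checked with a small NumPy sketch. It uses the textbook definition of PSNR and assumes that cross correlation means the Pearson (zero-normalized) correlation of pixel intensities, which is consistent with a maximum of 1 but is not confirmed on this page. The names psnr_sketch and cross_correlation_sketch and the data_range parameter are hypothetical, not part of atlalign.

```python
import numpy as np


def psnr_sketch(img_true, img_pred, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range**2 / MSE); infinite for identical images."""
    diff = np.asarray(img_true, dtype=float) - np.asarray(img_pred, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range ** 2 / mse))


def cross_correlation_sketch(img_true, img_pred):
    """Pearson correlation of pixel intensities (undefined for constant images)."""
    a = np.asarray(img_true, dtype=float).ravel()
    b = np.asarray(img_pred, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


rng = np.random.default_rng(0)
img = rng.random((32, 48))
noisy = np.clip(img + rng.normal(scale=0.05, size=img.shape), 0, 1)

psnr_same = psnr_sketch(img, img)                # infinite: the images are identical
psnr_noisy = psnr_sketch(img, noisy)             # finite, drops as noise grows
cc_same = cross_correlation_sketch(img, img)     # 1 up to rounding
cc_noisy = cross_correlation_sketch(img, noisy)  # slightly below 1
```

Comparing an image with itself reaches the maximum of each metric, while added noise pulls both values down.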

Displacement Field Metrics

  • correlation_combined – combined version of correlation

  • mae_combined – combined version of mean absolute error

  • mse_combined – combined version of mean squared error

  • r2_combined – combined version of R² (coefficient of determination)

  • vector_distance_combined – combined version of vector distance
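As a rough illustration of what comparing two displacement fields involves, the sketch below computes the mean Euclidean distance between corresponding displacement vectors. It assumes fields stored as arrays of shape (n_samples, height, width, 2), the shape used for y_true and y_pred in the evaluate example on this page. The name vector_distance_sketch and the exact definition are assumptions made for illustration and may differ from vector_distance_combined.

```python
import numpy as np


def vector_distance_sketch(y_true, y_pred):
    """Hypothetical: mean Euclidean distance between displacement vectors, per sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Norm of the per-pixel difference vector; the last axis holds (dx, dy).
    dist = np.linalg.norm(y_true - y_pred, axis=-1)
    # Average over the two spatial axes, keeping one value per sample.
    return dist.mean(axis=(1, 2))


y_true = np.zeros((2, 4, 5, 2))
y_pred = np.zeros((2, 4, 5, 2))
y_pred[1, ..., 0] = 3.0  # every vector of the second sample is off by (3, 4)
y_pred[1, ..., 1] = 4.0

distances = vector_distance_sketch(y_true, y_pred)
print(distances)  # [0. 5.]
```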

Segmentation Metrics

  • iou_score – intersection over union (between 0 and 1, the higher the better)

  • dice_score – dice score (between 0 and 1, the higher the better)
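For reference, both scores can be written in a few lines of NumPy for a pair of binary masks. This is a sketch of the standard definitions only: iou_sketch and dice_sketch are hypothetical names, and how atlalign handles multi-label segmentation maps or empty masks is not covered here.

```python
import numpy as np


def iou_sketch(mask_true, mask_pred):
    """Intersection over union of two binary masks."""
    mask_true = np.asarray(mask_true, dtype=bool)
    mask_pred = np.asarray(mask_pred, dtype=bool)
    union = np.logical_or(mask_true, mask_pred).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks match perfectly
    return float(np.logical_and(mask_true, mask_pred).sum() / union)


def dice_sketch(mask_true, mask_pred):
    """Dice score: 2 * |A and B| / (|A| + |B|)."""
    mask_true = np.asarray(mask_true, dtype=bool)
    mask_pred = np.asarray(mask_pred, dtype=bool)
    total = mask_true.sum() + mask_pred.sum()
    if total == 0:
        return 1.0  # same convention as in iou_sketch
    return float(2 * np.logical_and(mask_true, mask_pred).sum() / total)


mask_true = np.zeros((4, 4), dtype=bool)
mask_pred = np.zeros((4, 4), dtype=bool)
mask_true[:2, :] = True   # 8 pixels: rows 0 and 1
mask_pred[1:3, :] = True  # 8 pixels: rows 1 and 2, overlapping on row 1

iou = iou_sketch(mask_true, mask_pred)    # 4 / 12
dice = dice_sketch(mask_true, mask_pred)  # 2 * 4 / 16
```

The two scores are related by dice = 2 * iou / (1 + iou), so they always rank a set of predictions in the same order.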

Compute Many Metrics at Once

To get a comprehensive overview of how a specific model performs, use the utility function atlalign.metrics.evaluate. It computes multiple metrics at once and returns the results in a pandas.DataFrame.

import numpy as np

from atlalign.metrics import evaluate

n_samples = 5
shape = (320, 456)

y_true = np.random.randint(0, 20, size=(n_samples, *shape, 2))
y_pred = np.random.randint(0, 20, size=(n_samples, *shape, 2))

imgs_mov = np.random.random((n_samples, *shape))
img_ids = np.array(range(n_samples))
dataset_ids = np.array(range(n_samples))
ps = np.linspace(0, 12200, num=n_samples).astype('int')

_, res_df = evaluate(y_true,
                     y_pred,
                     imgs_mov,
                     img_ids,
                     ps,
                     dataset_ids,
                     depths=(1, 2, 3, 4, 5))

Index(['angular_error_a', 'cross_correlation_img_a', 'dataset_id',
       'iou_depth_1', 'iou_depth_2', 'iou_depth_3', 'iou_depth_4',
       'iou_depth_5', 'jacobian_nonpositive_pixels_a',
       'jacobian_nonpositive_pixels_perc_a', 'mae_img_a', 'mi_img_a',
       'mse_img_a', 'norm_a', 'p', 'psnr_img_a', 'ssmi_img_a',