EquiBoots Class
- class equiboots.EquiBootsClass.EquiBoots(y_true: numpy.array, y_pred: numpy.array, fairness_df: pandas.DataFrame, fairness_vars: list, y_prob: numpy.array = None, seeds: list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], reference_groups: list = None, task: str = 'binary_classification', bootstrap_flag: bool = False, num_bootstraps: int = 10, boot_sample_size: int = 100, balanced: bool = True, stratify_by_outcome: bool = False, group_min_size: int = 10)[source]
Bases:
object
- calculate_differences(metric_dict, ref_var_name: str) dict [source]
Calculate difference metrics for each group based on the task type.
- calculate_disparities(metric_dict, var_name: str) dict [source]
Calculate disparity metrics for each group based on the task type.
- calculate_groups_differences(metric_dict: dict, ref_var_name: str) dict [source]
Calculate differences between each group and the reference group.
- calculate_groups_disparities(metric_dict: dict, var_name: str) dict [source]
Calculate disparities between each group and the reference group.
- check_group_empty(sampled_group: numpy.array, cat: str, var: str) bool [source]
Check if a sampled group is empty.
- check_group_size(group: pandas.Index, cat: str, var: str) bool [source]
Check if a group meets the minimum size requirement.
- get_groups_metrics(sliced_dict: dict) dict [source]
Calculate metrics for each group based on the task type.
- static list_adjustment_methods() Dict[str, str] [source]
List available adjustment methods and their descriptions.
- static list_available_tests() Dict[str, str] [source]
List available statistical tests and their descriptions.
Overview
The EquiBoots class provides tools for fairness-aware evaluation and bootstrapping of machine learning model predictions. It supports binary, multi-class, and multi-label classification as well as regression tasks, and enables group-based metric calculation, disparity analysis, and statistical significance testing.
Constructor
- class equiboots.EquiBootsClass.EquiBoots(y_true, y_pred, fairness_df, fairness_vars, y_prob=None, seeds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], reference_groups=None, task='binary_classification', bootstrap_flag=False, num_bootstraps=10, boot_sample_size=100, balanced=True, stratify_by_outcome=False, group_min_size=10)[source]
Initialize a new EquiBoots instance.
Parameters
y_true (numpy.ndarray) Ground truth labels.
y_pred (numpy.ndarray) Predicted labels.
fairness_df (pandas.DataFrame) DataFrame containing fairness variables.
fairness_vars (list of str) Names of fairness variables.
y_prob (numpy.ndarray, optional) Predicted probabilities.
seeds (list of int, optional) Random seeds for bootstrapping.
reference_groups (list, optional) Reference group for each fairness variable.
task (str) One of binary_classification, multi_class_classification, multi_label_classification, or regression.
bootstrap_flag (bool) Whether to perform bootstrapping.
num_bootstraps (int) Number of bootstrap iterations.
boot_sample_size (int) Size of each bootstrap sample.
balanced (bool) If True, balance samples across groups; otherwise stratify by original proportions.
stratify_by_outcome (bool) Stratify sampling by outcome label.
group_min_size (int) Minimum group size (groups smaller than this are omitted).
Returns
None
Main Methods
- equiboots.EquiBootsClass.grouper(groupings_vars)
Groups data by the specified fairness variables and stores category indices.
Parameters
groupings_vars (list of str) Variables to group by.
Returns
None
- equiboots.EquiBootsClass.slicer(slicing_var)
Slice y_true, y_prob, and y_pred by a single fairness variable.
Parameters
slicing_var (str) Variable name to slice by.
Returns
- dict or list of dict
Sliced outputs.
- equiboots.EquiBootsClass.get_metrics(sliced_dict)
Calculate performance metrics for each group.
Parameters
sliced_dict (dict or list of dict) Output of slicer.
Returns
- dict or list of dict
Metrics per group.
- equiboots.EquiBootsClass.calculate_disparities(metric_dict, var_name)
Compute ratio disparities against the reference group.
Parameters
metric_dict (dict or list of dict) Group metrics.
var_name (str) Fairness variable name.
Returns
- dict or list of dict
Ratio disparities.
- equiboots.EquiBootsClass.calculate_differences(metric_dict, ref_var_name)
Compute difference disparities against the reference group.
Parameters
metric_dict (dict or list of dict) Group metrics.
ref_var_name (str) Reference group name.
Returns
- dict or list of dict
Difference disparities.
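The contrast between the two disparity styles can be sketched with plain arithmetic. The group names and tpr values below are illustrative, not output produced by the library:

```python
# Illustrative group metrics; "A" is the reference group
metrics = {"A": {"tpr": 0.90}, "B": {"tpr": 0.72}}
ref = metrics["A"]["tpr"]

# calculate_disparities reports ratios against the reference group
ratio_disparity = metrics["B"]["tpr"] / ref

# calculate_differences reports differences against the reference group
difference = metrics["B"]["tpr"] - ref

print(ratio_disparity, difference)
```

A ratio of 1.0 (or a difference of 0.0) indicates parity with the reference group on that metric.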
- equiboots.EquiBootsClass.analyze_statistical_significance(metric_dict, var_name, test_config, differences=None)
Perform significance testing on metric differences.
Parameters
metric_dict (dict or list of dict) Group metrics.
var_name (str) Fairness variable name.
test_config (dict) Statistical test configuration.
differences (dict, optional) Precomputed differences.
Returns
- dict
Statistical test results per group.
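A minimal test_config sketch, reusing the keys that appear in the StatisticalTester usage example at the end of this page ("test_type", "alpha", "adjust_method"). Treat the exact set of supported values as an assumption; check list_available_tests() and list_adjustment_methods() for what your installed version accepts:

```python
# Hypothetical configuration; confirm supported values via
# list_available_tests() and list_adjustment_methods()
test_config = {
    "test_type": "chi_square",       # statistical test to run
    "alpha": 0.05,                   # significance level
    "adjust_method": "bonferroni",   # multiple-comparison correction
}
```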
- equiboots.EquiBootsClass.set_fix_seeds(seeds)
Set fixed random seeds for reproducibility.
Parameters
seeds (list of int) Seeds to apply.
Returns
None
- equiboots.EquiBootsClass.list_available_tests()
List the available statistical tests.
Returns
- dict
Test names and descriptions.
- equiboots.EquiBootsClass.list_adjustment_methods()
List the available p-value adjustment methods.
Returns
- dict
Adjustment methods.
Non-Main/Internal Methods
- equiboots.EquiBootsClass.set_reference_groups(reference_groups)
Set or infer reference groups for fairness variables.
Parameters
reference_groups (list) Reference groups to use.
Returns
None
- equiboots.EquiBootsClass.check_task(task)
Validate the task type.
Parameters
task (str) Task name.
Returns
None
- equiboots.EquiBootsClass.check_classification_task(task)
Ensure the task is a classification type.
Parameters
task (str) Task name.
Returns
None
- equiboots.EquiBootsClass.check_fairness_vars(fairness_vars)
Validate the fairness variables input.
Parameters
fairness_vars (list of str) Variables to validate.
Returns
None
- equiboots.EquiBootsClass.check_group_size(group, cat, var)
Verify minimum size for a group.
Parameters
group Group data.
cat Category name.
var Variable name.
Returns
bool Whether the group meets the minimum size requirement.
- equiboots.EquiBootsClass.check_group_empty(sampled_group, cat, var)
Check if a sampled group is empty.
Parameters
sampled_group Group data.
cat Category name.
var Variable name.
Returns
bool Whether the sampled group is empty.
- equiboots.EquiBootsClass.sample_group(group, n_categories, indx, sample_size, seeds, balanced)
Draw bootstrap or stratified samples.
Parameters
group Group data.
n_categories (int) Number of categories.
indx Indices of data.
sample_size (int) Bootstrap sample size.
seeds (list of int) Random seeds.
balanced (bool) Balance flag.
Returns
The sampled group data.
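The effect of the balanced flag can be sketched independently of the library: balanced sampling draws the same number of rows per category, while unbalanced sampling preserves each category's original share. This is a sketch under that assumption, not the library's exact sampler; the function name and group data are hypothetical:

```python
import random

def sample_indices(indices_by_cat, sample_size, seed, balanced):
    """Bootstrap (with replacement) either equally per category or
    proportionally to each category's original share."""
    rng = random.Random(seed)
    total = sum(len(idx) for idx in indices_by_cat.values())
    sampled = {}
    for cat, idx in indices_by_cat.items():
        # Equal split when balanced; proportional split otherwise
        n = (sample_size // len(indices_by_cat) if balanced
             else round(sample_size * len(idx) / total))
        sampled[cat] = [rng.choice(idx) for _ in range(n)]
    return sampled

groups = {"A": [0, 2, 4, 5], "B": [1, 3]}
bal = sample_indices(groups, 100, seed=0, balanced=True)
prop = sample_indices(groups, 100, seed=0, balanced=False)
```

With the dummy groups above, balanced sampling yields 50 indices per category, while proportional sampling yields roughly 67 for "A" and 33 for "B".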
- equiboots.EquiBootsClass.groups_slicer(groups, slicing_var)
Slice data into categories for a given variable.
Parameters
groups Group index mapping.
slicing_var (str) Variable name.
Returns
- dict or list of dict
Sliced data.
- equiboots.EquiBootsClass.get_groups_metrics(sliced_dict)
Calculate metrics for each group.
Parameters
sliced_dict (dict or list of dict) Sliced data.
Returns
- dict or list of dict
Metrics per group.
- equiboots.EquiBootsClass.calculate_groups_disparities(metric_dict, var_name)
Compute ratio disparities for each group.
Parameters
metric_dict (dict or list of dict) Group metrics.
var_name (str) Fairness variable name.
Returns
- dict or list of dict
Ratio disparities.
- equiboots.EquiBootsClass.calculate_groups_differences(metric_dict, ref_var_name)
Compute difference disparities for each group.
Parameters
metric_dict (dict or list of dict) Group metrics.
ref_var_name (str) Reference group name.
Returns
- dict or list of dict
Difference disparities.
Example Usage
Below are two dummy examples demonstrating how to use the EquiBoots class: one without bootstrapping and one with bootstrapping.
For more detailed examples, refer to the accompanying Colab notebook or py_scripts/testingscript.py.
Point Estimates Without Bootstrapping
import numpy as np
import pandas as pd
from equiboots import EquiBoots
# Example data
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.7, 0.4, 0.9])
y_pred = np.array([0, 1, 1, 0, 1])
fairness_df = pd.DataFrame({
"race": ["A", "B", "A", "B", "A"],
"sex": ["M", "F", "F", "M", "F"]
})
eq = EquiBoots(
y_true=y_true,
y_prob=y_prob,
y_pred=y_pred,
fairness_df=fairness_df,
fairness_vars=["race", "sex"],
task="binary_classification",
bootstrap_flag=False
)
eq.grouper(groupings_vars=["race"])
sliced = eq.slicer("race")
metrics = eq.get_metrics(sliced)
disparities = eq.calculate_disparities(metrics, "race")
print("Metrics by group:", metrics)
print("Disparities:", disparities)
With Bootstrapping
import numpy as np
import pandas as pd
from equiboots import EquiBoots
# Example data
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.7, 0.4, 0.9])
y_pred = np.array([0, 1, 1, 0, 1])
fairness_df = pd.DataFrame({
"race": ["A", "B", "A", "B", "A"],
"sex": ["M", "F", "F", "M", "F"]
})
eq = EquiBoots(
y_true=y_true,
y_prob=y_prob,
y_pred=y_pred,
fairness_df=fairness_df,
fairness_vars=["race", "sex"],
task="binary_classification",
bootstrap_flag=True,
num_bootstraps=5,
boot_sample_size=5
)
eq.grouper(groupings_vars=["race"])
sliced = eq.slicer("race")
metrics = eq.get_metrics(sliced)
disparities = eq.calculate_disparities(metrics, "race")
print("Metrics by group (bootstrapped):", metrics)
print("Disparities (bootstrapped):", disparities)
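When bootstrap_flag=True, get_metrics returns one metrics dict per bootstrap replicate rather than a single dict. A library-independent sketch of summarizing such a list into a mean and a simple percentile interval (the replicate values below are made up, and the summarize helper is hypothetical, not part of EquiBoots):

```python
import statistics

# Hypothetical per-replicate metrics, as a list of {group: {metric: value}} dicts
boot_metrics = [
    {"A": {"accuracy": 0.80}, "B": {"accuracy": 0.70}},
    {"A": {"accuracy": 0.85}, "B": {"accuracy": 0.65}},
    {"A": {"accuracy": 0.78}, "B": {"accuracy": 0.72}},
]

def summarize(boot_metrics, group, metric):
    """Mean and a crude 2.5th-97.5th percentile interval across replicates."""
    values = sorted(m[group][metric] for m in boot_metrics)
    lo = values[int(0.025 * (len(values) - 1))]
    hi = values[int(round(0.975 * (len(values) - 1)))]
    return statistics.mean(values), (lo, hi)

mean_acc, ci = summarize(boot_metrics, "A", "accuracy")
```

In practice many more replicates (and the library's own num_bootstraps setting) would be used; three are shown here only to keep the sketch short.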
StatisticalTester
Module: equiboots.StatisticalTester
Overview
This module provides statistical significance testing utilities, including bootstrapped and chi-square tests, with support for multiple comparison corrections and effect size calculations.
Classes
StatTestResult
- class equiboots.StatisticalTester.StatTestResult(statistic: float, p_value: float, is_significant: bool, test_name: str, critical_value: float | None = None, effect_size: float | None = None, confidence_interval: Tuple[float, float] | None = None)[source]
Bases:
object
Stores statistical test results including test statistic, p-value, and significance.
StatisticalTester
- class equiboots.StatisticalTester.StatisticalTester[source]
Bases:
object
Performs statistical significance testing on metrics with support for various tests and data types.
- adjusting_p_vals(config, results)[source]
Runs the p-value adjustment method, conditioned on whether bootstrapping was used.
- analyze_metrics(metrics_data: Dict | List[Dict], reference_group: str, test_config: Dict[str, Any], task: str | None = None, differences: dict | None = None) Dict[str, Dict[str, StatTestResult]] [source]
Analyzes metrics for statistical significance against a reference group.
Function Signatures
- class equiboots.StatisticalTester.StatTestResult(statistic: float, p_value: float, is_significant: bool, test_name: str, critical_value: float | None = None, effect_size: float | None = None, confidence_interval: Tuple[float, float] | None = None)[source]
Stores statistical test results including test statistic, p-value, and significance.
- class equiboots.StatisticalTester.StatisticalTester[source]
Performs statistical significance testing on metrics with support for various tests and data types.
- _bootstrap_test(data: List[float], config: dict) StatTestResult [source]
- _chi_square_test(metrics: Dict[str, Any], config: Dict[str, Any]) StatTestResult [source]
- _adjust_p_values(results: Dict[str, Dict[str, StatTestResult]], method: str, alpha: float, boot: bool = False) Dict[str, Dict[str, StatTestResult]] [source]
- analyze_metrics(metrics_data: Dict | List[Dict], reference_group: str, test_config: Dict[str, Any], task: str | None = None, differences: dict | None = None) Dict[str, Dict[str, StatTestResult]] [source]
Usage Example
from equiboots.StatisticalTester import StatisticalTester
tester = StatisticalTester()
config = {
"test_type": "chi_square",
"alpha": 0.05,
"adjust_method": "bonferroni",
}
metrics = {
"group1": {"TP": 10, "FP": 5, "TN": 20, "FN": 2},
"group2": {"TP": 8, "FP": 7, "TN": 18, "FN": 4},
}
results = tester.analyze_metrics(
metrics,
reference_group="group1",
test_config=config,
task="binary_classification",
)
for group, metric_results in results.items():
    for metric, result in metric_results.items():
        print(f"{group}/{metric}: p-value={result.p_value}, significant={result.is_significant}")