
model_tests.FEAT.SubgroupDisparity

SubgroupDisparity Objects

@dataclass
class SubgroupDisparity(ModelTest)

Test whether the maximum difference or ratio of a specified metric between any two subgroups of a specified protected attribute exceeds the given threshold.

If method is 'chi2', the p-value calculated from a chi-square test of independence should be greater than the significance level specified by the threshold.

Arguments:

  • attr - Column name of the protected attribute.
  • metric - Type of performance metric for the test. Choose from 'fpr' (false positive rate), 'fnr' (false negative rate) or 'pr' (positive rate).
  • method - Type of method for the test. Choose from 'chi2', 'ratio' or 'diff'.
  • threshold - Threshold for the maximum difference or ratio, or the significance level of the chi-square test.
  • test_name - Name of the test, default is 'Subgroup Disparity Test'.
  • test_desc - Description of the test. If none is provided, an automatic description will be generated based on the rest of the arguments passed in.
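The 'ratio' and 'diff' methods described above reduce to simple arithmetic on per-subgroup metric values. The following is a minimal standalone sketch, not the library's implementation: the helper name `max_disparity`, the sample rates, the thresholds, the ratio convention (larger value over smaller) and the pass condition are all assumptions made for illustration.

```python
from itertools import combinations


def max_disparity(metric_by_group, method):
    """Largest pairwise gap between subgroup metric values.

    Hypothetical helper: 'diff' is the absolute difference, 'ratio' is the
    larger value divided by the smaller one (an assumed convention).
    A subgroup value of 0 would make the ratio undefined.
    """
    gaps = []
    for a, b in combinations(metric_by_group.values(), 2):
        hi, lo = max(a, b), min(a, b)
        if method == "diff":
            gaps.append(hi - lo)
        elif method == "ratio":
            gaps.append(hi / lo)
        else:
            raise ValueError(f"unsupported method: {method}")
    return max(gaps)


# Illustrative false positive rates per subgroup (made-up numbers).
fpr_by_group = {"A": 0.10, "B": 0.15, "C": 0.30}

max_diff = max_disparity(fpr_by_group, "diff")
max_ratio = max_disparity(fpr_by_group, "ratio")

# Assumed pass condition: the largest gap must not exceed the threshold.
diff_passed = max_diff <= 0.25
ratio_passed = max_ratio <= 2.0
```

With these made-up rates the largest gap is between subgroups A and C, so the difference check passes at a threshold of 0.25 while the ratio check fails at a threshold of 2.0.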

get_metric_dict

def get_metric_dict(df: pd.DataFrame) -> Tuple[dict, list]

Calculate metric ratio / difference and size for each subgroup of the protected attribute on a given df.

Arguments:

  • df - Dataframe.

Returns:

A tuple containing a dictionary of each subgroup and the calculated ratio or difference, and a list of the subgroup sizes.
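As an illustration of what a per-subgroup metric computation can look like, here is a small sketch using plain dictionaries in place of DataFrame rows. It is not the library's code: the function name `metric_by_subgroup`, the binary 0/1 encoding of "prediction" and "truth", and the reading of 'pr' as the share of positive predictions are assumptions.

```python
from collections import defaultdict


def metric_by_subgroup(rows, attr, metric):
    """Return ({subgroup: metric value}, [subgroup sizes]) for binary labels."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row)

    metric_dict, size_list = {}, []
    for name in sorted(groups):
        members = groups[name]
        if metric == "fpr":
            # False positives over all actual negatives.
            negatives = [r for r in members if r["truth"] == 0]
            value = sum(r["prediction"] == 1 for r in negatives) / len(negatives)
        elif metric == "fnr":
            # False negatives over all actual positives.
            positives = [r for r in members if r["truth"] == 1]
            value = sum(r["prediction"] == 0 for r in positives) / len(positives)
        elif metric == "pr":
            # Assumed meaning: share of rows predicted positive.
            value = sum(r["prediction"] == 1 for r in members) / len(members)
        else:
            raise ValueError(f"unsupported metric: {metric}")
        metric_dict[name] = value
        size_list.append(len(members))
    return metric_dict, size_list


# Made-up rows; "gender" is a hypothetical protected attribute column.
rows = [
    {"gender": "F", "prediction": 1, "truth": 0},
    {"gender": "F", "prediction": 0, "truth": 0},
    {"gender": "F", "prediction": 1, "truth": 1},
    {"gender": "M", "prediction": 0, "truth": 0},
    {"gender": "M", "prediction": 0, "truth": 0},
    {"gender": "M", "prediction": 1, "truth": 0},
    {"gender": "M", "prediction": 0, "truth": 1},
]

fpr, sizes = metric_by_subgroup(rows, "gender", "fpr")
```

Returning the sizes alongside the metric values makes it possible to judge how much weight a subgroup's rate deserves, since rates from very small subgroups are noisy.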

get_contingency_table

def get_contingency_table(df: pd.DataFrame) -> list

Obtain the contingency table of the metric of interest for each subgroup of a protected attribute on a given df.

Arguments:

  • df - Dataframe.

Returns:

List of metric values.
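To show how a contingency table feeds a chi-square test of independence, the sketch below computes the Pearson statistic by hand. The table layout (one row per subgroup, columns for false positives and true negatives), the counts, the 0.05 significance level and the helper name `chi_square_statistic` are assumptions for illustration, not the library's internals.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if subgroup and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: one row per subgroup, columns are
# [false positives, true negatives] (an assumed layout for metric='fpr').
table = [
    [10, 90],
    [20, 80],
    [30, 70],
]

stat, dof = chi_square_statistic(table)

# For exactly 2 degrees of freedom the chi-square survival function has the
# closed form exp(-x / 2); other table shapes need a general routine.
p_value = math.exp(-stat / 2)

# Per the class description, the p-value should exceed the significance level.
passed = p_value > 0.05
```

The closed-form p-value only applies to this 3 x 2 example. For tables of other shapes, a general routine such as `scipy.stats.chi2_contingency` returns both the statistic and the p-value.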

plot

def plot(alpha: float = 0.05, save_plots: bool = True)

Plot the metric of interest across the attribute subgroups, and their confidence interval bands.

Arguments:

  • alpha - Significance level for the confidence intervals, which are calculated using the binomial proportion approximation formula.
  • save_plots - If True, saves the plots to the class instance.
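The confidence bands can be understood through a short worked example. The sketch below assumes that "binomial proportion approximation" refers to the normal (Wald) approximation, which the source does not confirm. The function name `proportion_interval`, the sample counts and the clipping to [0, 1] are likewise illustrative choices.

```python
from statistics import NormalDist


def proportion_interval(successes, n, alpha=0.05):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (p * (1 - p) / n) ** 0.5
    # Clip to the valid range of a proportion (an illustrative choice).
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up subgroup: 30 false positives out of 100 actual negatives.
lower, upper = proportion_interval(30, 100, alpha=0.05)
```

A smaller alpha widens the band, and so does a smaller subgroup, which is why overlapping bands between subgroups suggest that an observed gap may not be meaningful.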

get_result

def get_result(df_test_with_output: pd.DataFrame) -> Dict[str, float]

Calculate the maximum ratio or difference, or the chi-square test result, for any 2 subgroups on a given df.

Arguments:

  • df_test_with_output - Dataframe containing protected attributes with "prediction" and "truth" columns.

run

def run(df_test_with_output: pd.DataFrame) -> bool

Runs the test by calculating the result and evaluating whether it passes a defined condition.

Arguments:

  • df_test_with_output - Dataframe containing protected attributes with "prediction_probas" and "truth" columns. The protected attribute should not be encoded.