
Achievable Fairness on Your Data With Utility Guarantees

Muhammad Faaiz Taufiq
ByteDance Research
faaiz.taufiq@bytedance.com
&Jean-François Ton
ByteDance Research
jeanfrancois@bytedance.com
&Yang Liu
University of California Santa Cruz
yangliu@ucsc.edu
Corresponding authors: faaiz.taufiq@bytedance.com and jeanfrancois@bytedance.com
Abstract

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases, and therefore imposing a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

1 Introduction

A key challenge in fairness for machine learning is to train models that minimize disparity across various sensitive groups such as race or gender [9, 35, 10]. This often comes at the cost of reduced model accuracy, a phenomenon termed the accuracy-fairness trade-off [36, 32]. This trade-off can differ significantly across datasets, depending on factors such as dataset biases and imbalances [1, 8, 11].

To demonstrate how these trade-offs are inherently dataset-dependent, we consider a simple example involving two distinct crime datasets. Dataset A has records from a community where crime rates are uniformly distributed across all racial groups, whereas Dataset B comes from a community where historical factors have resulted in a disproportionate crime rate among a specific racial group. Intuitively, training models which are racially agnostic is more challenging for Dataset B, due to the unequal distribution of crime rates across racial groups, and will result in a greater loss in model accuracy as compared to Dataset A.

This example underscores that setting a uniform fairness requirement across diverse datasets (such as requiring the fairness violation metric to be below 10% for both datasets), while also adhering to essential accuracy benchmarks, is impractical. Therefore, choosing fairness guidelines for any dataset necessitates careful consideration of its individual characteristics and underlying biases. In this work, we advocate against the use of one-size-fits-all fairness mandates by proposing a nuanced, dataset-specific framework for quantifying the acceptable range of accuracy-fairness trade-offs. To put it concretely, the question we consider is:

Figure 1: Accuracy-fairness trade-offs for the COMPAS dataset (on held-out data). The black and red curves are obtained using the same optimally trained model evaluated on different splits. The blue curve is obtained using a suboptimally trained model. The green area depicts the range of permissible fairness violations for each accuracy, the pink area shows suboptimal accuracy-fairness trade-offs, and the blue area shows unlikely-to-be-achieved ones. (Details in Appendix F.5)

Given a dataset, what is the range of permissible fairness violations corresponding to each accuracy threshold for models in a given class \mathcal{H}?

This question can be addressed by considering the optimum accuracy-fairness trade-off, which shows the minimum fairness violation achievable for each level of accuracy. Unfortunately, this curve is typically unavailable and hence, various optimization techniques have been proposed to approximate this curve, ranging from regularization [8, 33] to adversarial learning [45, 40].

However, approximating the trade-off curve using these aforementioned methods has some serious limitations. Firstly, these methods require retraining hundreds if not thousands of models to obtain a good approximation of the trade-off curve, making them computationally infeasible for large datasets or models. Secondly, these works do not account for finite-sampling errors in the obtained curve. This is problematic since the empirical trade-off evaluated over a finite dataset may not match the exact trade-off over the full data distribution.

We illustrate this phenomenon in Figure 1, where the black and red trade-off curves are obtained using the same model but evaluated over two different test data draws. Here, relying solely on the estimated curves without accounting for the uncertainty could lead us to the incorrect conclusion that the methodology used to obtain the black trade-off curve is sub-optimal (compared to the red curve), as it achieves a higher fairness violation for accuracies in the range [0.62, 0.66]. However, this discrepancy arises solely due to finite-sampling errors.

In this paper, we address these challenges by introducing a computationally efficient method of approximating the optimal accuracy-fairness trade-off curve, supported by rigorous statistical guarantees. Our methodology not only circumvents the need to train multiple models (leading to at least a 10-fold reduction in computational cost) but is also the first to quantify the uncertainty in the estimated curve, arising from both finite-sampling and estimation errors. To achieve this, our approach adopts a novel probabilistic perspective and provides guarantees that remain valid across all finite-sample draws. This also allows practitioners to distinguish whether an apparent suboptimality in a baseline could be explained by finite-sampling errors (as in the black curve in Figure 1), or whether it stems from genuine deficiencies in the fairness interventions applied (as in the blue curve).

The contributions of this paper are three-fold:

  • We present a computationally efficient methodology for approximating the accuracy-fairness trade-off curve by training only a single model. This is achieved by adapting a technique from [15] called You-Only-Train-Once (YOTO) to the fairness setting.

  • To account for the approximation and finite-sampling errors, we introduce a novel technical framework to construct confidence intervals (using the trained YOTO model) which contain the optimal accuracy-fairness trade-off curve with statistical guarantees. For any accuracy threshold \psi chosen at inference time, this gives us a statistically backed range of permissible fairness violations [l(\psi), u(\psi)], allowing us to answer our previously posed question:

    Given a dataset, the permissible range of fairness violations corresponding to an accuracy threshold of \psi is [l(\psi), u(\psi)] for models in a given class \mathcal{H}.

  • Lastly, we showcase the vast applicability of our method empirically across various data modalities including tabular, image and text datasets. We evaluate our framework on a suite of SOTA fairness methods and show that our intervals are both reliable and informative.

2 Preliminaries

Notation

Throughout this paper, we consider a binary classification task, where each training sample is composed of triples (X, A, Y). X\in\mathcal{X} denotes a vector of features, A\in\mathcal{A} indicates a discrete sensitive attribute, and Y\in\mathcal{Y}\coloneqq\{0,1\} represents a label. To make this more concrete, if we take loan default prediction as the classification task, X represents individuals’ features such as their income level and loan amount; A represents their racial identity; and Y represents their loan default status. Having established the notation, for completeness, we provide some commonly used fairness violations \Phi_{\textup{fair}}(h)\in[0,1] for a classifier model h:\mathcal{X}\rightarrow\mathcal{Y} when \mathcal{A}=\{0,1\}:

Demographic Parity (DP)

The DP condition states that the selection rates for all sensitive groups are equal, i.e. \mathbb{P}(h(X)=1\mid A=a)=\mathbb{P}(h(X)=1) for any a\in\mathcal{A}. The absolute DP violation is:

\displaystyle\Phi_{\textup{DP}}(h)\coloneqq|\mathbb{P}(h(X)=1\mid A=1)-\mathbb{P}(h(X)=1\mid A=0)|.
Equalized Opportunity (EOP)

The EOP condition states that the true positive rates for all sensitive groups are equal, i.e. \mathbb{P}(h(X)=1\mid A=a,Y=1)=\mathbb{P}(h(X)=1\mid Y=1) for any a\in\mathcal{A}. The absolute EOP violation is:

\displaystyle\Phi_{\textup{EOP}}(h)\coloneqq|\mathbb{P}(h(X)=1\mid A=1,Y=1)-\mathbb{P}(h(X)=1\mid A=0,Y=1)|.
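To make these definitions concrete, the following sketch shows how the empirical DP and EOP violations can be computed from binary predictions on a finite sample (a minimal NumPy sketch for a binary sensitive attribute; the array names and synthetic data are illustrative assumptions, not part of the paper's released code).

```python
import numpy as np

def dp_violation(y_pred, a):
    """Empirical DP violation |P(h=1 | A=1) - P(h=1 | A=0)|."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

def eop_violation(y_pred, a, y):
    """Empirical EOP violation, computed on the positive class only."""
    pos = y == 1
    return abs(y_pred[pos & (a == 1)].mean() - y_pred[pos & (a == 0)].mean())

# Illustrative usage with binary predictions h(X) in {0, 1}:
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)        # sensitive attribute A
y = rng.integers(0, 2, size=1000)        # label Y
y_pred = rng.integers(0, 2, size=1000)   # predictions h(X)
print(dp_violation(y_pred, a), eop_violation(y_pred, a, y))
```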

2.1 Problem setup

Next, we formalise the notion of accuracy-fairness trade-off, which is the main quantity of interest in our work. For a model class \mathcal{H} (e.g., neural networks) and a given accuracy threshold \psi\in[0,1], we define the optimal accuracy-fairness trade-off \tau^{\ast}_{\textup{fair}}(\psi) as,

\displaystyle\tau^{\ast}_{\textup{fair}}(\psi)\coloneqq\min_{h\in\mathcal{H}}\Phi_{\textup{fair}}(h)\quad\textup{subject to}\quad\textup{acc}(h)\geq\psi. (1)

Here, \Phi_{\textup{fair}}(h) and \textup{acc}(h) denote the fairness violation and accuracy of h over the full data distribution. For an accuracy \psi^{\prime} which is unattainable, we define \tau^{\ast}_{\textup{fair}}(\psi^{\prime})=1, and we focus on models \mathcal{H} trained using gradient-based methods. Crucially, our goal is not to estimate the trade-off at a fixed accuracy level, but instead to reliably and efficiently estimate the entire trade-off curve \tau^{\ast}_{\textup{fair}}. In contrast, previous works [1, 11] impose an a priori fairness constraint during training and therefore each trained model only recovers one point on the trade-off curve corresponding to this pre-specified constraint.

If available, this trade-off curve would allow practitioners to characterise exactly how, for a given dataset, the minimum fairness violation varies as model accuracy increases. This not only provides a principled way of selecting data-specific fairness requirements, but also serves as a tool to audit if a model meets acceptable fairness standards by checking if its accuracy-fairness trade-off lies on this curve. Nevertheless, obtaining this ground-truth trade-off curve exactly is impossible within the confines of a finite-sample regime (owing to finite-sampling errors). This means that even if baseline A’s empirical trade-off evaluated on a finite dataset is suboptimal compared to the empirical trade-off of baseline B, this does not necessarily imply suboptimality on the full data distribution.

We illustrate this in Figure 1 where both red and black trade-off curves are obtained using the same model but evaluated on different test data splits. Here, even though the black curve appears suboptimal compared to the red curve for accuracies in [0.62, 0.66], this apparent suboptimality is solely due to finite-sampling errors (since the discrepancy between the two curves arises only due to different evaluation datasets). If we rely only on comparing empirical trade-offs, we would incorrectly flag the methodology used to obtain the black curve as suboptimal.

To address this, we construct confidence intervals (CIs), shown as the green region in Figure 1, that account for such finite-sampling errors. In this case, both trade-offs fall within our CIs, which correctly indicates that this apparent suboptimality could stem from finite-sample variability. Conversely, a baseline’s trade-off falling above our CIs (as in the blue curve in Figure 1) offers a confident assessment of suboptimality, as this cannot be explained away by finite-sample variability. Therefore, our CIs equip practitioners with a robust auditing tool: they can confidently identify suboptimal baselines while avoiding false conclusions caused by considering empirical trade-offs alone.

High-level road map

To achieve this, our proposed methodology adopts a two-step approach:

  1. Firstly, we propose loss-conditional fairness training, a computationally efficient methodology of estimating the entire trade-off curve \tau^{\ast}_{\textup{fair}} by training a single model, obtained by adapting the YOTO framework [15] to the fairness setting.

  2. Secondly, to account for the approximation and finite-sampling errors in our estimates, we introduce a novel methodology of constructing confidence intervals on the trade-off curve \tau^{\ast}_{\textup{fair}} using the trained YOTO model. Specifically, given \alpha\in(0,1), we construct confidence intervals \Gamma_{\textup{fair}}^{\alpha}\subseteq[0,1] which satisfy guarantees of the form:

    \mathbb{P}(\tau^{\ast}_{\textup{fair}}(\Psi)\in\Gamma_{\textup{fair}}^{\alpha})\geq 1-\alpha.

    Here, \Gamma_{\textup{fair}}^{\alpha} and \Psi\in[0,1] are random variables obtained using a held-out calibration dataset \mathcal{D}_{\textup{cal}} (see Section 3.2) and the probability is taken over different draws of \mathcal{D}_{\textup{cal}}.

3 Methodology

First, we demonstrate how our 2-step approach offers a practical and statistically sound method for estimating \tau^{\ast}_{\textup{fair}}(\psi). Figure 1 provides an illustration of our proposed confidence intervals (CIs) \Gamma_{\textup{fair}}^{\alpha} and shows how they can be interpreted as a range of ‘permissible’ values of accuracy-fairness trade-offs (the green region). Specifically, if for a classifier h_{0}, the accuracy-fairness pair (\textup{acc}(h_{0}),\Phi_{\textup{fair}}(h_{0})) lies above the CIs \Gamma_{\textup{fair}}^{\alpha} (i.e., the pink region in Figure 1), then h_{0} is likely to be suboptimal in terms of the fairness violation, i.e., there likely exists h^{\prime}\in\mathcal{H} with \textup{acc}(h^{\prime})\geq\textup{acc}(h_{0}) and \Phi_{\textup{fair}}(h^{\prime})\leq\Phi_{\textup{fair}}(h_{0}). On the other hand, it is unlikely for any model h^{\prime}\in\mathcal{H} to achieve a trade-off below the CIs \Gamma_{\textup{fair}}^{\alpha} (the blue region in Figure 1). Next, we outline how to construct such intervals.
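As an illustration of how such CIs could be used for auditing, the sketch below places a baseline's empirical (accuracy, fairness) pair into one of the three regions of Figure 1, given lower and upper CI curves evaluated on an accuracy grid (a minimal sketch; the interpolation-based implementation and all names are hypothetical).

```python
import numpy as np

def audit_tradeoff(acc, fairness, acc_grid, lower_ci, upper_ci):
    """Place an (accuracy, fairness) pair relative to CI curves on tau*_fair.

    acc_grid, lower_ci, upper_ci: arrays defining the CI band over accuracy,
    with acc_grid sorted in increasing order.
    """
    lo = np.interp(acc, acc_grid, lower_ci)   # lower CI at this accuracy
    hi = np.interp(acc, acc_grid, upper_ci)   # upper CI at this accuracy
    if fairness > hi:
        return "sub-optimal"   # pink region: a better trade-off likely exists
    if fairness < lo:
        return "unlikely"      # blue region: unlikely to be achievable
    return "permissible"       # green region

# Hypothetical usage with a toy CI band:
acc_grid = np.linspace(0.6, 0.9, 31)
lower_ci = 0.05 + 0.4 * (acc_grid - 0.6)
upper_ci = lower_ci + 0.05
print(audit_tradeoff(0.75, 0.20, acc_grid, lower_ci, upper_ci))
```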

3.1 Step 1: Efficient estimation of trade-off curve

The first step of constructing the intervals is to approximate the trade-off curve by recasting the problem into a constrained optimization objective. The optimization problem formulated in Eq. (1) is, however, often too complex to solve, because the accuracy \textup{acc}(h) and fairness violations \Phi_{\textup{fair}}(h) are both non-smooth [1]. These constraints make it hard to use standard optimization methods that rely on gradients [25]. To get around this issue, previous works [1, 8] replace the non-smooth constrained optimisation problem with a smooth surrogate loss. Here, we consider a parameterized family of classifiers \mathcal{H}=\{h_{\theta}:\mathcal{X}\rightarrow\mathbb{R}\,|\,\theta\in\Theta\} (such as neural networks) trained using the regularized loss:

\displaystyle\mathcal{L}_{\lambda}(\theta)=\mathbb{E}[l_{\textup{CE}}(h_{\theta}(X),Y)]+\lambda\,\mathcal{L}_{\textup{fair}}(h_{\theta}). (2)

where l_{\textup{CE}} is the cross-entropy loss for the classifier h_{\theta} and \mathcal{L}_{\textup{fair}}(h_{\theta}) is a smooth relaxation of the fairness violation \Phi_{\textup{fair}} [8, 29]. For example, when the fairness violation is DP, [8] consider

\mathcal{L}_{\textup{fair}}(h_{\theta})=\mathbb{E}[g(h_{\theta}(X))\mid A=1]-\mathbb{E}[g(h_{\theta}(X))\mid A=0],

for different choices of g(x), including the identity and sigmoid functions. We include more examples of such regularizers in Appendix F.3. The parameter \lambda in \mathcal{L}_{\lambda} modulates the accuracy-fairness trade-off with lower values of \lambda favouring higher accuracy over reduced fairness violation.
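For concreteness, a minimal PyTorch-style sketch of the regularized objective in Eq. (2), using the sigmoid choice of g for the DP relaxation, might look as follows (the batch format and tensor shapes are illustrative assumptions; in practice the absolute value of the group difference may be penalized instead).

```python
import torch
import torch.nn.functional as F

def dp_surrogate(logits, a):
    """Smooth DP relaxation: difference of mean sigmoid scores across groups.

    Note: this signed difference follows the relaxation above; an absolute
    value may be taken in practice.
    """
    scores = torch.sigmoid(logits)   # g(h_theta(X)) with g = sigmoid
    return scores[a == 1].mean() - scores[a == 0].mean()

def regularized_loss(logits, y, a, lam):
    """L_lambda(theta) = cross-entropy + lambda * smooth fairness relaxation."""
    ce = F.binary_cross_entropy_with_logits(logits, y.float())
    return ce + lam * dp_surrogate(logits, a)
```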

Now that we have defined the optimization objective, obtaining the trade-off curve becomes straightforward: we can simply optimize multiple models over a grid of regularization parameters \lambda. However, training multiple models can be computationally expensive, especially when this involves large-scale models (e.g. neural networks). To circumvent this computational challenge, we introduce loss-conditional fairness training, obtained by adapting the YOTO framework proposed by [15].

3.1.1 Loss-conditional fairness training

As we describe above, a popular approach for approximating the accuracy-fairness trade-off \tau^{\ast}_{\textup{fair}}(\psi) involves training multiple models h_{\theta^{\ast}_{\lambda}} over a discrete grid of \lambda hyperparameters with the regularized loss \mathcal{L}_{\lambda}. To avoid the computational overhead of training multiple models, [15] propose ‘You Only Train Once’ (YOTO), a methodology of training one model h_{\theta}:\mathcal{X}\times\Lambda\rightarrow\mathbb{R}, which takes \lambda\in\Lambda\subseteq\mathbb{R} as an additional input using Feature-wise Linear Modulation (FiLM) [34] layers. YOTO is trained such that at inference time h_{\theta}(\cdot,\lambda^{\prime}) recovers the classifier obtained by minimising \mathcal{L}_{\lambda^{\prime}} in Eq. (2).

Recall that we are interested in minimising the family of losses \mathcal{L}_{\lambda}, parameterized by \lambda\in\Lambda (Eq. (2)). Instead of fixing \lambda, YOTO solves an optimisation problem where the parameter \lambda is sampled from a distribution P_{\lambda}. As a result, during training the model observes many values of \lambda and learns to optimise the loss \mathcal{L}_{\lambda} for all of them simultaneously. At inference time, the model can be conditioned on a chosen value \lambda^{\prime} and recovers the model trained to optimise \mathcal{L}_{\lambda^{\prime}}. Hence, once adapted to our setting, the YOTO loss becomes:

\displaystyle\operatorname*{arg\,min}_{h_{\theta}:\mathcal{X}\times\Lambda\rightarrow\mathbb{R}}\mathbb{E}_{\lambda\sim P_{\lambda}}\left[\mathbb{E}[l_{\textup{CE}}(h_{\theta}(X,\lambda),Y)]+\lambda\,\mathcal{L}_{\textup{fair}}(h_{\theta}(\cdot,\lambda))\right].

Having trained a YOTO model, the trade-off curve \tau^{\ast}_{\textup{fair}}(\psi) can be approximated by simply plugging in different values of \lambda at inference time, thus avoiding additional training. From a theoretical point of view, [15, Proposition 1] proves that, under the assumption of large enough model capacity, training the loss-conditional YOTO model performs as well as the separately trained models while only requiring a single model. Although the model capacity assumption might be hard to verify in practice, our experiments (Section 5) show that the trade-off curve estimates \widehat{\tau^{\ast}_{\textup{fair}}(\psi)} obtained using YOTO are consistent with the ones obtained using separately trained models.
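A minimal sketch of one loss-conditional training step is given below: λ is sampled afresh for each batch (here log-uniformly, an assumption on our part) and passed to a λ-conditioned model. The conditioned-model interface `model(x, lam)`, the λ range, and the reuse of the `regularized_loss` sketch above are illustrative; the actual conditioning uses FiLM layers [34].

```python
import torch

def yoto_training_step(model, optimizer, x, y, a,
                       lam_low=1e-2, lam_high=1e1):
    """One step of loss-conditional fairness training with lambda ~ P_lambda.

    `model(x, lam)` is assumed to return logits conditioned on lambda,
    e.g. via FiLM layers that modulate hidden activations.
    """
    # Sample lambda log-uniformly over [lam_low, lam_high].
    u = torch.rand(())
    lam = lam_low * (lam_high / lam_low) ** u

    logits = model(x, lam)
    # Reuses the regularized_loss sketch above: CE + lam * fairness surrogate.
    loss = regularized_loss(logits, y, a, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```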

It should be noted, as is common in optimization problems, that the estimated trade-off curve \widehat{\tau^{\ast}_{\textup{fair}}(\psi)} may not align precisely with the true trade-off curve \tau^{\ast}_{\textup{fair}}(\psi). This discrepancy originates from two key factors. Firstly, the limited size of the training and evaluation datasets introduces errors in the estimation of \widehat{\tau^{\ast}_{\textup{fair}}(\psi)}. Secondly, we opt for a computationally tractable loss function instead of the original optimization problem in Eq. (1). This may result in our estimate \widehat{\tau^{\ast}_{\textup{fair}}(\psi)} yielding sub-optimal trade-offs, as can be seen from Figure 1. Therefore, to ensure that our procedure yields statistically sound inferences, we next construct confidence intervals using the YOTO model, designed to contain the true trade-off curve \tau^{\ast}_{\textup{fair}}(\psi) with high probability.

3.2 Step 2: Constructing confidence intervals

As mentioned above, our goal here is to use our trained YOTO model to construct confidence intervals (CIs) for the optimal trade-off curve \tau^{\ast}_{\textup{fair}}(\psi) defined in Eq. (1). Specifically, we assume access to a held-out calibration dataset \mathcal{D}_{\textup{cal}}\coloneqq\{(X_{i},A_{i},Y_{i})\}_{i} which is disjoint from the training data. Given a level \alpha\in[0,1], we construct CIs \Gamma_{\textup{fair}}^{\alpha}\subseteq[0,1] using \mathcal{D}_{\textup{cal}}, which provide guarantees of the form:

\displaystyle\mathbb{P}(\tau^{\ast}_{\textup{fair}}(\Psi)\in\Gamma_{\textup{fair}}^{\alpha})\geq 1-\alpha. (3)

Here, it is important to note that \Psi\in[0,1] and \Gamma_{\textup{fair}}^{\alpha} are random variables obtained from the calibration data \mathcal{D}_{\textup{cal}}, and the guarantee in Eq. (3) holds marginally over \Psi and \Gamma_{\textup{fair}}^{\alpha}. While our CIs in this section require the availability of the sensitive attributes in \mathcal{D}_{\textup{cal}}, in Appendix D we also extend our methodology to the setting where sensitive attributes are missing. In this section, for notational convenience we use h_{\lambda}(\cdot) to denote the YOTO model h_{\theta}(\cdot,\lambda) for \lambda\in\Lambda.

The uncertainty in our trade-off estimate arises, in part, from the uncertainty in the accuracy and fairness violations of our trained model. Therefore, our methodology for constructing CIs on \tau^{\ast}_{\textup{fair}} involves first constructing CIs on the test accuracy \textup{acc}(h_{\lambda}) and fairness violation \Phi_{\textup{fair}}(h_{\lambda}) for a given value of \lambda using \mathcal{D}_{\textup{cal}}, denoted as C_{\textup{acc}}^{\alpha}(\lambda) and C_{\textup{fair}}^{\alpha}(\lambda) respectively, satisfying,

\displaystyle\mathbb{P}(\textup{acc}(h_{\lambda})\in C_{\textup{acc}}^{\alpha}(\lambda))\geq 1-\alpha,\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h_{\lambda})\in C_{\textup{fair}}^{\alpha}(\lambda))\geq 1-\alpha.

One way to construct these CIs involves using assumption-light concentration inequalities such as Hoeffding’s inequality. To be more concrete, for the accuracy \textup{acc}(h_{\lambda}):

Lemma 3.1 (Hoeffding’s inequality).

Given a classifier h_{\lambda}:\mathcal{X}\rightarrow\mathcal{Y}, we have that,

\displaystyle\mathbb{P}\left(\textup{acc}(h_{\lambda})\in\left[\widetilde{\textup{acc}(h_{\lambda})}-\delta,\widetilde{\textup{acc}(h_{\lambda})}+\delta\right]\right)\geq 1-\alpha.

Here, \widetilde{\textup{acc}(h)}\coloneqq\sum_{(X_{i},A_{i},Y_{i})\in\mathcal{D}_{\textup{cal}}}\frac{\mathbbm{1}(h(X_{i})=Y_{i})}{|\mathcal{D}_{\textup{cal}}|} and \delta\coloneqq\sqrt{\frac{1}{2|\mathcal{D}_{\textup{cal}}|}\log{(\frac{2}{\alpha})}}.

Lemma 3.1 illustrates that we can use Hoeffding’s inequality to construct a confidence interval C_{\textup{acc}}^{\alpha}(\lambda)=[\widetilde{\textup{acc}(h_{\lambda})}-\delta,\widetilde{\textup{acc}(h_{\lambda})}+\delta] on \textup{acc}(h_{\lambda}) such that the true \textup{acc}(h_{\lambda}) will lie inside the CI with probability 1-\alpha. Analogously, we also construct CIs for fairness violations, \Phi_{\textup{fair}}(h_{\lambda}), although this is subject to additional nuanced challenges, which we address using a novel sub-sampling based methodology in Appendix B. Once we have CIs over \textup{acc}(h_{\lambda}) and \Phi_{\textup{fair}}(h_{\lambda}) for a model h_{\lambda}, we next outline how to use these to derive CIs for the minimum achievable fairness \tau^{\ast}_{\textup{fair}}, satisfying Eq. (3). We proceed by explaining how to construct the upper and lower CIs separately, as the latter requires additional considerations regarding the trade-off achieved by YOTO.
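A direct implementation of the interval in Lemma 3.1 might look as follows (a minimal sketch; only the half-width formula is taken from the lemma, while the function name and the clipping to [0, 1] are our own).

```python
import numpy as np

def hoeffding_accuracy_ci(y_pred, y_true, alpha=0.05):
    """Two-sided Hoeffding CI for acc(h) from calibration predictions."""
    n = len(y_true)
    acc_hat = np.mean(y_pred == y_true)             # empirical accuracy on D_cal
    delta = np.sqrt(np.log(2.0 / alpha) / (2 * n))  # Hoeffding half-width
    return max(0.0, acc_hat - delta), min(1.0, acc_hat + delta)
```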

3.2.1 Upper confidence intervals

We first outline how to obtain one-sided upper confidence intervals on the optimum accuracy-fairness trade-off \tau^{\ast}_{\textup{fair}}(\Psi) of the form \Gamma_{\textup{fair}}^{\alpha}=[0,U^{\lambda}_{\textup{fair}}], which satisfy the probabilistic guarantee in Eq. (3). To this end, given a classifier h_{\lambda}\in\mathcal{H}, our methodology involves constructing a one-sided lower CI on the accuracy \textup{acc}(h_{\lambda}) and an upper CI on the fairness violation \Phi_{\textup{fair}}(h_{\lambda}). We make this concrete below:

Proposition 3.2.

Given h_{\lambda}\in\mathcal{H}, let L^{\lambda}_{\textup{acc}},U^{\lambda}_{\textup{fair}}\in[0,1] be lower and upper CIs on \textup{acc}(h_{\lambda}) and \Phi_{\textup{fair}}(h_{\lambda}), i.e.

\displaystyle\mathbb{P}\left(\textup{acc}(h_{\lambda})\geq L^{\lambda}_{\textup{acc}}\right)\geq 1-\alpha/2,\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h_{\lambda})\leq U^{\lambda}_{\textup{fair}})\geq 1-\alpha/2.

Then, \mathbb{P}\left(\tau^{\ast}_{\textup{fair}}(L^{\lambda}_{\textup{acc}})\leq U^{\lambda}_{\textup{fair}}\right)\geq 1-\alpha.

Proposition 3.2 shows that for any model h_{\lambda}, the upper CI on model fairness, U^{\lambda}_{\textup{fair}}, provides a valid upper CI for the trade-off value at L^{\lambda}_{\textup{acc}}, i.e. \tau^{\ast}_{\textup{fair}}(L^{\lambda}_{\textup{acc}}). This can be used to construct upper CIs on \tau^{\ast}_{\textup{fair}}(\psi) for a given accuracy level \psi. To understand how this can be achieved, we first find \lambda\in\Lambda such that the lower CI on the accuracy of model h_{\lambda}, L^{\lambda}_{\textup{acc}}, satisfies L^{\lambda}_{\textup{acc}}\geq\psi. Then, since by definition \tau^{\ast}_{\textup{fair}} is a monotonically increasing function, we know that \tau^{\ast}_{\textup{fair}}(L^{\lambda}_{\textup{acc}})\geq\tau^{\ast}_{\textup{fair}}(\psi). Since Proposition 3.2 tells us that U^{\lambda}_{\textup{fair}} is an upper CI for \tau^{\ast}_{\textup{fair}}(L^{\lambda}_{\textup{acc}}), it follows that U^{\lambda}_{\textup{fair}} is also a valid upper CI for \tau^{\ast}_{\textup{fair}}(\psi).
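The sketch below illustrates one way this construction could be implemented over a grid of λ values: among models whose lower accuracy CI exceeds ψ, a single λ is selected and its upper fairness CI is reported (a simplified sketch; the grid-based selection rule is our own and omits any refinements used in the full method).

```python
import numpy as np

def upper_ci_on_tradeoff(psi, lam_grid, acc_lower, fair_upper):
    """Upper CI on tau*_fair(psi) via Proposition 3.2 (simplified sketch).

    acc_lower[i]  : lower CI on acc(h_lambda) for lam_grid[i]
    fair_upper[i] : upper CI on Phi_fair(h_lambda) for lam_grid[i]
    """
    eligible = np.where(acc_lower >= psi)[0]   # lambdas with L_acc >= psi
    if eligible.size == 0:
        return 1.0                             # no model certified at this accuracy
    # Pick the eligible lambda whose lower accuracy CI is closest to psi;
    # its upper fairness CI bounds tau*_fair(psi) by monotonicity.
    i = eligible[np.argmin(acc_lower[eligible])]
    return fair_upper[i]
```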

Intuitively, Proposition 3.2 provides the ‘worst-case’ optimal trade-off, accounting for finite-sample uncertainty. It is important to note that this result does not rely on any assumptions regarding the optimality of the trained classifiers. This means that the upper CIs will remain valid even if the YOTO classifier h_{\lambda} is not trained well (and hence achieves sub-optimal accuracy-fairness trade-offs), although in such cases the CI may be conservative.

Having explained how to construct upper CIs on \tau^{\ast}_{\textup{fair}}(\psi), we next move on to the lower CIs.

3.2.2 Lower confidence intervals

Obtaining lower confidence intervals on \tau^{\ast}_{\textup{fair}}(\psi) is more challenging than obtaining upper confidence intervals. We begin by explaining at an intuitive level why this is the case.

Suppose that h_{\lambda}\in\mathcal{H} is such that \textup{acc}(h_{\lambda})=\psi; then, since \tau^{\ast}_{\textup{fair}} denotes the minimum attainable fairness violation (Eq. (1)), we have that \tau^{\ast}_{\textup{fair}}(\psi)\leq\Phi_{\textup{fair}}(h_{\lambda}). Therefore, any valid upper confidence interval on \Phi_{\textup{fair}}(h_{\lambda}) will also be valid for \tau^{\ast}_{\textup{fair}}(\psi). However, a lower bound on \Phi_{\textup{fair}}(h_{\lambda}) cannot be used as a lower bound for the minimum achievable fairness \tau^{\ast}_{\textup{fair}}(\psi) in general. A valid lower CI for \tau^{\ast}_{\textup{fair}}(\psi) will therefore depend on the gap between the fairness violation achieved by h_{\lambda}, \Phi_{\textup{fair}}(h_{\lambda}), and the minimum achievable fairness violation \tau^{\ast}_{\textup{fair}}(\psi) (i.e., the \Delta(h_{\lambda}) term in Figure 2(a)). We make this concrete by constructing lower CIs depending on \Delta(h_{\lambda}) explicitly.

Proposition 3.3.

Given h_{\lambda}\in\mathcal{H}, let U^{\lambda}_{\textup{acc}},L^{\lambda}_{\textup{fair}}\in[0,1] be upper and lower CIs on \textup{acc}(h_{\lambda}) and \Phi_{\textup{fair}}(h_{\lambda}), i.e.

\displaystyle\mathbb{P}(\textup{acc}(h_{\lambda})\leq U^{\lambda}_{\textup{acc}})\geq 1-\alpha/2,\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h_{\lambda})\geq L^{\lambda}_{\textup{fair}})\geq 1-\alpha/2.

Then, \mathbb{P}\left(\tau^{\ast}_{\textup{fair}}(U^{\lambda}_{\textup{acc}})\geq L^{\lambda}_{\textup{fair}}-\Delta(h_{\lambda})\right)\geq 1-\alpha, where \Delta(h_{\lambda})\coloneqq\Phi_{\textup{fair}}(h_{\lambda})-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h_{\lambda}))\geq 0.

Proposition 3.3 can be used to derive lower CIs on \tau^{\ast}_{\textup{fair}}(\psi) at a specified accuracy level \psi, using a methodology analogous to that described in Section 3.2.1. Intuitively, this result provides the ‘best-case’ optimal trade-off, accounting for finite-sample uncertainty. However, unlike the upper CI, the lower CI includes the \Delta(h_{\lambda}) term, which is typically unknown. To circumvent this, we propose a strategy for obtaining plausible approximations for \Delta(h_{\lambda}) in practice in the following section.

3.2.3 Sensitivity analysis for \Delta(h_{\lambda})

Recall that \Delta(h_{\lambda}) quantifies the difference between the fairness loss of classifier h_{\lambda} and the minimum attainable fairness loss \tau^{\ast}_{\textup{fair}}(\textup{acc}(h_{\lambda})), and is an unknown quantity in general (see Figure 2(a)). Here, we propose a practical strategy for positing values for \Delta(h_{\lambda}) which encode our belief on how close the fairness loss \Phi_{\textup{fair}}(h_{\lambda}) is to \tau^{\ast}_{\textup{fair}}(\textup{acc}(h_{\lambda})). This allows us to construct CIs which not only incorporate finite-sampling uncertainty from calibration data, but also account for the possible sub-optimality in the trade-offs achieved by h_{\lambda}. The main idea behind our approach is to calibrate \Delta(h_{\lambda}) using additional separately trained standard models without imposing significant computational overhead.

Details   

Our sensitivity analysis uses k additional models \mathcal{M}\coloneqq\{h^{(1)},h^{(2)},\ldots,h^{(k)}\}\subseteq\mathcal{H} trained separately using the standard regularized loss \mathcal{L}_{\lambda^{\prime}} (Eq. (2)) for some randomly chosen values of \lambda^{\prime}. Let \mathcal{M}_{0}\subseteq\mathcal{M} denote the models which achieve a better empirical trade-off than the YOTO model on \mathcal{D}_{\textup{cal}}, i.e. the empirical trade-offs for models in \mathcal{M}_{0} lie below the YOTO trade-off curve (see Figure 2(b)). We choose \Delta(h_{\lambda}) for our YOTO model to be the maximum gap between the empirical trade-offs of these separately trained models in \mathcal{M}_{0} and the YOTO model. It can be seen from Proposition 3.3 that, in practice, this will result in a downward shift in the lower CI until all the separately trained models in \mathcal{M} lie above the lower CI. As a result, our methodology yields increasingly conservative lower CIs as the number of additional models |\mathcal{M}| increases.

Even though the procedure above requires training additional models \mathcal{M}, it does not impose the same computational overhead as training models over the full range of \lambda values. We show empirically in Section 5 that, in practice, two models are usually sufficient to obtain informative and reliable intervals. Additionally, we also show that when YOTO achieves the optimal trade-off (i.e., \Delta(h_{\lambda})=0), our sensitivity analysis leaves the CIs unchanged, thereby preventing unnecessary conservatism.
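A possible implementation of this calibration step is sketched below: Δ is set to the largest amount by which any separately trained model's empirical trade-off falls below the (interpolated) YOTO trade-off curve at a comparable accuracy (all variable names and the interpolation are our own assumptions).

```python
import numpy as np

def calibrate_delta(yoto_acc, yoto_fair, extra_acc, extra_fair):
    """Set Delta(h_lambda) from k separately trained models (sensitivity analysis).

    yoto_acc, yoto_fair  : empirical YOTO trade-off curve on D_cal,
                           with yoto_acc sorted in increasing order
    extra_acc, extra_fair: empirical (accuracy, fairness) of the extra models M
    """
    # Fairness the YOTO curve attains at each extra model's accuracy level.
    yoto_at_acc = np.interp(extra_acc, yoto_acc, yoto_fair)
    # Positive gaps correspond to models in M_0 that beat the YOTO trade-off.
    gaps = yoto_at_acc - extra_fair
    return max(0.0, gaps.max())   # Delta = largest such gap (0 if none beat YOTO)
```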

Figure 2: Visual illustrations of \Delta(h) (Figure 2(a)) and our sensitivity analysis procedure (Figure 2(b)). (a) Visual representation of the difference between the optimal and model-achieved trade-offs. (b) Sensitivity analysis shifts the lower CIs down by a constant amount, whereas the upper CIs remain unchanged.
Asymptotic analysis of \Delta(h_{\lambda})   

While the procedure described above provides a practical solution for obtaining plausible approximations for \Delta(h_{\lambda}), we next present a theoretical result which provides reassurance that this gap becomes negligible as the size of the training dataset \mathcal{D}_{\textup{tr}} increases.

Theorem 3.4.

Let \widehat{\Phi_{\textup{fair}}(h^{\prime})},\widehat{\textup{acc}(h^{\prime})} denote the fairness violation and accuracy for h^{\prime}\in\mathcal{H} evaluated on the training data \mathcal{D}_{\textup{tr}}, and let

\displaystyle h\coloneqq\arg\min_{h^{\prime}\in\mathcal{H}}\widehat{\Phi_{\textup{fair}}(h^{\prime})}\text{ subject to }\widehat{\textup{acc}(h^{\prime})}\geq\delta. (4)

Then, given \eta\in(0,1), under standard regularity assumptions, we have that with probability at least 1-\eta, \Delta(h)\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}), for some \gamma\in(0,1/2], where \widetilde{O}(\cdot) suppresses dependence on \log{(1/\eta)}.

Theorem 3.4 shows that as the training data size |\mathcal{D}_{\textup{tr}}| increases, the error term \Delta(h_{\lambda}) becomes negligible with high probability for any model h_{\lambda} which minimises the empirical training loss in Eq. (4). In this case, \Delta(h_{\lambda}) should not have a significant impact on the lower CIs in Proposition 3.3, and the CIs will reflect the uncertainty in \tau^{\ast}_{\textup{fair}} arising mostly from finite calibration data. We also verify this empirically in Appendix F.7. It is worth noting that Theorem 3.4 relies on the same assumptions as Theorem 2 in [1], which are provided in Appendix A.2.

4 Related works

Many previous fairness methods in the literature, termed in-processing methods, introduce constraints or regularization terms to the optimization objective. For instance, [1, 10] impose a priori uniform constraints on model fairness at training time. However, given the data-dependent nature of accuracy-fairness trade-offs, setting a uniform fairness threshold may not be suitable. Other in-processing methods [37, 24] consider information-theoretic bounds on the optimal trade-off in the infinite-data limit, independent of a specific model class. While these works offer valuable theoretical insights, there is no guarantee that these frontiers are attainable by models within a given model class \mathcal{H}. We verify this empirically in Appendix F.6 by showing that, for the Adult dataset, the frontiers proposed in [24] are not achieved by any SOTA method we considered. In contrast, our method provides guarantees on the achievable trade-off curve within realistic constraints of model class and data availability.

Various other regularization approaches [39, 33, 8, 14, 41, 42, 43] have also been proposed, but these often necessitate training multiple models, making them computationally intensive. Alternative strategies include learning ‘fair’ representations [44, 30, 31], or re-weighting data based on sensitive attributes [18, 22]. These, however, provide limited control over accuracy-fairness trade-offs.

Besides this, post-processing methods [19, 38] enforce fairness after training but can lead to other forms of unfairness such as disparate treatment of similar individuals [16]. Moreover, many post-hoc approaches such as [3, 2] still require solving different optimisation problems for different fairness thresholds. Other methods such as [46, 27] involve learning a post-hoc module in addition to the base classifier. As a result, the computational cost of training the YOTO model is similar to (and in many cases lower than) the combined cost of training a base model and subsequently applying a post-processing intervention to this pre-trained classifier. We confirm this empirically in Section 5.

Figure 3: Results on four real-world datasets (Adult, COMPAS, CelebA, and Jigsaw) where \mathcal{D}_{\textup{cal}} is a 10% data split. Here, \alpha=0.05 and we use |\mathcal{M}|=2 separately trained models for sensitivity analysis.

5 Experiments

In this section, we empirically validate our methodology of constructing confidence intervals on the fairness trade-off curve across diverse datasets, with neural networks as the model class \mathcal{H}. These datasets range from tabular (Adult and COMPAS) to image-based (CelebA) and natural language processing (Jigsaw) datasets. Recall that our approach involves two steps: an initial estimation of the trade-off via the YOTO model, followed by the construction of CIs using the calibration data \mathcal{D}_{\textup{cal}}.

To evaluate our methodology, we implement a suite of baseline algorithms including SOTA in-processing techniques such as regularization-based approaches [8], a SOTA kernel-density based method [12] (denoted as ‘KDE-fair’), as well as the reductions method [1]. Additionally, we also compare against adversarial fairness techniques [45] and a post-processing approach (denoted as ‘RTO’) [2], and consider the three most prominent fairness metrics: Demographic Parity (DP), Equalized Odds (EO), and Equalized Opportunity (EOP). We provide additional details and results in Appendix F, where we also consider a synthetic setup with tractable \tau^{\ast}_{\textup{fair}}. The code to reproduce our experiments is provided at github.com/faaizT/DatasetFairness.

5.1 Results

Figure 3 shows the results for different datasets and fairness violations, obtained using a 10% data split as the calibration dataset \mathcal{D}_{\textup{cal}}. For each dataset, we construct 4 CIs that serve as upper and lower bounds on the optimal accuracy-fairness trade-off curve. These intervals are computed at a 95% confidence level using various methodologies: 1) Hoeffding’s and 2) Bernstein’s inequalities, which both offer finite-sample guarantees, as well as 3) bootstrapping [17] and 4) asymptotic intervals based on the Central Limit Theorem [26], which are valid asymptotically. There are 4 key takeaways:

Table 1: Proportion of empirical trade-offs for each baseline in the three trade-off regions, aggregated across all datasets and fairness metrics (using Bernstein’s CIs). ‘Unlikely’, ‘Permissible’ and ‘Sub-optimal’ correspond to the blue, green and pink regions in Figure 1, respectively. The last column shows the rough average training time per model across experiments × the number of models per experiment.

Category | Baseline | Unlikely | Permissible | Sub-optimal | ≈ Training time
In-processing | adversary [45] | 0.03 ± 0.03 | 0.51 ± 0.07 | 0.45 ± 0.07 | 100 min × 40
In-processing | logsig [8] | 0.0 ± 0.0 | 0.66 ± 0.1 | 0.33 ± 0.1 | 100 min × 40
In-processing | reductions [1] | 0.0 ± 0.0 | 0.79 ± 0.1 | 0.21 ± 0.1 | 90 min × 40
In-processing | linear [8] | 0.01 ± 0.0 | 0.85 ± 0.05 | 0.14 ± 0.06 | 100 min × 40
In-processing | KDE-fair [12] | 0.0 ± 0.0 | 0.97 ± 0.05 | 0.03 ± 0.06 | 85 min × 40
In-processing | separate | 0.0 ± 0.0 | 0.98 ± 0.01 | 0.02 ± 0.02 | 100 min × 40
In-processing | YOTO (Ours) | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.0 ± 0.0 | 105 min × 1
Post-processing | RTO [2] | 0.0 ± 0.0 | 0.65 ± 0.2 | 0.35 ± 0.05 | 95 min (training base classifier) + 10 min × 40 (post-hoc optimisations)

Takeaway 1: Trade-off curves are data dependent. The results in Figure 3 confirm that the accuracy-fairness trade-offs can vary significantly across the datasets. For example, achieving near-perfect fairness (i.e. \Phi_{\textup{fair}}(h)\approx 0) seems significantly easier for the Jigsaw dataset than the COMPAS dataset, even as the accuracy increases. Likewise, for Adult and COMPAS, the DP increases gradually with increasing accuracy, whereas for CelebA, the increase is sharp once the accuracy increases above 90%. Therefore, using a uniform fairness threshold across datasets [as in 1] may be too restrictive, and our methodology provides more dataset-specific insights about the entire trade-off curve instead.

Takeaway 2: Our CIs are both reliable and informative. Recall that any trade-off which lies above our upper CIs is guaranteed to be sub-optimal with probability 1-\alpha, thereby enabling practitioners to effectively distinguish between genuine sub-optimalities and those due to finite-sample errors. Table 1 lists the proportion of sub-optimal empirical trade-offs for each baseline and provides a principled comparison of the baselines. For example, the adversarial, RTO and logsig baselines have a significantly higher proportion of sub-optimal trade-offs than the KDE-fair and separate baselines.

On the other hand, the validity of our lower CIs depends on the optimality of our YOTO model and the lower CIs may be too tight if YOTO is sub-optimal. Therefore, for the lower CIs to be reliable, it must be unlikely for any baseline to achieve a trade-off below the lower CIs. Table 1 confirms this empirically, as the proportion of models which lie below the lower CIs is negligible. In Appendix E, we also account for the uncertainty in baseline trade-offs when assessing the optimality, hence yielding more robust inferences. The results remain similar to those in Table 1.

Takeaway 3: YOTO trade-offs are consistent with SOTA. We observe that the YOTO trade-offs align well with most of the SOTA baselines considered while reducing the computational cost by approximately 40-fold (see the final column of Table 1). In some cases, YOTO even achieves a better trade-off than the baselines considered; see, e.g., the Jigsaw dataset results (especially for EOP). Moreover, we observe that the baselines yield empirical trade-offs which have a high variance as accuracy increases (see the Jigsaw results in Figure 3, for example). This behaviour starkly contrasts with the smooth variation exhibited by our YOTO-generated trade-off curves along the accuracy axis.

Takeaway 4: Sensitivity analysis does not cause unnecessary conservatism. We use 2 randomly chosen separately trained models to perform our sensitivity analysis for Figure 3. We find that this only causes a shift in lower CIs for 2 out of the 12 trade-off curves presented (i.e. for DP and EO trade-offs on the Adult dataset), leaving the rest of the CIs unchanged. Therefore, in practice sensitivity analysis does not impose significant computational overhead, and only changes the CIs when YOTO achieves a suboptimal trade-off. Additional results have been included in Appendix C.

6 Discussion and Limitations

In this work, we propose a computationally efficient approach to capture the accuracy-fairness trade-offs inherent to individual datasets, backed by sound statistical guarantees. Our proposed methodology enables a nuanced and dataset-specific understanding of the accuracy-fairness trade-offs. It does so by obtaining confidence intervals on the accuracy-fairness trade-off, leveraging the computational benefits of the You-Only-Train-Once (YOTO) framework [15]. This empowers practitioners with the ability to, at inference time, specify desired accuracy levels and promptly receive corresponding permissible fairness ranges. By eliminating the need for repetitive model training, we significantly streamline the process of obtaining accuracy-fairness trade-offs tailored to individual datasets.

Limitations    Despite the evident merits of our approach, it also has some limitations. Firstly, our methodology requires distinct datasets for training and calibration, posing difficulties when data is limited. Under such constraints, the YOTO model might not capture the optimal accuracy-fairness trade-off, and moreover, the resulting confidence intervals could be overly conservative. Secondly, our lower CIs incorporate an unknown term \Delta(h_{\lambda}). While we propose a sensitivity analysis for approximating this term and prove that it is asymptotically negligible under certain mild assumptions in Section 3.2.3, a more exhaustive understanding remains an open question. Exploring informative upper bounds for \Delta(h_{\lambda}) under weaker conditions is a promising avenue for future investigations.

Acknowledgments

We would like to express our gratitude to Sahra Ghalebikesabi for her valuable feedback on an earlier draft of this paper. We also thank the anonymous reviewers for their thoughtful and constructive comments, which enhanced the clarity and rigor of our final submission.

References

  • [1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. 03 2018.
  • [2] I. M. Alabdulmohsin and M. Lucic. A near-optimal algorithm for debiasing trained machine learning models. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 8072–8084. Curran Associates, Inc., 2021.
  • [3] W. Alghamdi, H. Hsu, H. Jeong, H. Wang, P. W. Michalak, S. Asoodeh, and F. P. Calmon. Beyond adult and compas: Fairness in multi-class prediction, 2022.
  • [4] A. N. Angelopoulos, S. Bates, C. Fannjiang, M. I. Jordan, and T. Zrnic. Prediction-powered inference, 2023.
  • [5] J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias, 2016.
  • [6] P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. In D. Helmbold and B. Williamson, editors, Computational Learning Theory, pages 224–240, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
  • [7] B. Becker and R. Kohavi. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20.
  • [8] H. Bendekgey and E. B. Sudderth. Scalable and stable surrogates for flexible classifiers with fairness constraints. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, 2021.
  • [9] S. Caton and C. Haas. Fairness in machine learning: A survey. CoRR, abs/2010.04053, 2020.
  • [10] L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, page 319–328, New York, NY, USA, 2019. Association for Computing Machinery.
  • [11] L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi. Fair classification with noisy protected attributes: A framework with provable guarantees. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 1349–1361. PMLR, 18–24 Jul 2021.
  • [12] J. Cho, G. Hwang, and C. Suh. A fair classifier using kernel density estimation. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 15088–15099. Curran Associates, Inc., 2020.
  • [13] J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.
  • [14] M. Donini, L. Oneto, S. Ben-David, J. S. Shawe-Taylor, and M. Pontil. Empirical risk minimization under fairness constraints. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • [15] A. Dosovitskiy and J. Djolonga. You only train once: Loss-conditional training of deep networks. In International Conference on Learning Representations, 2020.
  • [16] EEOC. Uniform guidelines on employee selection procedures, 1979.
  • [17] B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1):1–26, 1979.
  • [18] A. Grover, J. Song, A. Agarwal, K. Tran, A. Kapoor, E. Horvitz, and S. Ermon. Bias Correction of Learned Generative Models Using Likelihood-Free Importance Weighting. Curran Associates Inc., Red Hook, NY, USA, 2019.
  • [19] M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, page 3323–3331, Red Hook, NY, USA, 2016. Curran Associates Inc.
  • [20] Jigsaw and Google. Unintended bias in toxicity classification, 2019.
  • [21] S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008.
  • [22] F. Kamiran and T. Calders. Data pre-processing techniques for classification without discrimination. Knowledge and Information Systems, 33, 10 2011.
  • [23] J. S. Kim, J. Chen, and A. Talwalkar. Fact: A diagnostic for group fairness trade-offs. CoRR, abs/2004.03424, 2020.
  • [24] J. S. Kim, J. Chen, and A. Talwalkar. Model-agnostic characterization of fairness trade-offs. CoRR, abs/2004.03424, 2020.
  • [25] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [26] L. Le Cam. The central limit theorem around 1935. Statistical Science, 1(1):78–91, 1986.
  • [27] E. Z. Liu, B. Haghgoo, A. S. Chen, A. Raghunathan, P. W. Koh, S. Sagawa, P. Liang, and C. Finn. Just train twice: Improving group robustness without training group information. CoRR, abs/2107.09044, 2021.
  • [28] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
  • [29] M. Lohaus, M. Perrot, and U. V. Luxburg. Too relaxed to be fair. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6360–6369. PMLR, 13–18 Jul 2020.
  • [30] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair autoencoder, 2017.
  • [31] K. Lum and J. Johndrow. A statistical framework for fair predictive algorithms, 2016.
  • [32] N. Martinez, M. Bertran, and G. Sapiro. Minimax pareto fairness: A multi objective perspective. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6755–6764. PMLR, 13–18 Jul 2020.
  • [33] M. Olfat and Y. Mintz. Flexible regularization approaches for fairness in deep learning. In 2020 59th IEEE Conference on Decision and Control (CDC), pages 3389–3394, 2020.
  • [34] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. C. Courville. Film: Visual reasoning with a general conditioning layer. CoRR, abs/1709.07871, 2017.
  • [35] B. Ustun, Y. Liu, and D. Parkes. Fairness without harm: Decoupled classifiers with preference guarantees. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6373–6382. PMLR, 09–15 Jun 2019.
  • [36] A. Valdivia, J. Sánchez-Monedero, and J. Casillas. How fair can we go in machine learning? assessing the boundaries of accuracy and fairness. International Journal of Intelligent Systems, 36(4):1619–1643, 2021.
  • [37] H. Wang, L. He, R. Gao, and F. Calmon. Aleatoric and epistemic discrimination: Fundamental limits of fairness interventions. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [38] D. Wei, K. N. Ramamurthy, and F. Calmon. Optimized score transformation for fair classification. In S. Chiappa and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1673–1683. PMLR, 26–28 Aug 2020.
  • [39] S. Wei and M. Niethammer. The fairness-accuracy pareto front. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(3):287–302, 2022.
  • [40] J. Yang, A. A. S. Soltan, D. W. Eyre, Y. Yang, and D. A. Clifton. An adversarial training framework for mitigating algorithmic biases in clinical machine learning. npj Digital Medicine, 6:55, 2023.
  • [41] M. Zafar, I. Valera, M. Rodriguez, and K. P. Gummadi. Fairness constraints: A mechanism for fair classification. 07 2015.
  • [42] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, page 1171–1180, Republic and Canton of Geneva, CHE, 2017. International World Wide Web Conferences Steering Committee.
  • [43] M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1–42, 2019.
  • [44] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 325–333, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR.
  • [45] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’18, page 335–340, New York, NY, USA, 2018. Association for Computing Machinery.
  • [46] A. Ţifrea, P. Lahoti, B. Packer, Y. Halpern, A. Beirami, and F. Prost. Frappé: A group fairness framework for post-processing everything, 2024.

Appendix A Proofs

A.1 Confidence intervals on \tau^{\ast}_{\textup{fair}}

Proof of Lemma 3.1.

This lemma is a straightforward application of Hoeffding’s inequality. ∎

Proof of Proposition 3.2.

Here, we prove the result for general classifiers h\in\mathcal{H}. Let L^{h}_{\textup{acc}},U^{h}_{\textup{fair}} be the lower and upper CIs for \textup{acc}(h) and \Phi_{\textup{fair}}(h) respectively,

\displaystyle\mathbb{P}(\textup{acc}(h)\geq L^{h}_{\textup{acc}})\geq 1-\alpha/2\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h)\leq U^{h}_{\textup{fair}})\geq 1-\alpha/2.

Then, using a straightforward application of the union bound, we get that

\displaystyle\mathbb{P}(\textup{acc}(h)\geq L^{h}_{\textup{acc}},\Phi_{\textup{fair}}(h)\leq U^{h}_{\textup{fair}})\geq 1-\mathbb{P}(\textup{acc}(h)<L^{h}_{\textup{acc}})-\mathbb{P}(\Phi_{\textup{fair}}(h)>U^{h}_{\textup{fair}})\geq 1-\alpha/2-\alpha/2=1-\alpha.

Using the definition of the optimal fairness-accuracy trade-off \tau^{\ast}_{\textup{fair}}, we get that the event

{acc(h)Lacch,Φfair(h)Ufairh}implies,{min{Φfair(h)|h,acc(h)Lacch}τfair(Lacch)Ufairh}.\displaystyle\left\{\textup{acc}(h)\geq L^{h}_{\textup{acc}},\Phi_{\textup{fair}}(h)\leq U^{h}_{\textup{fair}}\right\}\quad\textup{implies,}\quad\left\{\underbrace{\min\{\Phi_{\textup{fair}}(h^{\prime})\,|\,h^{\prime}\in\mathcal{H},\textup{acc}(h^{\prime})\geq L^{h}_{\textup{acc}}\}}_{\tau^{\ast}_{\textup{fair}}(L^{h}_{\textup{acc}})}\leq U^{h}_{\textup{fair}}\right\}.

From this, it follows that

(τfair(Lacch)Ufairh)(acc(h)Lacch,Φfair(h)Ufairh)1α.\displaystyle\mathbb{P}(\tau^{\ast}_{\textup{fair}}(L^{h}_{\textup{acc}})\leq U^{h}_{\textup{fair}})\geq\mathbb{P}(\textup{acc}(h)\geq L^{h}_{\textup{acc}},\Phi_{\textup{fair}}(h)\leq U^{h}_{\textup{fair}})\geq 1-\alpha.

Proof of Proposition 3.3.

We prove the result for general classifiers hh\in\mathcal{H}. Let Uacch,LfairhU^{h}_{\textup{acc}},L^{h}_{\textup{fair}} be the upper and lower CIs for acc(h)\textup{acc}(h) and Φfair(h)\Phi_{\textup{fair}}(h) respectively,

(acc(h)Uacch)1α/2and(Φfair(h)Lfairh)1α/2.\displaystyle\mathbb{P}(\textup{acc}(h)\leq U^{h}_{\textup{acc}})\geq 1-\alpha/2\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h)\geq L^{h}_{\textup{fair}})\geq 1-\alpha/2.

Then, using an application of the union bound, we get that

(acc(h)Uacch,Φfair(h)Lfairh)\displaystyle\mathbb{P}(\textup{acc}(h)\leq U^{h}_{\textup{acc}},\Phi_{\textup{fair}}(h)\geq L^{h}_{\textup{fair}}) 1(acc(h)>Uacch)(Φfair(h)<Lfairh)\displaystyle\geq 1-\mathbb{P}(\textup{acc}(h)>U^{h}_{\textup{acc}})-\mathbb{P}(\Phi_{\textup{fair}}(h)<L^{h}_{\textup{fair}})
1α/2α/2=1α.\displaystyle\geq 1-\alpha/2-\alpha/2=1-\alpha.

Then, using the fact that Δ(h)=Φfair(h)τfair(acc(h))\Delta(h)=\Phi_{\textup{fair}}(h)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)), we get that

1α(acc(h)Uacch,Φfair(h)Lfairh)\displaystyle 1-\alpha\leq\mathbb{P}(\textup{acc}(h)\leq U^{h}_{\textup{acc}},\Phi_{\textup{fair}}(h)\geq L^{h}_{\textup{fair}}) =(acc(h)Uacch,τfair(acc(h))+Δ(h)Lfairh)\displaystyle=\mathbb{P}(\textup{acc}(h)\leq U^{h}_{\textup{acc}},\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))+\Delta(h)\geq L^{h}_{\textup{fair}})
(acc(h)Uacch,τfair(Uacch)+Δ(h)Lfairh)\displaystyle\leq\mathbb{P}(\textup{acc}(h)\leq U^{h}_{\textup{acc}},\tau^{\ast}_{\textup{fair}}(U^{h}_{\textup{acc}})+\Delta(h)\geq L^{h}_{\textup{fair}})
(τfair(Uacch)LfairhΔ(h)),\displaystyle\leq\mathbb{P}(\tau^{\ast}_{\textup{fair}}(U^{h}_{\textup{acc}})\geq L^{h}_{\textup{fair}}-\Delta(h)),

where in the second-to-last inequality above, we use the fact that τfair:[0,1][0,1]\tau^{\ast}_{\textup{fair}}:[0,1]\rightarrow[0,1] is a monotonically increasing function. ∎

A.2 Asymptotic convergence of Δ(h)\Delta(h)

In this Section, we provide the formal statement for Theorem 3.4 along with the assumptions required for this result.

Assumption A.1.

τfair\tau^{\ast}_{\textup{fair}} is LL-Lipschitz.

Assumption A.2.

Let |𝒟tr|()\mathcal{R}_{|\mathcal{D}_{\textup{tr}}|}(\mathcal{H}) denote the Rademacher complexity of the classifier family \mathcal{H}, where |𝒟tr||\mathcal{D}_{\textup{tr}}| is the number of training examples. We assume that there exists C0C\geq 0 and γ1/2\gamma\leq 1/2 such that |𝒟tr|()C|𝒟tr|γ\mathcal{R}_{|\mathcal{D}_{\textup{tr}}|}(\mathcal{H})\leq C\,|\mathcal{D}_{\textup{tr}}|^{-\gamma}.

It is worth noting that Assumption A.2, which was also used in [1, Theorem 2], holds for many classifier families with γ=1/2\gamma=1/2, including norm-bounded linear functions, neural networks and classifier families with bounded VC dimension [21, 6].

Theorem A.3.

Let Φfair(h)^,acc(h)^\widehat{\Phi_{\textup{fair}}(h^{\prime})},\widehat{\textup{acc}(h^{\prime})} denote the fairness violation and accuracy metrics for model hh^{\prime} evaluated on training data 𝒟tr\mathcal{D}_{\textup{tr}} and define

hargminhΦfair(h)^ subject to acc(h)^δ.h\coloneqq\arg\min_{h^{\prime}\in\mathcal{H}}\widehat{\Phi_{\textup{fair}}(h^{\prime})}\text{ subject to }\widehat{\textup{acc}(h^{\prime})}\geq\delta.

Then, under Assumptions A.1 and A.2, we have that with probability at least 1η1-\eta, Δ(h)O~(|𝒟tr|γ),\Delta(h)\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}), where O~()\widetilde{O}(\cdot) suppresses polynomial dependence on log(1/η)\log{(1/\eta)}.

Proof of Theorem A.3.

Let

ϵ4|𝒟tr|()+4|𝒟tr|+2log(8/η)2|𝒟tr|.\epsilon\coloneqq 4\,\mathcal{R}_{|\mathcal{D}_{\textup{tr}}|}(\mathcal{H})+\frac{4}{\sqrt{|\mathcal{D}_{\textup{tr}}|}}+2\,\sqrt{\frac{\log{(8/\eta)}}{2\,|\mathcal{D}_{\textup{tr}}|}}.

Next, we define

h^{*}\coloneqq\arg\min_{h^{\prime}\in\mathcal{H}}\Phi_{\textup{fair}}(h^{\prime})\quad\text{ subject to }\quad\textup{acc}(h^{\prime})\geq\textup{acc}(h)+\epsilon.

Then, we have that

Δ(h)=\displaystyle\Delta(h)= Φfair(h)τfair(acc(h))\displaystyle\Phi_{\textup{fair}}(h)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))
=\displaystyle= Φfair(h)τfair(acc(h)+ϵ)+τfair(acc(h)+ϵ)τfair(acc(h))\displaystyle\Phi_{\textup{fair}}(h)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)+\epsilon)+\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)+\epsilon)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))
=\displaystyle= Φfair(h)Φfair(h)+τfair(acc(h)+ϵ)τfair(acc(h)).\displaystyle\Phi_{\textup{fair}}(h)-\Phi_{\textup{fair}}(h^{*})+\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)+\epsilon)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)).

We know using Assumption A.1 that

τfair(acc(h)+ϵ)τfair(acc(h))Lϵ\displaystyle\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)+\epsilon)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))\leq L\,\epsilon

Moreover,

Φfair(h)Φfair(h)=\displaystyle\Phi_{\textup{fair}}(h)-\Phi_{\textup{fair}}(h^{*})= Φfair(h)^Φfair(h)^+Φfair(h)Φfair(h)^+Φfair(h)^Φfair(h)\displaystyle\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}+\Phi_{\textup{fair}}(h)-\widehat{\Phi_{\textup{fair}}(h)}+\widehat{\Phi_{\textup{fair}}(h^{*})}-\Phi_{\textup{fair}}(h^{*})
\displaystyle\leq Φfair(h)^Φfair(h)^+2maxh|Φfair(h)Φfair(h)^|.\displaystyle\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}+2\max_{h^{\prime}\in\mathcal{H}}|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|.

Putting the two together, we get that

Δ(h)Φfair(h)^Φfair(h)^+2maxh|Φfair(h)Φfair(h)^|+Lϵ.\displaystyle\Delta(h)\leq\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}+2\max_{h^{\prime}\in\mathcal{H}}|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|+L\epsilon. (5)

Next, we consider the term Φfair(h)^Φfair(h)^\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}. First, we observe that

acc(h)acc(h)+ϵacc(h)^|acc(h)^acc(h)|+ϵδ+ϵ|acc(h)^acc(h)|.\displaystyle\textup{acc}(h^{*})\geq\textup{acc}(h)+\epsilon\geq\widehat{\textup{acc}(h)}-|\widehat{\textup{acc}(h)}-\textup{acc}(h)|+\epsilon\geq\delta+\epsilon-|\widehat{\textup{acc}(h)}-\textup{acc}(h)|.

Next, using [1, Lemma 4] with g(X,A,Y)=𝟙(h(X)=Y)g(X,A,Y)=\mathbbm{1}(h(X)=Y), we get that with probability at least 1η/41-\eta/4, we have that

|acc(h)acc(h)^|2|𝒟tr|()+2|𝒟tr|+log(8/η)2|𝒟tr|=ϵ/2.\displaystyle|\textup{acc}(h)-\widehat{\textup{acc}(h)}|\leq 2\,\mathcal{R}_{|\mathcal{D}_{\textup{tr}}|}(\mathcal{H})+\frac{2}{\sqrt{|\mathcal{D}_{\textup{tr}}|}}+\sqrt{\frac{\log{(8/\eta)}}{2\,|\mathcal{D}_{\textup{tr}}|}}=\epsilon/2.

This implies that with probability at least 1η/41-\eta/4,

acc(h)δ+ϵ/2,\displaystyle\textup{acc}(h^{*})\geq\delta+\epsilon/2,

and hence,

δ+ϵ/2acc(h)acc(h)^+|acc(h)acc(h)^|.\displaystyle\delta+\epsilon/2\leq\textup{acc}(h^{*})\leq\widehat{\textup{acc}(h^{*})}+|\textup{acc}(h^{*})-\widehat{\textup{acc}(h^{*})}|.

Again, using [1, Lemma 4] with g(X,A,Y)=𝟙(h(X)=Y)g(X,A,Y)=\mathbbm{1}(h^{*}(X)=Y), we get that with probability at least 1η/41-\eta/4, we have that

|acc(h)acc(h)^|2|𝒟tr|()+2|𝒟tr|+log(8/η)2|𝒟tr|=ϵ/2.\displaystyle|\textup{acc}(h^{*})-\widehat{\textup{acc}(h^{*})}|\leq 2\,\mathcal{R}_{|\mathcal{D}_{\textup{tr}}|}(\mathcal{H})+\frac{2}{\sqrt{|\mathcal{D}_{\textup{tr}}|}}+\sqrt{\frac{\log{(8/\eta)}}{2\,|\mathcal{D}_{\textup{tr}}|}}=\epsilon/2.

Finally, putting this together using union bounds, we get that with probability at least 1η/21-\eta/2,

δ+ϵ/2acc(h)acc(h)^+|acc(h)acc(h)^|acc(h)^+ϵ/2,\displaystyle\delta+\epsilon/2\leq\textup{acc}(h^{*})\leq\widehat{\textup{acc}(h^{*})}+|\textup{acc}(h^{*})-\widehat{\textup{acc}(h^{*})}|\leq\widehat{\textup{acc}(h^{*})}+\epsilon/2,

and hence,

acc(h)^δ.\displaystyle\widehat{\textup{acc}(h^{*})}\geq\delta.

Using the definition of hh, we have that acc(h)^δΦfair(h)^Φfair(h)^\widehat{\textup{acc}(h^{*})}\geq\delta\implies\widehat{\Phi_{\textup{fair}}(h)}\leq\widehat{\Phi_{\textup{fair}}(h^{*})}. Therefore, with probability at least 1η/21-\eta/2,

Φfair(h)^Φfair(h)^0.\displaystyle\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}\leq 0. (6)

Next, to bound the term |Φfair(h)Φfair(h)^||\Phi_{\textup{fair}}(h)-\widehat{\Phi_{\textup{fair}}(h)}| above, we consider a general formulation for the fairness violation Φfair\Phi_{\textup{fair}}, also presented in [1],

Φfair(h)=|Φfair±(h)|where,Φfair±(h)j=1m𝔼[gj(X,A,Y,h(X))j]=:Φj\Phi_{\textup{fair}}(h)=|\Phi_{\textup{fair}}^{\pm}(h)|\quad\textup{where,}\quad\Phi_{\textup{fair}}^{\pm}(h)\coloneqq\sum_{j=1}^{m}\underbrace{\mathbb{E}[g_{j}(X,A,Y,h(X))\mid\mathcal{E}_{j}]}_{=:\Phi_{j}}

where m1m\geq 1, gjg_{j} are some known functions and j\mathcal{E}_{j} are events with positive probability defined with respect to (X,A,Y)(X,A,Y).

Using this, we note that for any hh^{\prime}\in\mathcal{H}

|Φfair(h)Φfair(h)^|=\displaystyle|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|= ||Φfair±(h)||Φfair±(h)^||\displaystyle\Big{|}|\Phi_{\textup{fair}}^{\pm}(h^{\prime})|-|\widehat{\Phi_{\textup{fair}}^{\pm}(h^{\prime})}|\Big{|}
\displaystyle\leq |Φfair±(h)Φfair±(h)^|\displaystyle|\Phi_{\textup{fair}}^{\pm}(h^{\prime})-\widehat{\Phi_{\textup{fair}}^{\pm}(h^{\prime})}|
=\displaystyle= |j=1m𝔼[gj(X,A,Y,h(X))j]𝔼^[gj(X,A,Y,h(X))j]|\displaystyle\Big{|}\sum_{j=1}^{m}\mathbb{E}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]-\widehat{\mathbb{E}}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]\Big{|}
\displaystyle\leq j=1m|𝔼[gj(X,A,Y,h(X))j]𝔼^[gj(X,A,Y,h(X))j]|.\displaystyle\sum_{j=1}^{m}|\mathbb{E}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]-\widehat{\mathbb{E}}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]|.

Let pj(j)p^{*}_{j}\coloneqq\mathbb{P}(\mathcal{E}_{j}). Then, from [1, Lemma 6], we have that if |𝒟tr|pj8log(4m/η)|\mathcal{D}_{\textup{tr}}|\,p^{*}_{j}\geq 8\log{(4\,m/\eta)}, then with probability at least 1η/2m1-\eta/2m, we have that

|𝔼[gj(X,A,Y,h(X))j]𝔼^[gj(X,A,Y,h(X))j]|\displaystyle|\mathbb{E}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]-\widehat{\mathbb{E}}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]|\leq 2R|𝒟tr|pj/2()+22|𝒟tr|pj+log(8m/η)|𝒟tr|pj\displaystyle 2\,R_{|\mathcal{D}_{\textup{tr}}|\,p^{*}_{j}/2}(\mathcal{H})+2\,\sqrt{\frac{2}{|\mathcal{D}_{\textup{tr}}|\,p_{j}^{*}}}+\sqrt{\frac{\log{(8\,m/\eta)}}{|\mathcal{D}_{\textup{tr}}|\,p_{j}^{*}}}
=\displaystyle= O~(|𝒟tr|γ).\displaystyle\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}).

A straightforward application of the union bound yields that if |𝒟tr|pj8log(4m/η)|\mathcal{D}_{\textup{tr}}|\,p^{*}_{j}\geq 8\log{(4\,m/\eta)} for all j{1,,m}j\in\{1,\ldots,m\}, then with probability at least 1η/21-\eta/2, we have that

j=1m|𝔼[gj(X,A,Y,h(X))j]𝔼^[gj(X,A,Y,h(X))j]|O~(|𝒟tr|γ).\displaystyle\sum_{j=1}^{m}|\mathbb{E}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]-\widehat{\mathbb{E}}[g_{j}(X,A,Y,h^{\prime}(X))\mid\mathcal{E}_{j}]|\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}).

Therefore, in this case for any hh^{\prime}\in\mathcal{H}, we have that with probability at least 1η/21-\eta/2,

|Φfair(h)Φfair(h)^|O~(|𝒟tr|γ),\displaystyle|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}),

and therefore,

2maxh|Φfair(h)Φfair(h)^|O~(|𝒟tr|γ).\displaystyle 2\max_{h^{\prime}\in\mathcal{H}}|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}). (7)

Finally, putting Eq. (5), Eq. (6) and Eq. (7) together using union bounds, we get that with probability at least 1η1-\eta, we have that

Δ(h)Φfair(h)^Φfair(h)^+2maxh|Φfair(h)Φfair(h)^|+LϵO~(|𝒟tr|γ).\displaystyle\Delta(h)\leq\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\Phi_{\textup{fair}}(h^{*})}+2\max_{h^{\prime}\in\mathcal{H}}|\Phi_{\textup{fair}}(h^{\prime})-\widehat{\Phi_{\textup{fair}}(h^{\prime})}|+L\epsilon\leq\widetilde{O}(|\mathcal{D}_{\textup{tr}}|^{-\gamma}).

Appendix B Constructing the confidence intervals on Φfair(h)\Phi_{\textup{fair}}(h)

In this section, we outline methodologies of obtaining confidence intervals for a fairness violation Φfair\Phi_{\textup{fair}}. Specifically, given a model hh\in\mathcal{H}, with h:𝒳𝒴h:\mathcal{X}\rightarrow\mathcal{Y} and α(0,1)\alpha\in(0,1), we outline how to find CfairαC_{\textup{fair}}^{\alpha} which satisfies,

(Φfair(h)Cfairα)1α.\displaystyle\mathbb{P}(\Phi_{\textup{fair}}(h)\in C_{\textup{fair}}^{\alpha})\geq 1-\alpha. (8)

Similar to [1] we express the fairness violation Φfair\Phi_{\textup{fair}} as:

Φfair(h)=|Φfair±(h)|where,Φfair±(h)j=1m𝔼[gj(X,A,Y,h(X))j]=:Φj\Phi_{\textup{fair}}(h)=|\Phi_{\textup{fair}}^{\pm}(h)|\quad\textup{where,}\quad\Phi_{\textup{fair}}^{\pm}(h)\coloneqq\sum_{j=1}^{m}\underbrace{\mathbb{E}[g_{j}(X,A,Y,h(X))\mid\mathcal{E}_{j}]}_{=:\Phi_{j}}

where m1m\geq 1, gjg_{j} are some known functions and j\mathcal{E}_{j} are events with positive probability defined with respect to (X,A,Y)(X,A,Y). For example, when considering the demographic parity (DP), i.e. Φfair=ΦDP\Phi_{\textup{fair}}=\Phi_{\textup{DP}}, we have m=2m=2, with g1(X,A,Y,h(X))=h(X)g_{1}(X,A,Y,h(X))=h(X), 1={A=1}\mathcal{E}_{1}=\{A=1\}, g2(X,A,Y,h(X))=h(X)g_{2}(X,A,Y,h(X))=-h(X) and 2={A=0}\mathcal{E}_{2}=\{A=0\}. Moreover, as shown in [1], the commonly used fairness metrics like Equalized Odds (EO) and Equalized Opportunity (EOP) can also be expressed in similar forms.
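As a concrete illustration, the following sketch (with hypothetical variable names) computes the empirical demographic parity violation in exactly this decomposed form from held-out predictions and sensitive attributes; it is intended only to make the notation above explicit.

```python
import numpy as np

def dp_violation_decomposed(preds, attrs):
    """Empirical DP violation written as |Phi_1 + Phi_2|, with
    Phi_1 = E-hat[h(X) | A=1] and Phi_2 = E-hat[-h(X) | A=0].

    preds: array of binary predictions h(X); attrs: array of binary sensitive attributes A.
    """
    preds, attrs = np.asarray(preds, dtype=float), np.asarray(attrs)
    phi_1 = preds[attrs == 1].mean()    # g_1 = h(X), conditioned on E_1 = {A = 1}
    phi_2 = -preds[attrs == 0].mean()   # g_2 = -h(X), conditioned on E_2 = {A = 0}
    return abs(phi_1 + phi_2)           # Phi_DP = |Phi_DP^{+-}|
```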

Our methodology of constructing CIs on Φfair(h)\Phi_{\textup{fair}}(h) involves first constructing intervals Cfairα,±C_{\textup{fair}}^{\alpha,\pm} on Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h) satisfying:

(Φfair±(h)Cfairα,±)1α.\displaystyle\mathbb{P}(\Phi_{\textup{fair}}^{\pm}(h)\in C_{\textup{fair}}^{\alpha,\pm})\geq 1-\alpha. (9)

Once we have a Cfairα,±C_{\textup{fair}}^{\alpha,\pm}, the confidence interval CfairαC_{\textup{fair}}^{\alpha} satisfying Eq. (8) can simply be constructed as:

Cfairα={|x|:xCfairα,±}.\displaystyle C_{\textup{fair}}^{\alpha}=\{|x|\,:\,x\in C_{\textup{fair}}^{\alpha,\pm}\}.
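The mapping from an interval on Φfair±(h) to one on Φfair(h)=|Φfair±(h)| can be computed in closed form; a minimal sketch, assuming the interval Cfairα,± is given by its two endpoints, is shown below.

```python
def abs_value_interval(lower, upper):
    """Map an interval [lower, upper] for Phi^{+-} to the interval for |Phi^{+-}|.

    If the signed interval straddles zero, the absolute value can be as small as
    zero; otherwise the endpoint closer to zero gives the lower limit.
    """
    if lower <= 0.0 <= upper:
        return 0.0, max(abs(lower), abs(upper))
    return min(abs(lower), abs(upper)), max(abs(lower), abs(upper))
```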

In what follows, we outline two different ways of constructing the confidence intervals Cfairα,±C_{\textup{fair}}^{\alpha,\pm} on Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h) satisfying Eq. (9).

B.1 Separately constructing CIs on Φj\Phi_{j}

One way to obtain intervals on Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h) would be to separately construct confidence intervals on Φj\Phi_{j}, denoted by CjαC^{\alpha}_{j}, which satisfies the joint guarantee

(j=1m{ΦjCjα})1α.\displaystyle\mathbb{P}\left(\cap_{j=1}^{m}\{\Phi_{j}\in C^{\alpha}_{j}\}\right)\geq 1-\alpha. (10)

Given such a set of confidence intervals {Cjα}j=1m\{C^{\alpha}_{j}\}_{j=1}^{m} which satisfy Eq. (10), we can obtain the confidence intervals on Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h) by using the fact that

\mathbb{P}\left(\Phi_{\textup{fair}}^{\pm}(h)\in\sum_{j=1}^{m}C^{\alpha}_{j}\right)\geq 1-\alpha.

Here, the notation \sum_{j=1}^{m}C^{\alpha}_{j} denotes the set \{\sum_{j=1}^{m}x_{j}\,:\,x_{j}\in C^{\alpha}_{j}\}. One naïve way to obtain such \{C^{\alpha}_{j}\}_{j=1}^{m} which satisfy Eq. (10) is to use the union bound, i.e., if C^{\alpha}_{j} are chosen such that

(ΦjCjα)1α/m,\mathbb{P}(\Phi_{j}\in C^{\alpha}_{j})\geq 1-\alpha/m,

then, we have that

\displaystyle\mathbb{P}\left(\cap_{j=1}^{m}\{\Phi_{j}\in C^{\alpha}_{j}\}\right)=1-\mathbb{P}(\cup_{j=1}^{m}\{\Phi_{j}\in C^{\alpha}_{j}\}^{c})
\geq 1-\sum_{j=1}^{m}\mathbb{P}(\{\Phi_{j}\in C^{\alpha}_{j}\}^{c})
\geq 1-\sum_{j=1}^{m}(1-(1-\alpha/m))=1-\alpha.

Here, for an event \mathcal{E}, we use c\mathcal{E}^{c} to denote the complement of the event. This methodology therefore reduces the problem of finding confidence intervals on Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h) to finding confidence intervals on Φj\Phi_{j} for j{1,,m}j\in\{1,\ldots,m\}. Now note that Φj\Phi_{j} are all expectations and we can use standard methodologies to construct confidence intervals on an expectation. We explicitly outline how to do this in Section B.3.
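For instance, a minimal sketch of this union-bound construction, using Hoeffding intervals at level α/m for each Φj (and hypothetical input names), could look as follows.

```python
import numpy as np

def hoeffding_interval(samples, alpha):
    """Two-sided Hoeffding CI for a mean, valid for samples whose range has width at most one."""
    samples = np.asarray(samples, dtype=float)
    half_width = np.sqrt(np.log(2.0 / alpha) / (2.0 * len(samples)))
    return samples.mean() - half_width, samples.mean() + half_width

def union_bound_interval_on_phi_pm(group_samples, alpha):
    """CI on Phi^{+-} = sum_j Phi_j satisfying Eq. (10) via the union bound.

    group_samples[j]: samples of g_j(X, A, Y, h(X)) restricted to the event E_j.
    Each Phi_j receives a level-(alpha/m) interval; the intervals are then summed.
    """
    m = len(group_samples)
    lower, upper = 0.0, 0.0
    for samples in group_samples:
        l, u = hoeffding_interval(samples, alpha / m)
        lower += l
        upper += u
    return lower, upper
```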

Remark

The methodology outlined above provides confidence intervals with valid finite sample coverage guarantees. However, this may come at the cost of more conservative confidence intervals. One way to obtain less conservative confidence intervals while retaining the coverage guarantees would be to consider alternative ways of obtaining confidence intervals which do not require constructing the CIs separately on Φj\Phi_{j}. We outline one such methodology in the next section.

B.2 Using subsampling to construct the CIs on Φfair±\Phi_{\textup{fair}}^{\pm} directly

Here, we outline how we can avoid having to use union bounds when constructing the confidence intervals on Φfair±\Phi_{\textup{fair}}^{\pm}. Let 𝒟j\mathcal{D}_{j} denote the subset of data 𝒟cal\mathcal{D}_{\textup{cal}}, for which the event j\mathcal{E}_{j} is true. In the case where the events j\mathcal{E}_{j} are all mutually exclusive and hence 𝒟j\mathcal{D}_{j} are all disjoint subsets of data (which is true for DP, EO and EOP), we can also construct these intervals by randomly sampling without replacement datapoints (xi(j),ai(j),yi(j))(x^{(j)}_{i},a^{(j)}_{i},y^{(j)}_{i}) from 𝒟j\mathcal{D}_{j} for ilminkm|𝒟k|i\leq l\coloneqq\min_{k\leq m}|\mathcal{D}_{k}|. We use the fact that

Φfair±^(h)=1li=1lj=1mgj(xi(j),ai(j),yi(j),h(xi(j)))\widehat{\Phi_{\textup{fair}}^{\pm}}(h)=\frac{1}{l}\sum_{i=1}^{l}\sum_{j=1}^{m}g_{j}(x^{(j)}_{i},a^{(j)}_{i},y^{(j)}_{i},h(x^{(j)}_{i}))

is an unbiased estimator of Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h). Moreover, since 𝒟j\mathcal{D}_{j} are all disjoint datasets, the datapoints (xi(j),ai(j),yi(j))(x^{(j)}_{i},a^{(j)}_{i},y^{(j)}_{i}) are all independent across different values of jj, and therefore, j=1mgj(xi(j),ai(j),yi(j),h(xi(j)))\sum_{j=1}^{m}g_{j}(x^{(j)}_{i},a^{(j)}_{i},y^{(j)}_{i},h(x^{(j)}_{i})) are i.i.d.. In other words,

Φfair±^(h)=1li=1lϕiwhere, ϕij=1mgj(xi(j),ai(j),yi(j),h(xi(j)))\displaystyle\widehat{\Phi_{\textup{fair}}^{\pm}}(h)=\frac{1}{l}\sum_{i=1}^{l}\phi_{i}\quad\textup{where, }\quad\phi_{i}\coloneqq\sum_{j=1}^{m}g_{j}(x^{(j)}_{i},a^{(j)}_{i},y^{(j)}_{i},h(x^{(j)}_{i}))

and ϕi\phi_{i} are all i.i.d. samples and unbiased estimators of Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h). Therefore, like in the previous section, our problem reduces to constructing CIs on an expectation term (i.e. Φfair±(h)\Phi_{\textup{fair}}^{\pm}(h)), using i.i.d. unbiased samples (i.e. ϕi\phi_{i}) and we can use standard methodologies to construct these intervals.
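A minimal sketch of this subsampling construction for demographic parity (assuming the two groups A=1 and A=0 are disjoint, and with hypothetical input names) is given below; the returned ϕi can then be fed into any standard mean-CI procedure from Section B.3.

```python
import numpy as np

def subsampled_phi_samples(preds_a1, preds_a0, seed=0):
    """Form the samples phi_i of Phi_DP^{+-} via subsampling without replacement.

    preds_a1 / preds_a0: predictions h(X) on the disjoint subsets D_1 (A=1) and D_2 (A=0).
    Returns l = min(|D_1|, |D_2|) samples phi_i = h(x_i^(1)) - h(x_i^(2)), whose
    mean is an unbiased estimate of Phi_DP^{+-}.
    """
    rng = np.random.default_rng(seed)
    preds_a1, preds_a0 = np.asarray(preds_a1, dtype=float), np.asarray(preds_a0, dtype=float)
    l = min(len(preds_a1), len(preds_a0))
    idx1 = rng.choice(len(preds_a1), size=l, replace=False)
    idx0 = rng.choice(len(preds_a0), size=l, replace=False)
    return preds_a1[idx1] - preds_a0[idx0]
```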

Benefit of this methodology

This methodology no longer requires us to separately construct confidence intervals over Φj\Phi_{j} and combine them using union bounds (for example). Therefore, intervals obtained using this methodology may be less conservative than those obtained by separately constructing confidence intervals over Φj\Phi_{j}.

Limitation of this methodology

For each subset of data 𝒟j\mathcal{D}_{j}, we can use at most lminkm|𝒟k|l\coloneqq\min_{k\leq m}|\mathcal{D}_{k}| data points to construct the confidence intervals. Therefore, in cases where ll is very small, we may end up discarding a large proportion of the calibration data, which could in turn lead to loose intervals.

B.3 Constructing CIs on expectations

Here, we outline some standard techniques used to construct CIs on the expectation of a random variable. These techniques can then be used to construct CIs on Φfair(h)\Phi_{\textup{fair}}(h) (using either of the two methodologies outlined above) as well as on acc(h)\textup{acc}(h). In this section, we restrict ourselves to constructing upper CIs. Lower CIs can be constructed analogously.

Given a dataset {Zi:1in}\{Z_{i}:1\leq i\leq n\} of i.i.d. samples, our goal in this section is to construct an upper CI UαU^{\alpha} on 𝔼[Z]\mathbb{E}[Z] which satisfies

(𝔼[Z]Uα)1α.\displaystyle\mathbb{P}(\mathbb{E}[Z]\leq U^{\alpha})\geq 1-\alpha.
Hoeffding’s inequality

We can use Hoeffding’s inequality to construct these intervals UαU^{\alpha} as formalised in the following result:

Lemma B.1 (Hoeffding’s inequality).

Let Zi[0,1]Z_{i}\in[0,1], 1in1\leq i\leq n be i.i.d. samples with mean 𝔼[Z]\mathbb{E}[Z]. Then,

(𝔼[Z]1ni=1nZi+12nlog1α)1α.\displaystyle\mathbb{P}\left(\mathbb{E}[Z]\leq\frac{1}{n}\,\sum_{i=1}^{n}Z_{i}+\sqrt{\frac{1}{2\,n}\log{\frac{1}{\alpha}}}\right)\geq 1-\alpha.
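A one-line implementation of this upper bound (a sketch assuming [0,1]-valued samples) is:

```python
import numpy as np

def hoeffding_upper_bound(z, alpha):
    """Upper confidence bound on E[Z] from i.i.d. samples z in [0, 1] (Lemma B.1)."""
    z = np.asarray(z, dtype=float)
    return z.mean() + np.sqrt(np.log(1.0 / alpha) / (2.0 * len(z)))
```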
Bernstein’s inequality

Bernstein’s inequality provides a powerful tool for bounding the tail probabilities of the sum of independent, bounded random variables. Specifically, for a sum i=1nZi\sum_{i=1}^{n}Z_{i} comprised of nn independent random variables ZiZ_{i} with Zi[0,B]Z_{i}\in[0,B], whose variances sum to at most σ2\sigma^{2}, and for any t>0t>0, the inequality states that

(𝔼[i=1nZi]i=1nZi>t)exp(t22σ2+23tB),\mathbb{P}\left(\mathbb{E}\left[\sum_{i=1}^{n}Z_{i}\right]-\sum_{i=1}^{n}Z_{i}>t\right)\leq\exp\left(-\frac{t^{2}}{2\sigma^{2}+\frac{2}{3}tB}\right),

where BB denotes an upper bound on the absolute value of each random variable. Re-arranging the above, we get that

(𝔼[Z]<1n(i=1nZi+t))1exp(t22σ2+23tB).\mathbb{P}\left(\mathbb{E}[Z]<\frac{1}{n}\left(\sum_{i=1}^{n}Z_{i}+t\right)\right)\geq 1-\exp\left(-\frac{t^{2}}{2\sigma^{2}+\frac{2}{3}tB}\right).

This allows us to construct upper CIs on 𝔼[Z]\mathbb{E}[Z].

Central Limit Theorem

The Central Limit Theorem (CLT) [26] serves as a cornerstone in statistics for constructing confidence intervals around sample means, particularly when the sample size is substantial. The theorem posits that, for a sufficiently large sample size, the distribution of the sample mean will closely resemble a normal (Gaussian) distribution, irrespective of the original population’s distribution. This Gaussian nature of the sample mean empowers us to form confidence intervals for the population mean using the normal distribution’s characteristics.

Given Z1,Z2,,ZnZ_{1},Z_{2},\dots,Z_{n} as nn independent and identically distributed (i.i.d.) random variables with mean μ\mu and variance σ2\sigma^{2}, the sample mean Z¯\bar{Z} approximates a normal distribution with mean μ\mu and variance σ2/n\sigma^{2}/n for large nn. An upper (1α)(1-\alpha) confidence interval for μ\mu is thus:

U^{\alpha}=\bar{Z}+z_{1-\alpha}\frac{\sigma}{\sqrt{n}}

where z_{1-\alpha} represents the critical value from the standard normal distribution corresponding to a cumulative probability of 1-\alpha (i.e. its (1-\alpha)-quantile). In practice, the unknown \sigma is replaced by the sample standard deviation.
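A corresponding sketch of the CLT-based upper bound, replacing the unknown σ by the sample standard deviation, is shown below.

```python
import numpy as np
from scipy.stats import norm

def clt_upper_bound(z, alpha):
    """Approximate upper confidence bound on E[Z] using the normal approximation."""
    z = np.asarray(z, dtype=float)
    return z.mean() + norm.ppf(1.0 - alpha) * z.std(ddof=1) / np.sqrt(len(z))
```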

Bootstrap Confidence Intervals

Bootstrapping, introduced by [17], offers a non-parametric approach to estimate the sampling distribution of a statistic. The method involves repeatedly drawing samples (with replacement) from the observed data and recalculating the statistic for each resample. The resulting empirical distribution of the statistic across bootstrap samples forms the basis for confidence interval construction.

Given a dataset Z1,Z2,,ZnZ_{1},Z_{2},\dots,Z_{n}, one can produce BB bootstrap samples by selecting nn observations with replacement from the original data. For each of these samples, the statistic of interest (for instance, the mean) is determined, yielding BB bootstrap estimates. An upper (1α)(1-\alpha) bootstrap confidence interval for 𝔼[Z]\mathbb{E}[Z] is given by:

U^{\alpha}=z^{*}_{1-\alpha}

with z^{*}_{1-\alpha} denoting the (1-\alpha)-quantile of the bootstrap estimates (the percentile method). It is worth noting that there exist multiple methods to compute bootstrap confidence intervals, including the basic, percentile, and bias-corrected approaches, and the method described above serves as a general illustration.
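The percentile-bootstrap version of the upper bound can be sketched as follows (the number of resamples B is a tuning parameter, set to 2000 here purely for illustration).

```python
import numpy as np

def bootstrap_upper_bound(z, alpha, n_boot=2000, seed=0):
    """Percentile-bootstrap upper confidence bound on E[Z].

    Resamples the data with replacement, recomputes the sample mean each time,
    and returns the (1 - alpha)-quantile of the bootstrap means.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    boot_means = np.array([rng.choice(z, size=len(z), replace=True).mean()
                           for _ in range(n_boot)])
    return np.quantile(boot_means, 1.0 - alpha)
```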

Appendix C Sensitivity analysis for Δ(h)\Delta(h)

Recall from Proposition 3.3 that the lower confidence intervals for τfair\tau^{\ast}_{\textup{fair}} include a Δ(h)\Delta(h) term which is defined as

Δ(h)Φfair(h)τfair(acc(h))0.\Delta(h)\coloneqq\Phi_{\textup{fair}}(h)-\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))\geq 0.

In other words, Δ(h)\Delta(h) quantifies how ‘far’ the fairness loss of classifier hh (i.e. Φfair(h)\Phi_{\textup{fair}}(h)) is from the minimum attainable fairness loss for classifiers with accuracy acc(h)\textup{acc}(h), (i.e. τfair(acc(h))\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))). This quantity is unknown in general and therefore, a practical strategy of obtaining lower confidence intervals on τfair(ψ)\tau^{\ast}_{\textup{fair}}(\psi) may involve positing values for Δ(h)\Delta(h) which encode our belief on how close the fairness loss Φfair(h)\Phi_{\textup{fair}}(h) is to τfair(acc(h))\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)). For example, when we assume that the classifier hh achieves the optimal accuracy-fairness tradeoff, i.e. Φfair(h)=τfair(acc(h))\Phi_{\textup{fair}}(h)=\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)) then Δ(h)=0\Delta(h)=0.

However, the assumption Φfair(h)=τfair(acc(h))\Phi_{\textup{fair}}(h)=\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)) may not hold in general because we only have a finite training dataset and consequently the empirical loss minimisation may not yield the optimum of the true expected loss. Moreover, the regularised loss used in training hh is a surrogate loss which approximates the solution to the constrained minimisation problem in Eq. (1). This means that optimising this regularised loss is not guaranteed to yield the optimal classifier which achieves the optimal fairness τfair(acc(h))\tau^{\ast}_{\textup{fair}}(\textup{acc}(h)). Therefore, to incorporate any belief about the sub-optimality of the classifier hh, we may consider conducting sensitivity analyses to plausibly quantify Δ(h)\Delta(h).

Let hθ:𝒳×Λh_{\theta}:\mathcal{X}\times\Lambda\rightarrow\mathbb{R} be the YOTO model. Our strategy for sensitivity analysis involves training multiple standard models {h(1),h(2),,h(k)}\mathcal{M}\coloneqq\{h^{(1)},h^{(2)},\ldots,h^{(k)}\}\subseteq\mathcal{H} by optimising the regularised losses for a few different choices of λ\lambda:

λ(θ)=𝔼[lCE(hθ(X),Y)]+λfair(hθ).\displaystyle\mathcal{L}_{\lambda}(\theta)=\mathbb{E}[l_{\textup{CE}}(h_{\theta}(X),Y)]+\lambda\,\mathcal{L}_{\textup{fair}}(h_{\theta}).

Importantly, we do not require covering the full range of λ\lambda values when training separate models \mathcal{M}, and our methodology remains valid even when \mathcal{M} is a single model. Next, let hλ{hθ(,λ)}h^{\ast}_{\lambda}\in\mathcal{M}\cup\{h_{\theta}(\cdot,\lambda)\} be such that

hλ=argminh{hθ(,λ)}Φfair^(h)subject to acc^(h)acc^(hθ(,λ)).\displaystyle h^{\ast}_{\lambda}=\operatorname*{arg\,min}_{h^{\prime}\in\mathcal{M}\cup\{h_{\theta}(\cdot,\lambda)\}}\widehat{\Phi_{\textup{fair}}}(h^{\prime})\quad\textup{subject to }\quad\widehat{\textup{acc}}(h^{\prime})\geq\widehat{\textup{acc}}(h_{\theta}(\cdot,\lambda)). (11)

Here, \widehat{\Phi_{\textup{fair}}} and \widehat{\textup{acc}} denote the finite-sample estimates of the fairness loss and model accuracy respectively. We treat the model h^{\ast}_{\lambda} as a proxy for the model attaining the optimal trade-off subject to the constraint \textup{acc}(h)\geq\textup{acc}(h_{\theta}(\cdot,\lambda)). Specifically, we use the maximum empirical error \max_{\lambda}\widehat{\Delta}(h_{\theta}(\cdot,\lambda)) as a plausible surrogate value for \Delta(h_{\theta}(\cdot,\lambda^{\prime})), where \widehat{\Delta}(h_{\theta}(\cdot,\lambda))\coloneqq\widehat{\Phi_{\textup{fair}}}(h_{\theta}(\cdot,\lambda))-\widehat{\Phi_{\textup{fair}}}(h^{\ast}_{\lambda})\geq 0, i.e., we posit for any \lambda^{\prime}\in\Lambda

Δ(hθ(,λ))maxλΛΔ^(hθ(,λ))where,Δ^(hθ(,λ))Φfair^(hθ(,λ))Φfair^(hλ).\Delta(h_{\theta}(\cdot,\lambda^{\prime}))\leftarrow\max_{\lambda\in\Lambda}\widehat{\Delta}(h_{\theta}(\cdot,\lambda))\qquad\textup{where,}\qquad\widehat{\Delta}(h_{\theta}(\cdot,\lambda))\coloneqq\widehat{\Phi_{\textup{fair}}}(h_{\theta}(\cdot,\lambda))-\widehat{\Phi_{\textup{fair}}}(h^{\ast}_{\lambda}).
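A minimal sketch of this sensitivity analysis (with hypothetical input names, taking as given the empirical accuracies and fairness violations of the YOTO model on a grid of λ values and of the separately trained models in \mathcal{M}) is shown below; it simply implements Eq. (11) and the maximisation over λ.

```python
def posited_delta(yoto_accs, yoto_fairs, extra_accs, extra_fairs):
    """Posited value Delta(h_theta(., lambda')) = max_lambda Delta-hat(h_theta(., lambda)).

    yoto_accs[i], yoto_fairs[i]: empirical accuracy and fairness violation of the
    YOTO model h_theta(., lambda_i) on the calibration data, over a grid of lambdas.
    extra_accs[j], extra_fairs[j]: the same quantities for the models in M.
    """
    deltas = []
    for acc, fair in zip(yoto_accs, yoto_fairs):
        # Candidates in M union {h_theta(., lambda)} whose empirical accuracy is at
        # least that of h_theta(., lambda), as in Eq. (11).
        candidate_fairs = [fair] + [f for a, f in zip(extra_accs, extra_fairs) if a >= acc]
        deltas.append(fair - min(candidate_fairs))  # Delta-hat(h_theta(., lambda)) >= 0
    return max(deltas)
```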

Next, we can use this posited value of Δ(hθ(,λ))\Delta(h_{\theta}(\cdot,\lambda^{\prime})) to construct the lower confidence interval using the following corollary of Proposition 3.3:

Corollary C.1.

Consider the YOTO model hθ:𝒳×Λh_{\theta}:\mathcal{X}\times\Lambda\rightarrow\mathbb{R}. Given λ0Λ\lambda_{0}\in\Lambda, let Uacch,Lfairh[0,1]U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}\in[0,1] be such that

(acc(hθ(,λ0))Uacch)1α/2and(Φfair(hθ(,λ0))Lfairh)1α/2.\mathbb{P}(\textup{acc}(h_{\theta}(\cdot,\lambda_{0}))\leq U^{h}_{\textup{acc}})\geq 1-\alpha/2\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h_{\theta}(\cdot,\lambda_{0}))\geq L^{h}_{\textup{fair}})\geq 1-\alpha/2.

Then, we have that (τfair(Uacch)LfairhΔ(hθ(,λ0)))1α.\mathbb{P}(\tau^{\ast}_{\textup{fair}}(U^{h}_{\textup{acc}})\geq L^{h}_{\textup{fair}}-\Delta(h_{\theta}(\cdot,\lambda_{0})))\geq 1-\alpha.

This result shows that if the goal is to construct lower confidence intervals on τfair(ψ)\tau^{\ast}_{\textup{fair}}(\psi) and we obtain that ψUacch\psi\geq U^{h}_{\textup{acc}}, then using the monotonicity of τfair\tau^{\ast}_{\textup{fair}} we have that τfair(ψ)τfair(Uacch)\tau^{\ast}_{\textup{fair}}(\psi)\geq\tau^{\ast}_{\textup{fair}}(U^{h}_{\textup{acc}}). Therefore the interval [L^{h}_{\textup{fair}}-\Delta(h_{\theta}(\cdot,\lambda_{0})),1] serves as a lower confidence interval for τfair(ψ)\tau^{\ast}_{\textup{fair}}(\psi).

When YOTO satisfies Pareto optimality, \Delta(h_{\theta}(\cdot,\lambda))\rightarrow 0 as |\mathcal{D}_{\textup{cal}}|\rightarrow\infty: Here, we show that when YOTO achieves the optimal trade-off, our sensitivity analysis leads to \Delta(h_{\theta}(\cdot,\lambda))=0 for all \lambda\in\Lambda as the calibration data size increases. Our arguments in this section are not formal; however, this idea can be formalised without any significant difficulty.

First, the concept of Pareto optimality (defined below) formalises the idea that YOTO achieves the optimal trade-off:

Assumption C.2 (Pareto optimality).
If for some λΛ and h we have that, acc(h)acc(hθ(,λ)) then, Φfair(h)Φfair(hθ(,λ)),\displaystyle\textup{If for some }\lambda\in\Lambda\textup{ and }h^{\prime}\in\mathcal{H}\textup{ we have that, }\,\textup{acc}(h^{\prime})\geq\textup{acc}(h_{\theta}(\cdot,\lambda))\,\textup{ then, }\,\Phi_{\textup{fair}}(h^{\prime})\geq\Phi_{\textup{fair}}(h_{\theta}(\cdot,\lambda)),

When YOTO satisfies this optimality property, it is straightforward to see that \Delta(h_{\theta}(\cdot,\lambda))=0 for all \lambda\in\Lambda. In this case, as |\mathcal{D}_{\textup{cal}}|\rightarrow\infty, we get that Eq. (11) roughly becomes

hλ=argminh{hθ(,λ)}Φfair(h)subject to acc(h)acc(hθ(,λ)).\displaystyle h^{\ast}_{\lambda}=\operatorname*{arg\,min}_{h^{\prime}\in\mathcal{M}\cup\{h_{\theta}(\cdot,\lambda)\}}\Phi_{\textup{fair}}(h^{\prime})\quad\textup{subject to }\quad\textup{acc}(h^{\prime})\geq\textup{acc}(h_{\theta}(\cdot,\lambda)).

Here, Assumption C.2 implies that hλ=hθ(,λ)h^{\ast}_{\lambda}=h_{\theta}(\cdot,\lambda), and therefore

Δ^(hθ(,λ))Φfair^(hθ(,λ))Φfair^(hλ)=0.\widehat{\Delta}(h_{\theta}(\cdot,\lambda))\coloneqq\widehat{\Phi_{\textup{fair}}}(h_{\theta}(\cdot,\lambda))-\widehat{\Phi_{\textup{fair}}}(h^{\ast}_{\lambda})=0.
Intuition behind our sensitivity analysis procedure

Intuitively, the high-level idea behind our sensitivity analysis is to check how much better models trained separately for fixed values of \lambda (i.e. the models in \mathcal{M}) perform in terms of the accuracy-fairness trade-off compared to our YOTO model. If the separately trained models achieve a better trade-off than the YOTO model for specific values of \lambda, then the sensitivity analysis adjusts the empirical trade-off obtained using the YOTO model (via the \widehat{\Delta}(h_{\theta}(\cdot,\lambda)) term defined above). If, on the other hand, the YOTO model achieves a better trade-off than the separately trained models in \mathcal{M}, then the sensitivity analysis has no effect on the lower confidence intervals, since in this case \widehat{\Delta}(h_{\theta}(\cdot,\lambda))=0.

C.1 Experimental results

Figure 4: CIs with and without sensitivity analysis for Adult dataset for EO violation. Panels: no sensitivity analysis; |\mathcal{M}|=2; |\mathcal{M}|=5.
Figure 5: CIs with and without sensitivity analysis for Adult dataset for DP violation. Panels: no sensitivity analysis; |\mathcal{M}|=2; |\mathcal{M}|=5.
Figure 6: CIs with and without sensitivity analysis for COMPAS dataset for EO violation. Here, sensitivity analysis has no effect on the constructed CIs as the YOTO model achieves a better empirical trade-off than the separately trained models. Panels: no sensitivity analysis; |\mathcal{M}|=2; |\mathcal{M}|=5.

Here, we include empirical results showing how the constructed CIs change as a result of our sensitivity analysis procedure. In Figures 4 and 5, we include examples of CIs where the empirical trade-off obtained using YOTO is sub-optimal. In these cases, the lower CIs obtained without sensitivity analysis (i.e. when we assume \Delta(h_{\lambda})=0) do not cover the empirical trade-offs for the separately trained models. However, the figures show that the sensitivity analysis procedure adjusts the lower CIs in both cases so that they encapsulate the empirical trade-offs that were not captured without sensitivity analysis.

Recall that \mathcal{M} represents the set of additional separately trained models used for the sensitivity analysis. It can be seen from Figures 4 and 5 that in both cases our sensitivity analysis performs well with as few as two models (i.e. |\mathcal{M}|=2), which shows that our sensitivity analysis does not come at a high computational cost.

Tables 2 and 3 contain results corresponding to these figures and show the proportion of trade-offs which lie in the three trade-off regions shown in Figure 1 with and without sensitivity analysis. It can be seen that in both tables, when ||=2|\mathcal{M}|=2, the proportion of trade-offs which lie below the lower CIs (blue region in Figure 1) is negligible.

Additionally, in Figure 6 we also consider an example where YOTO achieves a better empirical trade-off than most other baselines considered, and therefore there is no need for sensitivity analysis. In this case, Figure 6 (and Table 4) show that sensitivity analysis has no effect on the CIs constructed since in this case sensitivity analysis gives us Δ^(hθ(,λ))=0\widehat{\Delta}(h_{\theta}(\cdot,\lambda))=0 for λΛ\lambda\in\Lambda. This shows that in cases where sensitivity analysis is not needed (for example, if YOTO achieves optimal empirical trade-off), our sensitivity analysis procedure does not make the CIs more conservative.

Table 2: Results for the Adult dataset and EO fairness violation with and without sensitivity analysis: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions (using Bootstrap CIs).
Baseline   Sub-optimal   Unlikely (|\mathcal{M}|=0)   Permissible (|\mathcal{M}|=0)   Unlikely (|\mathcal{M}|=2)   Permissible (|\mathcal{M}|=2)   Unlikely (|\mathcal{M}|=5)   Permissible (|\mathcal{M}|=5)
KDE-fair 0.00 0.15 0.85 0.00 1.00 0.00 1.00
RTO 0.60 0.00 0.40 0.00 0.40 0.00 0.40
adversary 0.64 0.00 0.36 0.00 0.36 0.00 0.36
linear 0.60 0.00 0.40 0.00 0.40 0.00 0.40
logsig 0.35 0.00 0.65 0.00 0.65 0.00 0.65
reductions 0.93 0.00 0.07 0.00 0.07 0.00 0.07
separate 0.00 0.54 0.46 0.00 1.00 0.00 1.00
Table 3: Results for the Adult dataset and DP fairness violation with and without sensitivity analysis: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions (using Bootstrap CIs).
Baseline   Sub-optimal   Unlikely (|\mathcal{M}|=0)   Permissible (|\mathcal{M}|=0)   Unlikely (|\mathcal{M}|=2)   Permissible (|\mathcal{M}|=2)   Unlikely (|\mathcal{M}|=5)   Permissible (|\mathcal{M}|=5)
KDE-fair 0.03 0.10 0.87 0.00 0.97 0.00 0.97
RTO 0.67 0.00 0.33 0.00 0.33 0.00 0.33
adversary 0.91 0.00 0.09 0.00 0.09 0.00 0.09
linear 0.40 0.33 0.27 0.00 0.60 0.00 0.60
logsig 0.73 0.05 0.23 0.00 0.27 0.00 0.27
reductions 0.87 0.00 0.13 0.00 0.13 0.00 0.13
separate 0.03 0.25 0.71 0.00 0.97 0.00 0.97
Table 4: Results for the COMPAS dataset and EO fairness violation with and without sensitivity analysis: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions (using Bootstrap CIs).
Baseline   Sub-optimal   Unlikely (|\mathcal{M}|=0)   Permissible (|\mathcal{M}|=0)   Unlikely (|\mathcal{M}|=2)   Permissible (|\mathcal{M}|=2)   Unlikely (|\mathcal{M}|=5)   Permissible (|\mathcal{M}|=5)
KDE-fair 0.00 0.00 1.00 0.00 1.00 0.00 1.00
RTO 0.15 0.00 0.85 0.00 0.85 0.00 0.85
adversary 0.00 0.00 1.00 0.00 1.00 0.00 1.00
linear 0.21 0.00 0.79 0.00 0.79 0.00 0.79
logsig 0.55 0.00 0.45 0.00 0.45 0.00 0.45
reductions 0.30 0.00 0.70 0.00 0.70 0.00 0.70
separate 0.03 0.80 0.17 0.80 0.17 0.80 0.17

Appendix D Scarce sensitive attributes

Our methodology of obtaining confidence intervals on Φfair\Phi_{\textup{fair}} assumes access to the sensitive attributes AA for all data points in the held-out dataset 𝒟\mathcal{D}. However, in practice, we may only have access to AA for a small proportion of the data in 𝒟\mathcal{D}. In this case, a naïve strategy would involve constructing confidence intervals using only the data for which AA is available. However, since such data is scarce, the confidence intervals constructed are very loose.

Suppose that we additionally have access to a predictive model f𝒜f_{\mathcal{A}} which predicts the sensitive attributes AA using the features XX. In this case, another simple strategy would be to simply impute the missing values of AA, with the values A^\hat{A} predicted using f𝒜f_{\mathcal{A}}. However, this will usually lead to a biased estimate of the fairness violation Φfair(h)\Phi_{\textup{fair}}(h), and hence is not very reliable unless the model f𝒜f_{\mathcal{A}} is highly accurate. In this section, we show how to get the best of both worlds, i.e. how to utilise the data with missing sensitive attributes to obtain tighter and more accurate confidence intervals on τfair(ψ)\tau^{\ast}_{\textup{fair}}(\psi).

Formally, we consider 𝒟cal=𝒟𝒟~\mathcal{D}_{\textup{cal}}=\mathcal{D}\cup\tilde{\mathcal{D}} where 𝒟\mathcal{D} denotes a data subset of size nn that contains sensitive attributes (i.e. we observe AA) and 𝒟~\tilde{\mathcal{D}} denotes the data subset of size NN for which we do not observe the sensitive attributes AA, and NnN\gg n. Additionally, for both datasets, we have predictions of the sensitive attributes made by a machine-learning algorithm f𝒜:𝒳𝒜f_{\mathcal{A}}:\mathcal{X}\rightarrow\mathcal{A}, where f𝒜(X)Af_{\mathcal{A}}(X)\approx A. Concretely we have that 𝒟={(Xi,Ai,Yi,f𝒜(Xi))}i=1n\mathcal{D}=\{(X_{i},A_{i},Y_{i},f_{\mathcal{A}}(X_{i}))\}_{i=1}^{n} and 𝒟~={(X~i,Y~i,f𝒜(X~i))}i=1N\tilde{\mathcal{D}}=\{(\tilde{X}_{i},\tilde{Y}_{i},f_{\mathcal{A}}(\tilde{X}_{i}))\}_{i=1}^{N}

High-level methodology

Our methodology is inspired by prediction-powered inference [4] which builds confidence intervals on the expected outcome 𝔼[Y]\mathbb{E}[Y] using data for which the true outcome YY is only available for a small proportion of the dataset. In our setting, however, it is the sensitive attribute AA that is missing for the majority of the data (and not the outcome YY).

For hh\in\mathcal{H}, let Φfair(h)\Phi_{\textup{fair}}(h) be a fairness violation (such as DP or EO), and let Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) be the corresponding fairness violation computed on the data distribution where AA is replaced by the surrogate sensitive attribute f𝒜(X)f_{\mathcal{A}}(X). For example, in the case of DP violation, Φfair(h)\Phi_{\textup{fair}}(h) and Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) denote:

Φfair(h)\displaystyle\Phi_{\textup{fair}}(h) =|(h(X)=1A=1)(h(X)=1A=0)|,\displaystyle=|\mathbb{P}(h(X)=1\mid A=1)-\mathbb{P}(h(X)=1\mid A=0)|,
Φfair~(h)\displaystyle\widetilde{\Phi_{\textup{fair}}}(h) =|(h(X)=1f𝒜(X)=1)(h(X)=1f𝒜(X)=0)|.\displaystyle=|\mathbb{P}(h(X)=1\mid f_{\mathcal{A}}(X)=1)-\mathbb{P}(h(X)=1\mid f_{\mathcal{A}}(X)=0)|.

We next construct the confidence intervals on Φfair(h)\Phi_{\textup{fair}}(h) using the following steps:

  1.

    Using 𝒟\mathcal{D}, we construct intervals Cϵ(α;h)C_{\epsilon}(\alpha;h) on ϵ(h)Φfair(h)Φfair~(h)\epsilon(h)\coloneqq\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h) satisfying

    (ϵ(h)Cϵ(α;h))1α.\displaystyle\mathbb{P}(\epsilon(h)\in C_{\epsilon}(\alpha;h))\geq 1-\alpha. (12)

    Even though the size of 𝒟\mathcal{D} is small, we choose a methodology which yields tight intervals for ϵ(h)\epsilon(h) when f𝒜(Xi)=Aif_{\mathcal{A}}(X_{i})=A_{i} with a high probability.

  2.

    Next, using the dataset 𝒟~\tilde{\mathcal{D}}, we construct intervals C~f(α;h)\tilde{C}_{\textup{f}}(\alpha;h) on Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) satisfying

    (Φfair~(h)C~f(α;h))1α.\displaystyle\mathbb{P}(\widetilde{\Phi_{\textup{fair}}}(h)\in\tilde{C}_{\textup{f}}(\alpha;h))\geq 1-\alpha. (13)

    This interval will also be tight as the size of 𝒟~\tilde{\mathcal{D}}, NnN\gg n.

Finally, using the union bound idea we combine the two confidence intervals to obtain the confidence interval for Φfair(h)Φfair~(h)+Φfair~(h)=Φfair(h)\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h)+\widetilde{\Phi_{\textup{fair}}}(h)=\Phi_{\textup{fair}}(h). We make this precise in the following result:

Lemma D.1.

Let Cϵ(α;h),C~f(α;h)C_{\epsilon}(\alpha;h),\tilde{C}_{\textup{f}}(\alpha;h) be as defined in equations 12 and 13. Then, if we define Cfairα(h)={x+y|xCϵ(α;h),yC~f(α;h)}C_{\textup{fair}}^{\alpha}(h)=\{x+y\,|\,x\in C_{\epsilon}(\alpha;h),y\in\tilde{C}_{\textup{f}}(\alpha;h)\}, we have that

(Φfair(h)Cfairα(h))12α.\mathbb{P}(\Phi_{\textup{fair}}(h)\in C_{\textup{fair}}^{\alpha}(h))\geq 1-2\alpha.

When constructing the CIs over \widetilde{\Phi_{\textup{fair}}}(h) using the imputed sensitive attributes f_{\mathcal{A}}(X) in step 2 above, the prediction error of f_{\mathcal{A}} introduces an error (denoted by \epsilon(h)) between \widetilde{\Phi_{\textup{fair}}}(h) and the target quantity \Phi_{\textup{fair}}(h). Step 1 rectifies this by constructing a CI over the incurred error \epsilon(h), and therefore combining the two allows us to obtain intervals which utilise all of the available data while ensuring that the constructed CIs are well-calibrated.

Example: Demographic parity

Having defined our high-level methodology above, we concretely demonstrate how this can be applied to the case where the fairness loss under consideration is DP. As described above, the first step involves constructing intervals on ϵ(h)Φfair(h)Φfair~(h)\epsilon(h)\coloneqq\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h) using a methodology which yields tight intervals when f𝒜(Xi)=Aif_{\mathcal{A}}(X_{i})=A_{i} with a high probability. To this end, we use bootstrapping as described in Algorithm 1.

Even though bootstrapping does not provide us with finite sample coverage guarantees, it is asymptotically exact and satisfies the property that the confidence intervals are tight when A^=A\hat{A}=A with a high probability. On the other hand, concentration inequalities (such as Hoeffding’s inequality) seek to construct confidence intervals individually on Φfair(h)\Phi_{\textup{fair}}(h) and Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) and subsequently combine them through a union bound argument, for example. In doing so, these methods do not account for how close the values of Φfair(h)\Phi_{\textup{fair}}(h) and Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) might be in the data.

To make this concrete, consider the example where f𝒜(X)=a.s.Af_{\mathcal{A}}(X)\overset{\text{a.s.}}{=}A and hence Φfair(h)=Φfair~(h)\Phi_{\textup{fair}}(h)=\widetilde{\Phi_{\textup{fair}}}(h). When using concentration inequalities to construct the 1α1-\alpha confidence intervals on Φfair(h)\Phi_{\textup{fair}}(h) and Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h), we obtain identical intervals for the two quantities, say [l,u][l,u]. Then, using union bounds we obtain that Φfair(h)Φfair~(h)[lu,ul]\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h)\in[l-u,u-l] with probability at least 12α1-2\alpha. In this case even though Φfair(h)Φfair~(h)=0\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h)=0, the width of the interval [lu,ul][l-u,u-l] does not depend on the closeness of Φfair(h)\Phi_{\textup{fair}}(h) and Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) and therefore is not tight. Bootstrapping helps us circumvent this problem, since in this case for each resample of the data 𝒟\mathcal{D}, the finite sample estimates Φfair(h)^\widehat{\Phi_{\textup{fair}}(h)} and Φfair~(h)^\widehat{\widetilde{\Phi_{\textup{fair}}}(h)} will be equal. We outline the bootstrapping algorithm below.

Algorithm 1 Bootstrapping for estimating ϵ(h)Φfair(h)Φfair~(h)\epsilon(h)\coloneqq\Phi_{\textup{fair}}(h)-\widetilde{\Phi_{\textup{fair}}}(h)
0:  Dataset 𝒟\mathcal{D}, number of bootstrap samples BB, significance level α\alpha
0:  1α1-\alpha confidence interval for ϵ(h)\epsilon(h)
  Initialize empty array vb\textbf{v}_{b}
  for i=1i=1 to BB do
     Draw a bootstrap sample 𝒟\mathcal{D}^{*} of size |𝒟||\mathcal{D}| with replacement from 𝒟\mathcal{D}
     Compute Φfair(h)^\widehat{\Phi_{\textup{fair}}(h)} and Φfair~(h)^\widehat{\widetilde{\Phi_{\textup{fair}}}(h)} on 𝒟\mathcal{D}^{*}
     Compute the difference ϵ(h)^Φfair(h)^Φfair~(h)^\widehat{\epsilon(h)}\coloneqq\widehat{\Phi_{\textup{fair}}(h)}-\widehat{\widetilde{\Phi_{\textup{fair}}}(h)}
     Append ϵ(h)^\widehat{\epsilon(h)} to vb\textbf{v}_{b}
  end for
  Compute the α/2\alpha/2 and 1α/21-\alpha/2 quantiles of vb\textbf{v}_{b}, denoted as ll and uu
  Return: Confidence interval Cϵ(α;h)=[l,u]C_{\epsilon}(\alpha;h)=[l,u]

Using Algorithm 1 we construct a confidence interval Cϵ(α;h)C_{\epsilon}(\alpha;h) on ϵ(h)\epsilon(h) at level 1α1-\alpha, which approximately satisfies Eq. (12). Next, using standard techniques we can obtain an interval C~f(α;h)\tilde{C}_{\textup{f}}(\alpha;h) on Φfair~(h)\widetilde{\Phi_{\textup{fair}}}(h) using 𝒟~\tilde{\mathcal{D}} which satisfies Eq. (13). Like before, the interval C~f(α;h)\tilde{C}_{\textup{f}}(\alpha;h) is likely to be tight as we use 𝒟~\tilde{\mathcal{D}} to construct it, which is significantly larger than 𝒟\mathcal{D}. Finally, combining the two as shown in Lemma D.1, we obtain the confidence interval on Φfair(h)\Phi_{\textup{fair}}(h).
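A minimal end-to-end sketch for the DP case is given below (hypothetical input names; purely for illustration, both steps use bootstrapping and we assume every resample contains both sensitive groups). Step 1 follows Algorithm 1 on the labelled subset, step 2 bootstraps the plug-in violation on the large subset, and the two intervals are combined as in Lemma D.1.

```python
import numpy as np

def dp_violation(preds, attrs):
    """Empirical DP violation |P-hat(h(X)=1 | attrs=1) - P-hat(h(X)=1 | attrs=0)|."""
    return abs(preds[attrs == 1].mean() - preds[attrs == 0].mean())

def ci_with_scarce_attributes(preds_small, a_small, a_hat_small,
                              preds_large, a_hat_large, alpha,
                              n_boot=2000, seed=0):
    """CI on Phi_fair(h) combining the small labelled set D and the large set D~.

    preds_small, a_small, a_hat_small: h(X), true A and predicted f_A(X) on D.
    preds_large, a_hat_large: h(X) and f_A(X) on D~ (true A unobserved).
    Returns an interval whose coverage is at least 1 - 2*alpha (Lemma D.1).
    """
    rng = np.random.default_rng(seed)
    preds_small, a_small, a_hat_small = map(np.asarray, (preds_small, a_small, a_hat_small))
    preds_large, a_hat_large = map(np.asarray, (preds_large, a_hat_large))

    # Step 1: bootstrap CI on eps(h) = Phi_fair(h) - Phi~_fair(h) over D (Algorithm 1).
    n = len(preds_small)
    eps = [dp_violation(preds_small[idx], a_small[idx])
           - dp_violation(preds_small[idx], a_hat_small[idx])
           for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    eps_lo, eps_hi = np.quantile(eps, [alpha / 2, 1 - alpha / 2])

    # Step 2: bootstrap CI on Phi~_fair(h) over the large subset D~.
    N = len(preds_large)
    tilde = [dp_violation(preds_large[idx], a_hat_large[idx])
             for idx in (rng.integers(0, N, size=N) for _ in range(n_boot))]
    t_lo, t_hi = np.quantile(tilde, [alpha / 2, 1 - alpha / 2])

    # Lemma D.1: Minkowski sum of the two intervals.
    return eps_lo + t_lo, eps_hi + t_hi
```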

D.1 Experimental results

Figure 7: CIs obtained by imputing missing sensitive attributes using f_{\mathcal{A}} for the Adult dataset. Here n=50 and N=2500. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.
Figure 8: CIs obtained using our methodology for the Adult dataset. Here n=50 and N=2500. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.
Figure 9: CIs obtained by imputing missing sensitive attributes using f_{\mathcal{A}} for the COMPAS dataset. Here n=50 and N=2000. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.
Figure 10: CIs obtained using our methodology for the COMPAS dataset. Here n=50 and N=2000. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.
Figure 11: CIs obtained by imputing missing sensitive attributes using f_{\mathcal{A}} for the CelebA dataset. Here n=50 and N=2500. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.
Figure 12: CIs obtained using our methodology for the CelebA dataset. Here n=50 and N=2500. Panels: \textup{acc}(f_{\mathcal{A}})=50\%, 70\%, 90\%.

Here, we present experimental results in the setting where the sensitive attributes are missing for the majority of the calibration data. Figures 7–12 show the results for different datasets and predictive models f_{\mathcal{A}} with varying accuracies. Here, the empirical fairness violation values for both YOTO and separately trained models are evaluated using the true sensitive attributes over the entire calibration data.

CIs with imputed sensitive attributes are mis-calibrated

Figures 7, 9 and 11 show results for the Adult, COMPAS and CelebA datasets, where the CIs are computed by imputing the missing sensitive attributes with the predicted sensitive attributes f_{\mathcal{A}}(X)\approx A. The figures show that when the accuracy of f_{\mathcal{A}} is below 90%, the CIs are highly miscalibrated, as they do not entirely contain the empirical trade-offs for the YOTO and separately trained models.

Our methodology corrects for the mis-calibration

In contrast, Figures 8, 10 and 12, which include the corresponding results using our methodology, show that our methodology is able to correct for the mis-calibration in CIs arising from the prediction error in f_{\mathcal{A}}. Even though the CIs obtained using our methodology are more conservative than those obtained by imputing the missing sensitive attributes with f_{\mathcal{A}}(X), they are better calibrated and contain the empirical trade-offs for both the YOTO and separately trained models.

Imputing missing sensitive attributes may work when f𝒜f_{\mathcal{A}} has high accuracy

Finally, Figures 7(c), 9(c) and 11(c) show that the CIs with imputed sensitive attributes are relatively better calibrated as the accuracy of f_{\mathcal{A}} increases to 90%. In this case, the CIs with imputed sensitive attributes mostly contain the empirical trade-offs. This shows that in cases where the predictive model f_{\mathcal{A}} has high accuracy, it may be sufficient to impute missing sensitive attributes with f_{\mathcal{A}}(X) when constructing the CIs.

Appendix E Accounting for the uncertainty in baseline trade-offs

In this section, we extend our methodology to also account for uncertainty in the baseline trade-offs when assessing the optimality of different baselines. Recall that our confidence intervals constructed on τfair\tau^{\ast}_{\textup{fair}} satisfy the guarantee

(τfair(Ψ)Γfairα)1α.\displaystyle\mathbb{P}(\tau^{\ast}_{\textup{fair}}(\Psi)\in\Gamma^{\alpha}_{\textup{fair}})\geq 1-\alpha.

This means that if the accuracy-fairness tradeoff for a given model hh^{\prime}\in\mathcal{H}, (acc(h),Φfair(h))(\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime})), lies above the confidence intervals Γfairα\Gamma^{\alpha}_{\textup{fair}} (i.e. in the pink region in Figure 13), then we can confidently infer that the model hh^{\prime} achieves a suboptimal trade-off. This is because we know from the probabilistic guarantee above that the optimal trade-off (acc(h),τfair(acc(h)))(\textup{acc}(h^{\prime}),\tau^{\ast}_{\textup{fair}}(\textup{acc}(h^{\prime}))) must lie in the intervals Γfairα\Gamma^{\alpha}_{\textup{fair}} with probability at least 1α1-\alpha.

Here, acc(h),Φfair(h)\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}) denote the accuracy and fairness violations for model hh^{\prime} on the full data distribution. However, in practice, we only have access to finite data and therefore can only compute the empirical values of accuracy and fairness violations which we denote by acc^(h),Φfair^(h)\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime}). This means that when checking if the accuracy-fairness trade-off (acc(h),Φfair(h))(\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime})) lies inside the confidence intervals Γfairα\Gamma^{\alpha}_{\textup{fair}}, we must account for the uncertainty in the empirical estimates acc^(h),Φfair^(h)\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime}). This can be achieved by constructing confidence regions Cα(h)C^{\alpha}(h^{\prime}) satisfying

((acc(h),Φfair(h))Cα(h))1α.\displaystyle\mathbb{P}((\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}))\in C^{\alpha}(h^{\prime}))\geq 1-\alpha. (14)
Baseline’s best-case accuracy-fairness trade-off

If the confidence region C^{\alpha}(h^{\prime}) lies entirely above the confidence intervals \Gamma^{\alpha}_{\textup{fair}} (i.e. in the pink region in Figure 13(a)), then using the union bound we can confidently conclude that, with probability at least 1-2\alpha, the model h^{\prime} achieves a suboptimal fairness violation, i.e.

Φfair(h)τfair(acc(h)).\displaystyle\Phi_{\textup{fair}}(h^{\prime})\geq\tau^{\ast}_{\textup{fair}}(\textup{acc}(h^{\prime})).

This allows practitioners to confidently check if a model hh^{\prime} is suboptimal in terms of its accuracy-fairness trade-off. This is different from simply checking if the empirical trade-off achieved by the model (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})) lies in the permissible trade-off region (green region in Figure 1) as it also accounts for the finite-sampling uncertainty in the tradeoff achieved by model hh^{\prime}. However, this means that the criterion for flagging a baseline model hh^{\prime} as suboptimal becomes more conservative.

Next, we show how to construct confidence regions Cα(h)C^{\alpha}(h^{\prime}) satisfying Eq. (14).

Lemma E.1.

For a classifier hh^{\prime}\in\mathcal{H}, let Uacch,LfairhU^{h}_{\textup{acc}},L^{h}_{\textup{fair}} be the upper and lower CIs on acc(h),Φfair(h)\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}) respectively,

(acc(h)Uacch)1α/2,and(Φfair(h)Lfairh)1α/2.\displaystyle\mathbb{P}(\textup{acc}(h^{\prime})\leq U^{h}_{\textup{acc}})\geq 1-\alpha/2,\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h^{\prime})\geq L^{h}_{\textup{fair}})\geq 1-\alpha/2.

Then, ((acc(h),Φfair(h))[0,Uacch]×[Lfairh,1])1α\mathbb{P}((\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}))\in[0,U^{h}_{\textup{acc}}]\times[L^{h}_{\textup{fair}},1])\geq 1-\alpha.

Lemma E.1 shows that Cα(h)=[0,Uacch]×[Lfairh,1]C^{\alpha}(h^{\prime})=[0,U^{h}_{\textup{acc}}]\times[L^{h}_{\textup{fair}},1] forms the 1α1-\alpha confidence region for the accuracy-fairness trade-off (acc(h),Φfair(h))(\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime})). We illustrate this in Figure 13(a). If this confidence region lies entirely above the permissible region (i.e. in the pink region in Figure 13), we can confidently conclude that the model hh^{\prime} achieves suboptimal accuracy-fairness trade-off. From Figure 13(a) it can be seen that this will occur if (Uacch,Lfairh)(U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}) lies above the permissible region.

Figure 13: Confidence region C^{\alpha}(h) for the accuracy-fairness trade-off achieved by h. (a) (U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}) represents the best-case accuracy-fairness trade-off achieved by h; if the confidence region lies entirely in the suboptimal region, we can confidently conclude that h achieves a suboptimal trade-off. (b) (L^{h}_{\textup{acc}},U^{h}_{\textup{fair}}) represents the worst-case accuracy-fairness trade-off achieved by h; if the confidence region lies entirely in the ‘unlikely’ region, we can conclude that h achieves a better trade-off than YOTO.

Intuitively, (Uacch,Lfairh)(U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}) can be seen as an optimistic best-case accuracy-fairness trade-off achieved by the model hh^{\prime}, since at this point in the confidence region Cα(h)C^{\alpha}(h^{\prime}) the accuracy is maximised and fairness violation is minimised. Therefore if this best-case trade-off lies above the permissible region, then this intuitively indicates that the worst-case optimal trade-off τfair\tau^{\ast}_{\textup{fair}} is still better than the best-case trade-off achieved by the model hh^{\prime}, leading us to the confident conclusion that hh^{\prime} achieves suboptimal accuracy-fairness trade-off.
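As a concrete illustration, the best-case corner can be computed from held-out evaluation data using simple one-sided Hoeffding bounds. The sketch below is illustrative rather than our exact implementation; in particular, `permissible_upper` (a function returning the upper confidence boundary of the permissible region at a given accuracy) is an assumed input produced by our main procedure.

```python
import numpy as np

def hoeffding_radius(n, alpha):
    """One-sided Hoeffding deviation for the mean of n i.i.d. values in [0, 1]."""
    return np.sqrt(np.log(1.0 / alpha) / (2.0 * n))

def best_case_tradeoff(correct, violation_terms, alpha=0.05):
    """Optimistic corner (U_acc, L_fair) of the 1 - alpha confidence region (Lemma E.1).

    `correct` holds 0/1 indicators of correct predictions; `violation_terms` holds
    per-example terms whose mean is the empirical fairness violation.
    Each one-sided bound is taken at level alpha / 2.
    """
    u_acc = min(1.0, np.mean(correct) + hoeffding_radius(len(correct), alpha / 2))
    l_fair = max(0.0, np.mean(violation_terms) - hoeffding_radius(len(violation_terms), alpha / 2))
    return u_acc, l_fair

def confidently_suboptimal(u_acc, l_fair, permissible_upper):
    """Flag h' as suboptimal if its best-case corner lies above the permissible region."""
    return l_fair > permissible_upper(u_acc)
```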

Baseline’s worst-case accuracy-fairness trade-off

Conversely, if the confidence region Cα(h)C^{\alpha}(h^{\prime}) lies entirely below the confidence intervals Γfairα\Gamma^{\alpha}_{\textup{fair}} (i.e. in the blue region in Figure 13), then a union bound allows us to conclude that, with probability at least 12α1-2\alpha, the model hh^{\prime} achieves a better trade-off than the YOTO model hθ(,λ)h_{\theta}(\cdot,\lambda). Formally, this means that

\tau^{\textup{YOTO}}(\textup{acc}(h^{\prime}))\geq\Phi_{\textup{fair}}(h^{\prime}),

where

\tau^{\textup{YOTO}}(\delta)\coloneqq\min_{\lambda\in\Lambda}\Phi_{\textup{fair}}(h_{\theta}(\cdot,\lambda))\quad\textup{subject to}\quad\textup{acc}(h_{\theta}(\cdot,\lambda))\geq\delta.

This would indicate that the YOTO model does not achieve the optimal trade-off; this finding can then be used to further calibrate the Δ(hλ)\Delta(h_{\lambda}) values when constructing the lower confidence interval using Proposition 3.3. Again, this is different from simply checking if the empirical trade-off achieved by the model (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})) lies in the unlikely-to-be-achieved trade-off region (blue region in Figure 13), as it also accounts for the finite-sampling uncertainty in the trade-off achieved by model hh^{\prime}.

Next, we show how to construct such a confidence region Cα(h)C^{\alpha}(h^{\prime}) using an approach analogous to the one outlined above:

Lemma E.2.

For a classifier hh^{\prime}\in\mathcal{H}, let Lacch,UfairhL^{h}_{\textup{acc}},U^{h}_{\textup{fair}} be the lower and upper CIs on acc(h),Φfair(h)\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}) respectively,

\mathbb{P}(\textup{acc}(h^{\prime})\geq L^{h}_{\textup{acc}})\geq 1-\alpha/2,\quad\textup{and}\quad\mathbb{P}(\Phi_{\textup{fair}}(h^{\prime})\leq U^{h}_{\textup{fair}})\geq 1-\alpha/2.

Then, \mathbb{P}((\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime}))\in[L^{h}_{\textup{acc}},1]\times[0,U^{h}_{\textup{fair}}])\geq 1-\alpha.

Lemma E.2 shows that Cα(h)=[Lacch,1]×[0,Ufairh]C^{\alpha}(h^{\prime})=[L^{h}_{\textup{acc}},1]\times[0,U^{h}_{\textup{fair}}] forms the 1α1-\alpha confidence region for the accuracy-fairness trade-off (acc(h),Φfair(h))(\textup{acc}(h^{\prime}),\Phi_{\textup{fair}}(h^{\prime})). If this confidence region lies entirely below the permissible region (i.e. in the blue region in Figure 13), we can confidently conclude that the model hh^{\prime} achieves a better accuracy-fairness trade-off than the YOTO model. From Figure 13(b) it can be seen that this will occur if (Lacch,Ufairh)(L^{h}_{\textup{acc}},U^{h}_{\textup{fair}}) lies below the permissible region.

Intuitively, (Lacch,Ufairh)(L^{h}_{\textup{acc}},U^{h}_{\textup{fair}}) can be seen as a conservative worst-case accuracy-fairness trade-off achieved by the model hh^{\prime}. Therefore if this worst-case trade-off lies below the permissible region, then this intuitively indicates that the best-case YOTO trade-off τYOTO\tau^{\textup{YOTO}} is still worse than the worst-case trade-off achieved by the model hh^{\prime}, leading us to the confident conclusion that hh^{\prime} achieves a better accuracy-fairness trade-off than the YOTO model. This can subsequently be used to calibrate the suboptimality gap for the YOTO model, denoted by Δ(hλ)\Delta(h_{\lambda}) in Proposition 3.3.
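The worst-case corner is obtained symmetrically; the short sketch below reuses `hoeffding_radius` from the previous snippet, and `yoto_lower` (the lower confidence boundary of the permissible region, i.e. the lower CI on the YOTO trade-off) is again an assumed input.

```python
def worst_case_tradeoff(correct, violation_terms, alpha=0.05):
    """Pessimistic corner (L_acc, U_fair) of the 1 - alpha confidence region (Lemma E.2)."""
    l_acc = max(0.0, np.mean(correct) - hoeffding_radius(len(correct), alpha / 2))
    u_fair = min(1.0, np.mean(violation_terms) + hoeffding_radius(len(violation_terms), alpha / 2))
    return l_acc, u_fair

def beats_yoto(l_acc, u_fair, yoto_lower):
    """True if the worst-case corner lies in the 'unlikely' region (below the lower CI);
    in that case the suboptimality gap Delta(h_lambda) should be recalibrated (Prop. 3.3)."""
    return u_fair < yoto_lower(l_acc)
```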

E.1 Experimental results

Here, we present the results with the empirical baseline trade-offs replaced by best or worst-case trade-offs as appropriate. More specifically, in Figure 14,

  • if the empirical trade-off for a baseline (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})) lies in the suboptimal region (as in Figure 13(a)), then we plot the best-case trade-off (Uacch,Lfairh)(U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}) for the baseline,

  • if the empirical trade-off for a baseline (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})) lies in the unlikely-to-be-achievable trade-off region (as in Figure 13(b)), then we plot the worst-case trade-off (Lacch,Ufairh)(L^{h}_{\textup{acc}},U^{h}_{\textup{fair}}) for the baseline,

  • if the empirical trade-off for a baseline (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})) lies in the permissible region, then we simply plot the empirical trade-off (acc^(h),Φfair^(h))(\widehat{\textup{acc}}(h^{\prime}),\widehat{\Phi_{\textup{fair}}}(h^{\prime})).

Therefore, in Figure 14, if a baseline’s best-case trade-off lies above the permissible trade-off region, then we can confidently conclude that the baseline achieves a suboptimal accuracy-fairness trade-off with probability 12α=90%1-2\alpha=90\%. Similarly, a baseline’s worst-case trade-off lying below the permissible trade-off region would suggest that the YOTO model achieves a suboptimal trade-off and that the value of Δ(hθ)\Delta(h_{\theta}) needs to be adjusted accordingly.

Table 5 shows the proportion of best-case trade-offs which lie above the permissible trade-off region and the proportion of worst-case trade-offs which lie below the permissible region. Firstly, the table shows that the proportion of worst-case trade-offs which lie in the ‘unlikely’ region is negligible, empirically confirming that our confidence intervals on the optimal trade-off τfair\tau^{\ast}_{\textup{fair}} are indeed valid. Secondly, a considerable proportion of baselines have best-case trade-offs lying above the permissible region, highlighting that our methodology remains effective in flagging suboptimalities in SOTA baselines even when we account for the possible uncertainty in baseline trade-offs. This shows that our methodology yields CIs which are not only reliable but also informative.

Figure 14: Results with the empirical baseline trade-offs replaced by best-case and worst-case trade-offs for baselines lying above and below the permissible region respectively (columns: Adult, COMPAS, CelebA and Jigsaw datasets). Here, 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% data split and α=0.05\alpha=0.05.
Table 5: The table shows the proportion of models whose best-case trade-off (Uacch,Lfairh)(U^{h}_{\textup{acc}},L^{h}_{\textup{fair}}) lies in the suboptimal region, models whose worst-case trade-off (Lacch,Ufairh)(L^{h}_{\textup{acc}},U^{h}_{\textup{fair}}) lies in the unlikely region, and models whose confidence region Cα(h)C^{\alpha}(h) overlaps with the permissible trade-off region. The table reports averages across all experiments.
Baseline Unlikely Permissible Sub-optimal
KDE-fair 0.0 ± 0.0 1.0 ± 0.0 0.0 ± 0.0
RTO 0.0 ± 0.0 0.77 ± 0.05 0.23 ± 0.04
adversary 0.0 ± 0.0 0.79 ± 0.1 0.2 ± 0.08
linear 0.08 ± 0.05 0.81 ± 0.06 0.11 ± 0.05
logsig 0.05 ± 0.02 0.71 ± 0.09 0.25 ± 0.1
reductions 0.0 ± 0.0 0.82 ± 0.09 0.18 ± 0.08
separate 0.03 ± 0.02 0.95 ± 0.08 0.02 ± 0.03

Appendix F Experimental details and additional results

In this section, we provide greater detail regarding our experimental setup and the models used. We begin by defining the Equalized Odds metric, which is used in our experiments alongside DP and EOP.

Equalized Odds (EO):

The EO condition states that both the true positive rates and the false positive rates are equal across all sensitive groups, i.e. (h(X)=1A=a,Y=y)=(h(X)=1Y=y)\mathbb{P}(h(X)=1\mid A=a,Y=y)=\mathbb{P}(h(X)=1\mid Y=y) for any a𝒜a\in\mathcal{A} and y{0,1}y\in\{0,1\}. The absolute EO violation is defined as:

\Phi_{\textup{EO}}(h)\coloneqq 1/2\,|\mathbb{P}(h(X)=1\mid A=1,Y=1)-\mathbb{P}(h(X)=1\mid A=0,Y=1)|
\qquad+1/2\,|\mathbb{P}(h(X)=1\mid A=1,Y=0)-\mathbb{P}(h(X)=1\mid A=0,Y=0)|.
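For reference, the empirical EO violation on a finite sample can be computed as in the simple sketch below (array names are illustrative, and every (a, y) group is assumed to be non-empty):

```python
import numpy as np

def eo_violation(y_pred, y_true, a):
    """Empirical absolute Equalized Odds violation for binary predictions y_pred."""
    y_pred, y_true, a = map(np.asarray, (y_pred, y_true, a))
    total = 0.0
    for y in (0, 1):  # average the group gaps over both label values
        rate_a1 = y_pred[(a == 1) & (y_true == y)].mean()
        rate_a0 = y_pred[(a == 0) & (y_true == y)].mean()
        total += 0.5 * abs(rate_a1 - rate_a0)
    return total
```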

Next, we provide additional details regarding the YOTO model.

F.1 Practical details regarding YOTO model

As described in Section 3, we consider optimising regularized losses of the form

\mathcal{L}_{\lambda}(\theta)=\mathbb{E}[l_{\textup{CE}}(h_{\theta}(X),Y)]+\lambda\,\mathcal{L}_{\textup{fair}}(h_{\theta}).

When training YOTO models, instead of fixing λ\lambda, we sample the parameter λ\lambda from a distribution PλP_{\lambda}. As a result, during training the model observes many different values of λ\lambda and learns to optimise the loss λ\mathcal{L}_{\lambda} for all of them simultaneously. At inference time, the model can be conditioned on a chosen parameter value λ\lambda^{\prime} and recovers the model trained to optimise λ\mathcal{L}_{\lambda^{\prime}}. The loss being minimised can thus be expressed as follows:

\operatorname*{arg\,min}_{h_{\theta}:\mathcal{X}\times\Lambda\rightarrow\mathbb{R}}\mathbb{E}_{\lambda\sim P_{\lambda}}\left[\mathbb{E}[l_{\textup{CE}}(h_{\theta}(X,\lambda),Y)]+\lambda\,\mathcal{L}_{\textup{fair}}(h_{\theta}(\cdot,\lambda))\right].

The fairness losses fair\mathcal{L}_{\textup{fair}} considered for the YOTO model are:

\textup{DP:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta}(\cdot,\lambda))=|\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=1]-\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=0]|
\textup{EOP:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta}(\cdot,\lambda))=|\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=1,Y=1]-\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=0,Y=1]|
\textup{EO:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta}(\cdot,\lambda))=|\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=1,Y=1]-\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=0,Y=1]|
\qquad+|\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=1,Y=0]-\mathbb{E}[\sigma(h_{\theta}(X,\lambda))\mid A=0,Y=0]|.

Here, σ(x)1/(1+ex)\sigma(x)\coloneqq 1/(1+e^{-x}) denotes the sigmoid function.
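A minimal PyTorch-style sketch of these relaxations is given below (assuming `logits` are the raw batch outputs of hθ(X,λ)h_{\theta}(X,\lambda) and that every group appears in the batch):

```python
import torch

def dp_relaxation(logits, a):
    p = torch.sigmoid(logits)
    return (p[a == 1].mean() - p[a == 0].mean()).abs()

def eop_relaxation(logits, a, y):
    p = torch.sigmoid(logits)
    return (p[(a == 1) & (y == 1)].mean() - p[(a == 0) & (y == 1)].mean()).abs()

def eo_relaxation(logits, a, y):
    p = torch.sigmoid(logits)
    gap_pos = (p[(a == 1) & (y == 1)].mean() - p[(a == 0) & (y == 1)].mean()).abs()
    gap_neg = (p[(a == 1) & (y == 0)].mean() - p[(a == 0) & (y == 0)].mean()).abs()
    return gap_pos + gap_neg
```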

In our experiments, we sample a new λ\lambda for every batch. Moreover, following [15], we use a log-uniform distribution as the sampling distribution PλP_{\lambda}, with the underlying uniform distribution being U[106,10]U[10^{-6},10]. To condition the network on the λ\lambda parameter, we again follow [15] and use Feature-wise Linear Modulation (FiLM) [34]. For completeness, we describe the architecture next.
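For concreteness, one YOTO training step could look like the sketch below. This is a simplified illustration, not our exact training code: `model` is any FiLM-conditioned network taking (x, λ), `fair_loss` is one of the relaxations above (wrapped to take (logits, a, y)), and the log-uniform sampler assumes λ is log-uniform over [10^{-6}, 10].

```python
import math
import torch
import torch.nn.functional as F

def sample_lambda(low=1e-6, high=10.0):
    """Log-uniform sample: exponentiate a uniform draw of log(lambda)."""
    return torch.exp(torch.empty(1).uniform_(math.log(low), math.log(high)))

def yoto_training_step(model, optimizer, x, y, a, fair_loss):
    lam = sample_lambda()                           # a fresh lambda for every batch
    logits = model(x, lam).squeeze(-1)              # FiLM-conditioned forward pass
    loss = F.binary_cross_entropy_with_logits(logits, y.float()) + lam * fair_loss(logits, a, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```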

Initially, we determine which network layers should be conditioned; this can encompass all layers or just a subset. Each chosen layer is then conditioned on the trade-off parameter λ\lambda. Given a layer that yields a feature map ff with dimensions W×H×CW\times H\times C, where WW and HH denote the spatial dimensions and CC stands for the channels, we feed the parameter λ\lambda into two distinct multi-layer perceptrons (MLPs), denoted as MσM_{\sigma} and MμM_{\mu}. These MLPs produce two vectors, σ\sigma and μ\mu, each having a dimensionality of CC. The feature map is then transformed by multiplying it channel-wise with σ\sigma and subsequently adding μ\mu. The resultant transformed feature map ff^{\prime} is given by:

f^{\prime}_{ijk}=\sigma_{k}f_{ijk}+\mu_{k}\quad\text{where}\quad\sigma=M_{\sigma}(\lambda)\quad\text{and}\quad\mu=M_{\mu}(\lambda).
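A sketch of the corresponding FiLM block in PyTorch is given below (hidden sizes are illustrative; in practice MσM_{\sigma} and MμM_{\mu} follow the per-dataset architectures described next):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Channel-wise modulation of a feature map by scale/shift computed from lambda."""
    def __init__(self, num_channels, hidden=64):
        super().__init__()
        self.m_sigma = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, num_channels))
        self.m_mu = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, num_channels))

    def forward(self, feat, lam):
        # feat: (batch, C, H, W); lam: (batch, 1)
        sigma = self.m_sigma(lam)[:, :, None, None]   # (batch, C, 1, 1)
        mu = self.m_mu(lam)[:, :, None, None]
        return sigma * feat + mu
```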

Next, we provide the exact architectures used for each dataset in our experiments.

F.1.1 YOTO Architectures

Adult and COMPAS dataset

Here, we use a simple logistic regression as the main model, with only the scalar logit outputs of the logistic regression being conditioned using FiLM. The MLPs Mμ,MσM_{\mu},M_{\sigma} both have two hidden layers, each of size 4, and ReLU activations. We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. Training these simple models takes roughly 5 minutes on a Tesla-V100-SXM2-32GB GPU.
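A sketch of this model is given below (the conditioning MLPs mirror the two hidden layers of size 4 described above; this is an illustrative reimplementation, not our exact code):

```python
import torch
import torch.nn as nn

def small_mlp(out_dim=1, hidden=4):
    return nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class YOTOLogisticRegression(nn.Module):
    """Logistic regression whose scalar logit is FiLM-conditioned on lambda."""
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)
        self.m_sigma, self.m_mu = small_mlp(), small_mlp()

    def forward(self, x, lam):
        # x: (batch, num_features); lam: (1, 1), broadcast over the batch
        logit = self.linear(x)
        return self.m_sigma(lam) * logit + self.m_mu(lam)
```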

CelebA dataset

For the CelebA dataset, our architecture is a convolutional neural network (ConvNet) integrated with the FiLM (Feature-wise Linear Modulation) mechanism. The network starts with two convolutional layers: the first layer has 32 filters with a kernel size of 3×33\times 3, and the second layer has 64 filters, also with a 3×33\times 3 kernel. Both convolutional layers employ a stride of 1 and are followed by a max-pooling layer that reduces each dimension by half.

The feature maps from the convolutional layers are flattened and passed through a series of fully connected (MLP) layers. Specifically, the first layer maps the features to 64 dimensions, and the subsequent layers maintain this size until the final layer, which outputs a scalar value. The activation function used in these layers is ReLU.

To condition the network on the λ\lambda parameter using FiLM, we design two multi-layer perceptrons (MLPs), MμM_{\mu} and MσM_{\sigma}. Both MLPs take the λ\lambda parameter as input and have 4 hidden layers. Each of these hidden layers is of size 256. These MLPs produce the modulation parameters μ\mu and σ\sigma, which are used to perform feature-wise linear modulation on the outputs of the main MLP layers. The final output of the network is passed through a sigmoid activation function to produce the model’s prediction. We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. Training this model takes roughly 1.5 hours on a Tesla-V100-SXM2-32GB GPU.

Jigsaw dataset

For the Jigsaw dataset, we employ a neural network model built upon the BERT architecture [13] integrated with the Feature-wise Linear Modulation (FiLM) mechanism. We utilize the representation corresponding to the [CLS] token, which carries aggregate information about the entire sequence. To condition the BERT’s output on the λ\lambda parameter using FiLM, we design two linear layers, which map the λ\lambda parameter to modulation parameters γ\gamma and β\beta, both of dimension equal to BERT’s hidden size of 768. These modulation parameters are then used to perform feature-wise linear modulation on the [CLS] representation. The modulated representation is passed through a classification head, which consists of a linear layer mapping from BERT’s hidden size (768) to a scalar output. In terms of training details, our model is trained for a maximum of 10 epochs, with early stopping based on validation losses. Training this model takes roughly 6 hours on a Tesla-V100-SXM2-32GB GPU.

F.2 Datasets

We used four real-world datasets for our experiments.

Adult dataset

The Adult income dataset [7] contains employment records for 48,842 individuals, including demographic attributes such as age, race and gender; the task is to predict whether an individual earns more than $50k per year. In our experiments, we consider gender as the sensitive attribute.

COMPAS dataset

The COMPAS recidivism dataset, collected by ProPublica [5], includes information for 6,172 defendants from Broward County, Florida. It comprises 52 features, including defendants’ criminal history and demographic attributes, and the task is to predict recidivism for defendants. The sensitive attribute in this dataset is the defendants’ race, where A=1A=1 represents ‘African American’ and A=0A=0 corresponds to all other races.

CelebA dataset

The CelebA dataset [28] consists of 202,599 celebrity images annotated with 40 attribute labels. In our task, the objective is to predict whether an individual in the image is smiling. The dataset comprises features in the form of image pixels and additional attributes such as hairstyle, eyeglasses, and more. The sensitive attribute for our experiments is gender.

Jigsaw Toxicity Classification dataset

The Jigsaw Toxicity Classification dataset [20] contains online comments from various platforms, aimed at identifying and mitigating toxic behavior online. The task is to predict whether a given comment is toxic. The dataset includes features such as the text of the comment and certain metadata, such as the gender or race to which each comment relates. In our experiments, the sensitive attribute is the gender to which the comment refers, and we only keep comments which refer to exactly one of the ‘male’ or ‘female’ genders. This leaves us with 107,106 distinct comments.

F.3 Baselines

The baselines considered in our experiments include:

  • Regularization based approaches [8]: These methods seek to minimise fairness loss using regularized losses as shown in Section 3. We consider different regularization terms fair(hθ)\mathcal{L}_{\textup{fair}}(h_{\theta}) as smooth relaxations of the fairness violation Φfair\Phi_{\textup{fair}} as proposed in the literature [8]. To make this concrete, when the fairness violation under consideration is DP, we consider

    \mathcal{L}_{\textup{fair}}(h_{\theta})=\mathbb{E}[g(h_{\theta}(X))\mid A=1]-\mathbb{E}[g(h_{\theta}(X))\mid A=0],

    with g(x)=xg(x)=x denoted as ‘linear’ in our results and g(x)=logσ(x)g(x)=\log{\sigma(x)} where σ(x)1/(1+ex)\sigma(x)\coloneqq 1/(1+e^{-x}), denoted as ‘logsig’ in our results. In addition to these methods, we also consider separately trained models with the same regularization term as the YOTO models, i.e.,

    \textup{DP:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta})=|\mathbb{E}[\sigma(h_{\theta}(X))\mid A=1]-\mathbb{E}[\sigma(h_{\theta}(X))\mid A=0]|
    \textup{EOP:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta})=|\mathbb{E}[\sigma(h_{\theta}(X))\mid A=1,Y=1]-\mathbb{E}[\sigma(h_{\theta}(X))\mid A=0,Y=1]|
    \textup{EO:}\quad\mathcal{L}_{\textup{fair}}(h_{\theta})=|\mathbb{E}[\sigma(h_{\theta}(X))\mid A=1,Y=1]-\mathbb{E}[\sigma(h_{\theta}(X))\mid A=0,Y=1]|
    \qquad+|\mathbb{E}[\sigma(h_{\theta}(X))\mid A=1,Y=0]-\mathbb{E}[\sigma(h_{\theta}(X))\mid A=0,Y=0]|.

    Here, σ(x)1/(1+ex)\sigma(x)\coloneqq 1/(1+e^{-x}) denotes the sigmoid function. We denote these models as ‘separate’ in our experimental results as they are the separately trained counterparts to the YOTO model. For each relaxation, we train models for a range of λ\lambda values uniformly chosen in [0,10][0,10] interval.

  • Reductions Approach [1]: This method transforms the fairness problem into a sequence of cost-sensitive classification problems. Like the regularization approaches, this requires multiple models to be trained. Here, to reproduce the trade-off curves, we train the reductions approach with a range of different fairness constraints chosen uniformly in the [0,1][0,1] interval.

  • KDE-fair [12]: This method employs a kernel density estimation trick to quantify fairness measures, capturing the degree of irrelevancy of prediction outputs to sensitive attributes. These quantified fairness measures are expressed as differentiable functions with respect to classifier model parameters, allowing the use of gradient descent to solve optimization problems that respect fairness constraints. We focus on binary classification and well-known definitions of group fairness: Demographic Parity (DP), Equalized Odds (EO) and Equalized Opportunity (EOP).

  • Randomized Threshold Optimizer (RTO) [2]: This scalable post-processing algorithm debiases trained models, including deep neural networks, and is proven to be near-optimal by bounding its excess Bayes risk. RTO optimizes the trade-off curve by applying adjustments to the trained model’s predictions to meet fairness constraints. In our experiments, we first train a standard classifier which is referred to as the base classifier. Next, we use the RTO algorithm to apply adjustments to the trained model’s prediction to meet fairness constraints. To reproduce the trade-off curves, we apply post-hoc adjustments to the base classifier for a range of fairness violation constraints.

  • Adversarial Approaches [45]: These methods utilize an adversarial training paradigm where an additional model, termed the adversary, is introduced during training. The primary objective of this adversary is to predict the sensitive attribute AA using the predictions hθ(X)h_{\theta}(X) generated by the main classifier hθh_{\theta}. The training process involves an adversarial game between the primary classifier and the adversary, striving to achieve equilibrium. This adversarial dynamic ensures that the primary classifier’s predictions are difficult to use for determining the sensitive attribute AA, thereby minimizing unfair biases associated with AA. Specifically, for DP constraints, the adversary takes the logit outputs of the classifier as the input and predicts AA. In contrast for EO and EOP constraints, the adversary also takes the true label YY as the input. For EOP constraint, the adversary is only trained on data with Y=1Y=1.

The training times for each baseline on the different datasets have been listed in Table 6.

Table 6: Approximate training times per model for different baselines across various datasets.
Adult COMPAS CelebA Jigsaw
adversary 3 min 3 min 90 min 310 min
linear/logsig/separate 4 min 3 min 80 min 310 min
KDE-fair 2 min 2 min 60 min 280 min
reductions 3 min 2 min 70 min 290 min
YOTO 4 min 3 min 90 min 320 min
RTO (base classifier training) 3 min 3 min 70 min 310 min
RTO (Post-hoc optimisation per fairness constraint) 1 min 1 min 10 min 30 min

F.4 Additional results

Figure 15: Demographic Parity results for Adult dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 16: Equalized Opportunity results for Adult dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 17: Equalized Odds results for Adult dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 18: Demographic Parity results for COMPAS dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 500, 1000 and 5000.
Figure 19: Equalized Opportunity results for COMPAS dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 500, 1000 and 5000.
Figure 20: Equalized Odds results for COMPAS dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 500, 1000 and 5000.
Figure 21: Demographic Parity results for CelebA dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 22: Equalized Opportunity results for CelebA dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 23: Equalized Odds results for CelebA dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 24: Demographic Parity results for Jigsaw dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 25: Equalized Opportunity results for Jigsaw dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.
Figure 26: Equalized Odds results for Jigsaw dataset, with panels for |\mathcal{D}_{\textup{cal}}| = 1000, 5000 and 10,000.

In Figures 15-26 we include additional results for all datasets and fairness violations with an increasing amount of calibration data 𝒟cal\mathcal{D}_{\textup{cal}}. It can be seen that as the amount of calibration data increases, the constructed CIs become increasingly tighter. However, the asymptotic, Bernstein and bootstrap CIs are informative even when the calibration dataset is as small as 500 examples for the COMPAS data (Figures 18-20) and 1000 for all other datasets. These results show that the larger the calibration dataset 𝒟cal\mathcal{D}_{\textup{cal}}, the tighter the constructed CIs are likely to be. However, even in cases where the calibration dataset is relatively small, we obtain informative CIs in most cases.

Tables 7 - 18 show the proportion of the baselines which lie in the three trade-off regions, ‘Unlikely’, ‘Permissible’ and ‘Sub-optimal’ for the Bernstein’s CIs, across the different datasets and fairness metrics. Overall, it can be seen that our CIs are reliable since the proportion of trade-offs which lie below our CIs is small. On the other hand, the tables show that our CIs can detect a significant number of sub-optimalities in our baselines, showing that our intervals are informative.

Table 7: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the COMPAS dataset and DP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.67 0.33
adversary 0.15 0.85 0.0
linear 0.0 0.82 0.18
logsig 0.05 0.2 0.75
reductions 0.0 0.83 0.17
separate 0.0 1.0 0.0
Table 8: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the COMPAS dataset and EOP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 0.99 0.01
RTO 0.0 0.96 0.04
adversary 0.0 1.0 0.0
linear 0.0 1.0 0.0
logsig 0.01 0.36 0.62
reductions 0.03 0.77 0.2
separate 0.0 1.0 0.0
Table 9: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the COMPAS dataset and EO (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.89 0.11
adversary 0.0 1.0 0.0
linear 0.0 0.76 0.24
logsig 0.0 0.45 0.55
reductions 0.0 0.7 0.3
separate 0.1 0.87 0.03
Table 10: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Adult dataset and DP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.33 0.67
adversary 0.0 0.0 1.0
linear 0.13 0.47 0.4
logsig 0.0 0.27 0.73
reductions 0.0 0.13 0.87
separate 0.03 0.95 0.02
Table 11: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Adult dataset and EOP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.07 0.89 0.04
RTO 0.0 0.5 0.5
adversary 0.0 0.36 0.64
linear 0.0 0.5 0.5
logsig 0.0 0.09 0.91
reductions 0.0 0.17 0.83
separate 0.02 0.95 0.03
Table 12: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Adult dataset and EO (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.4 0.6
adversary 0.0 0.36 0.64
linear 0.0 0.4 0.6
logsig 0.0 0.65 0.35
reductions 0.0 0.1 0.9
separate 0.0 1.0 0.0
Table 13: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the CelebA dataset and DP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.56 0.44
adversary 0.0 0.18 0.82
linear 0.0 0.88 0.12
logsig 0.05 0.95 0.0
separate 0.0 1.0 0.0
Table 14: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the CelebA dataset and EOP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 0.91 0.09
RTO 0.0 0.45 0.55
adversary 0.0 0.27 0.73
linear 0.15 0.82 0.03
logsig 0.0 0.73 0.27
separate 0.0 1.0 0.0
Table 15: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the CelebA dataset and EO (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 1.0 0.0
RTO 0.0 0.71 0.29
adversary 0.0 0.27 0.73
linear 0.0 0.97 0.03
logsig 0.0 0.73 0.27
separate 0.0 1.0 0.0
Table 16: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Jigsaw dataset and DP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 0.92 0.08
RTO 0.0 0.5 0.5
adversary 0.0 0.09 0.91
linear 0.0 0.76 0.24
logsig 0.0 0.4 0.6
separate 0.0 0.88 0.12
Table 17: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Jigsaw dataset and EOP (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 0.83 0.17
RTO 0.0 0.62 0.38
adversary 0.06 0.94 0.0
linear 0.14 0.59 0.27
logsig 0.0 0.29 0.71
separate 0.0 0.47 0.53
Table 18: Proportion of empirical trade-offs for each baseline which lie in the three trade-off regions, for the Jigsaw dataset and EO (using Bernstein’s CIs). Here 𝒟cal\mathcal{D}_{\textup{cal}} is a 10% dataset split.
Unlikely Permissible Sub-optimal
KDE-fair 0.0 0.75 0.25
RTO 0.0 0.57 0.43
adversary 0.0 1.0 0.0
linear 0.0 0.73 0.27
logsig 0.0 0.4 0.6
separate 0.0 0.87 0.13

F.5 Additional details for Figure 1

For Figure 1, we consider a calibration dataset 𝒟cal\mathcal{D}_{\textup{cal}} of size 2000. For the optimal model, we trained a YOTO model on the COMPAS dataset with the architecture given in Appendix F.1.1. We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. Once trained, we obtained the red and black curves by first splitting 𝒟cal\mathcal{D}_{\textup{cal}} randomly into two subsets and evaluating the trade-offs on each split separately. For the sub-optimal model, on the other hand, we use another YOTO model with the same architecture but stop the training after 20 epochs. This results in the trained model achieving a sub-optimal trade-off, as shown in the figure. We evaluated this trade-off curve on the entire calibration data.

F.6 Experimental results with FACT Frontiers

Figure 27: Results on the Adult dataset with FACT frontiers (pre-sensitivity analysis, i.e. Δ(hλ)=0\Delta(h_{\lambda})=0). Here, |𝒟cal|=2000|\mathcal{D}_{\text{cal}}|=2000 and α=0.05\alpha=0.05.
MA FACT and MS FACT yield conservative Pareto Frontiers.

In Figure 27 we include the results for the Adult dataset along with the model-specific (MS) and model-agnostic (MA) FACT Pareto frontiers, obtained using the methodology in [23]. It can be seen that both Pareto frontiers, MS FACT and MA FACT, are overly conservative, with MA FACT being completely non-informative on the Adult dataset. The model-agnostic FACT trade-off considers the best-case trade-off for any given dataset and does not depend on a specific model class. As a result, there is no guarantee that this Pareto frontier is achievable; indeed, Figure 27 shows that the MA FACT frontier on the Adult dataset is non-informative and not achievable in practice.

The model-specific FACT trade-off curve is comparatively more realistic as it is derived using a pre-trained model. However, it is important to note that the trade-off for this pre-trained model may not lie on the obtained MS FACT frontier. In fact, MS FACT may still not be achievable by any model in the given model class, as it does not provide any guarantee of being achievable either. This can be seen from Figure 27 which shows that the MS-FACT frontier is not achieved by any of the SOTA baselines under consideration. In these experiments, we used a logistic regression classifier as a pre-trained model for obtaining the MS FACT frontier.

F.7 Synthetic data experiments

In real-world settings, the ground truth trade-off curve τfair\tau^{\ast}_{\textup{fair}} remains intractable because we only have access to a finite dataset. In this section, we consider a synthetic data setting, where the ground truth trade-off curve can be obtained, to verify that the YOTO trade-off curves are consistent with the ground truth and that the confidence intervals obtained using our methodology contain τfair\tau^{\ast}_{\textup{fair}}.

Dataset

Here, we consider a setup with 𝒳=\mathcal{X}=\mathbb{R}, 𝒜={0,1}\mathcal{A}=\{0,1\} and 𝒴={0,1}\mathcal{Y}=\{0,1\}. Specifically, ABern(0.5)A\sim\textup{Bern}(0.5) and we define the conditional distributions XA=aX\mid A=a as:

X\mid A=a\sim\mathcal{N}(a,0.2^{2})

Moreover, we define the labels YY as follows:

Y=Z\,\mathbbm{1}(X>0.5)+(1-Z)\,\mathbbm{1}(X\leq 0.5),

where ZBern(0.9)Z\sim\textup{Bern}(0.9) and ZXZ\perp\!\!\!\perp X. Here, ZZ introduces some ‘noise’ to the labels YY, which means that perfect accuracy is not achievable by linear classifiers. If perfect accuracy were achievable, the optimal values for Equalized Odds and Equalized Opportunity would be 0 (and would be achieved by the perfect classifier); our use of ‘noisy’ labels YY therefore ensures that the ground truth trade-off curves are non-trivial.
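The data-generating process above can be sampled with a few lines of NumPy, as in the following straightforward sketch:

```python
import numpy as np

def generate_synthetic(n, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, 0.5, size=n)          # A ~ Bern(0.5)
    x = rng.normal(loc=a, scale=0.2)          # X | A = a ~ N(a, 0.2^2)
    z = rng.binomial(1, 0.9, size=n)          # Z ~ Bern(0.9), independent of X
    y = np.where(z == 1, x > 0.5, x <= 0.5).astype(int)
    return x, a, y
```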

YOTO model training Using the data-generating mechanism above, we generate 5000 training datapoints, which we use to train the YOTO model. The YOTO model for this dataset comprises a simple logistic regression as the main model, with only the scalar logit output of the logistic regression being conditioned using FiLM. The MLPs Mμ,MσM_{\mu},M_{\sigma} both have two hidden layers, each of size 4, and ReLU activations. We train the model for a maximum of 1000 epochs, with early stopping based on validation losses. Training this simple model requires only a CPU and takes roughly 2 minutes.

Ground truth trade-off curve To obtain the ground truth trade-off curve τfair\tau^{\ast}_{\textup{fair}}, we consider the family of classifiers

h_{c}(X)=\mathbbm{1}(X>c)

for cc\in\mathbb{R}. Next, we calculate the trade-offs achieved by this model family over a fine grid of cc values between -3 and 3, using a dataset of size 500,000 generated with the data-generating mechanism described above. The large dataset size ensures that the finite-sample errors in the accuracy and fairness violation values are negligible, allowing us to reliably plot the trade-off curve τfair\tau^{\ast}_{\textup{fair}}.
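For illustration, the curve for Demographic Parity can be traced as in the sketch below (reusing `generate_synthetic` from the earlier snippet; EOP and EO are analogous):

```python
import numpy as np

def ground_truth_curve(n=500_000, grid=np.linspace(-3, 3, 601)):
    """Accuracy / DP-violation pairs for threshold classifiers h_c(X) = 1(X > c)."""
    x, a, y = generate_synthetic(n)
    curve = []
    for c in grid:
        pred = (x > c).astype(int)
        acc = (pred == y).mean()
        dp = abs(pred[a == 1].mean() - pred[a == 0].mean())
        curve.append((acc, dp))
    # tau*_fair(delta) is the smallest violation among thresholds with accuracy >= delta
    return curve
```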

F.7.1 Results

Figure 28 shows the results for the synthetic data setup for three different fairness violations, obtained using a calibration dataset 𝒟cal\mathcal{D}_{\textup{cal}} of size 2000. It can be seen that for each fairness violation considered, the YOTO trade-off curve aligns very well with the ground-truth trade-off curve τfair\tau^{\ast}_{\textup{fair}}. Additionally, we also consider four different confidence intervals obtained using our methodology, and Figure 28 shows that each of the four confidence intervals considered contain the ground-truth trade-off curve. This empirically verifies the validity of our confidence intervals in this synthetic setting.

Additionally, in Figure 29 we plot how the worst-case values of Δ(h)\Delta(h) change (relative to the optimal trade-off τfair\tau^{\ast}_{\textup{fair}}) as the size of the training data 𝒟tr\mathcal{D}_{\textup{tr}} increases. Here, we use hλh_{\lambda} as a short-hand notation for the YOTO model hθ(,λ)h_{\theta}(\cdot,\lambda), and the quantity on the yy-axis, \max_{\lambda\in\Lambda}\Delta(h_{\lambda})/\tau^{\ast}_{\textup{fair}}(\textup{acc}(h_{\lambda})), can be intuitively thought of as the worst-case percentage error between the YOTO trade-off and the optimal trade-off. The figure empirically verifies our result in Theorem 3.4 by showing that as the amount of training data increases, this error term declines across all fairness metrics.

Figure 28: Results for the synthetic dataset with ground truth trade-off curves τfair\tau^{\ast}_{\textup{fair}}.
Figure 29: Plot showing how Δ(h)\Delta(h) decreases (relative to the ground truth trade-off value τfair(acc(h))\tau^{\ast}_{\textup{fair}}(\textup{acc}(h))) as the training data size |𝒟tr||\mathcal{D}_{\textup{tr}}| increases. Here, we plot the worst (i.e. largest) value of Δ(hλ)/τfair(acc(hλ))\Delta(h_{\lambda})/\tau^{\ast}_{\textup{fair}}(\textup{acc}(h_{\lambda})) achieved by our YOTO model over a grid of λ\lambda values in [0,5][0,5].