

DiRe Committee: Diversity and Representation Constraints in Multiwinner Elections

Kunal Relia
New York University, USA
krelia@nyu.edu
This work was supported in part by Julia Stoyanovich’s NSF Grants No. 1916647 and 1934464.
Abstract

The study of fairness in multiwinner elections focuses on settings where candidates have attributes. However, voters may also be divided into predefined populations under one or more attributes (e.g., “California” and “Illinois” populations under the “state” attribute), which may be the same as or different from candidate attributes. Models that focus on candidate attributes alone may systematically under-represent smaller voter populations. Hence, we develop a model, DiRe Committee Winner Determination (DRCWD), which delineates candidate and voter attributes to select a committee by specifying diversity and representation constraints and a voting rule. We analyze its computational complexity, inapproximability, and parameterized complexity. We develop a heuristic-based algorithm, which finds the winning DiRe committee in under two minutes on 63% of the instances of synthetic datasets and on 100% of the instances of real-world datasets. We present an empirical analysis of the running time, feasibility, and utility trade-off.

Overall, DRCWD suggests that a study of multiwinner elections should consider both of its actors, namely candidates and voters, as candidate-specific models can unknowingly harm voter populations, and vice versa. Additionally, even when the attributes of candidates and voters coincide, it is important to treat them separately, as diversity does not imply representation and vice versa. For example, having a female candidate on the committee is different from having a candidate on the committee who is preferred by the female voters, and who may or may not be female.

Keywords: Fairness · Multiwinner Elections · Computational Social Choice

1 Introduction

The problem of selecting a committee from a given set of candidates arises in multiple domains, ranging from political science (e.g., selecting the parliament of a country) to recommendation systems (e.g., selecting the movies to show on Netflix). Formally, given a set $C$ of $m$ candidates (politicians and movies, respectively), a set $V$ of $n$ voters (citizens and Netflix subscribers, respectively) give their ordered preferences over all candidates $c\in C$ to select a committee of size $k$. These preferences can be stated directly, as in parliamentary elections, or they can be derived from other input, such as when Netflix subscribers’ viewing behavior is used to derive their preferences. In this paper, we focus on selecting a $k$-sized (fixed-size) committee using direct, ordered, and complete preferences.

Which committee is selected depends on the committee selection rule, also called the multiwinner voting rule. Examples of commonly used families of rules when a complete ballot of each voter is given are Condorcet principle-based rules [1], which select a committee that is at least as strong as every other committee in a pairwise majority comparison; approval-based voting rules [1, 2, 3], where each voter submits an approval ballot approving a subset of candidates; and ordinal preference ballot-based voting rules such as $k$-Borda and $\beta$-Chamberlin–Courant ($\beta$-CC) [4, 5], which are analogous to single-winner rules. We note that this version of the CC rule is different from the Chamberlin–Courant approval voting rule used in the context of approval elections [6, 7]. We refer readers to Section 2.2 of [5] for further details on commonly used families of multiwinner voting rules. In this paper, we focus on ordinal preference-based rules that are analogous to single-winner rules.

Recent work on fairness in multiwinner elections shows that these rules can create or propagate biases by systematically harming candidates from historically disadvantaged groups [8, 9]. Hence, diversity constraints on candidate attributes were introduced to overcome this problem. However, voters may be divided into predefined populations under one or more attributes, which may be different from candidate attributes. For example, voters in Figure 1(b) are divided into “California” and “Illinois” populations under the “state” attribute. Models that focus on candidate attributes alone may systematically under-represent smaller voter populations.

Figure 1: (a) Candidates with “gender” attribute (and their Borda scores) and (b) voters with “state” attribute. The winning committees (size $k=2$) for California and Illinois, states in the United States, are $\{c_1, c_2\}$ and $\{c_4, c_2\}$, respectively.
Example 1.

Consider an election $E$ consisting of $m = 4$ candidates (Figure 1(a)) and $n = 4$ voters giving ordered preferences over the $m$ candidates (Figure 1(b)) to select a committee of size $k = 2$. Candidates and voters have one attribute each, namely gender and state, respectively. The $k$-Borda¹ winning committee computed for each voter population is $\{c_1, c_2\}$ for California and $\{c_4, c_2\}$ for Illinois.

¹The Borda rule associates score $m-i$ with the $i^{\text{th}}$ position, and $k$-Borda selects the $k$ candidates with the highest Borda scores.

Suppose that we impose a diversity constraint that requires the committee to have at least one candidate of each gender, and a representation constraint that requires the committee to have at least one candidate from the winning committee of each state. Observe that the highest-scoring committee, which is also representative, is $\{c_1, c_2\}$ (score = 17), but this committee is not diverse, since both candidates are male. Further, the highest-scoring diverse committee, $\{c_1, c_3\}$ (score = 13), is not representative because it does not include any winning candidate from Illinois, the smaller state. The highest-scoring diverse and representative committee is $\{c_2, c_3\}$ (score = 12).

This example illustrates the utility cost that enforcing additional constraints can incur: here, the committee score drops from 17 to 12.
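A quick way to double-check Example 1 is to enumerate all six committees of size $k = 2$. The Python sketch below does so under two stated assumptions. First, since $k$-Borda is separable, the committee scores 17, 13, and 12 given above pin down the individual Borda scores of $c_1$, $c_2$, $c_3$ as 9, 8, and 4; $c_4$ then has 3, because four complete rankings of four candidates distribute $4\cdot(3+2+1+0)=24$ Borda points. These per-candidate values are our back-calculation, not numbers read off Figure 1. Second, the text does not state $c_4$'s gender; we assume $c_4$ is male, which makes $\{c_2, c_3\}$ the unique optimum (if $c_4$ were female, $\{c_1, c_4\}$ would tie it at 12).

```python
from itertools import combinations

# Individual Borda scores back-calculated from the committee scores in Example 1:
# c1 + c2 = 17, c1 + c3 = 13, c2 + c3 = 12, and 4 complete rankings of 4 candidates
# distribute 4 * (3 + 2 + 1 + 0) = 24 points in total.
borda = {"c1": 9, "c2": 8, "c3": 4, "c4": 3}
assert sum(borda.values()) == 4 * (3 + 2 + 1 + 0)

# Assumption: c4 is male (its gender is not stated in the text).
gender = {"c1": "male", "c2": "male", "c3": "female", "c4": "male"}
winning = {"California": {"c1", "c2"}, "Illinois": {"c4", "c2"}}
k = 2

def is_diverse(W):
    # At least one candidate of each gender.
    return {gender[c] for c in W} == {"male", "female"}

def is_representative(W):
    # At least one candidate from each state's winning committee.
    return all(len(set(W) & W_P) >= 1 for W_P in winning.values())

def best(feasible):
    """Highest k-Borda score among committees passing the filter, with all optima."""
    committees = [W for W in combinations(sorted(borda), k) if feasible(W)]
    top = max(sum(borda[c] for c in W) for W in committees)
    return [W for W in committees if sum(borda[c] for c in W) == top], top

print(best(lambda W: True))                                    # ([('c1', 'c2')], 17)
print(best(is_diverse))                                        # ([('c1', 'c3')], 13)
print(best(lambda W: is_diverse(W) and is_representative(W)))  # ([('c2', 'c3')], 12)
```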

Note that, in contrast to prior work in computational social choice, we incorporate voter attributes that are separate from candidate attributes. Also, our work is different from the notion of “proportional representation” [3, 10, 11], where the number of candidates selected in the committee from each group is proportional to the number of voters preferring that group, and from its variants such as “fair” representation [12]. All these approaches dynamically divide the voters based on the cohesiveness of their preferences. Another related work, multi-attribute proportional representation [13], couples candidate and voter attributes. An important observation we make here is that, even if the attributes of the candidates and of the voters coincide, it may still be important to treat them separately in committee selection. This is because having a female candidate on the committee, for example, is different from having a candidate on the committee who is preferred by the female voters, and who themselves may or may not be female.

Contributions. In this paper, we define a model that treats candidate and voter attributes separately during committee selection, and thus enables selection of the highest-scoring diverse and representative committee. We show NP-hardness of committee selection under our model for various settings, give results on inapproximability and parameterized complexity, and present a heuristic-based algorithm. Finally, we present an experimental evaluation using real and synthetic datasets, in which we show the efficiency of our algorithm, analyze the feasibility of committee selection and illustrate the utility trade-offs.

2 Related Work

Our work lies at the intersection of multiple ideas; hence, in this section, we briefly summarize related work across different domains, some of which was already discussed in the previous section.

Fairness in Ranking and Set Selection.

There is a growing understanding in the field of theoretical computer science of the possible presence of algorithmic bias in multiple domains [14, 15, 16, 17, 18, 19], especially in variants of the set selection problem [20]. The study of fairness in ranking and set selection, which is closely related to the study of multiwinner elections, uses constraints in algorithms to mitigate bias against historically disadvantaged groups. Stoyanovich et al. [20] use constraints in the streaming set selection problem, and Yang and Stoyanovich [21] and Yang et al. [22] use constraints for ranked outputs. Kuhlman and Rundensteiner [23] focus on fair rank aggregation, and Bei et al. [24] use proportional fairness constraints. Our work adds to the research on the use of constraints to mitigate algorithmic bias.

Fairness in Participatory Budgeting.

Multiwinner elections are a special case of participatory budgeting, and fairness in the latter domain has also received particular attention. For example, in [25], projects (equivalent to candidates) are divided into groups, and fairness is captured by lower and upper bounds on the utility achieved and on the cost of the projects used in every group. Fluschnik et al. [26] aim to achieve fairness among projects using their objective function. Next, Hershkowitz et al. [27] study fairness in terms of the utility received by the districts (equivalent to voters), Peters et al. [28] define axioms for proportional representation of voters, and Lackner et al. [29] define fairness in long-term participatory budgeting from the voters’ perspective. However, none of these works simultaneously considers fairness from the perspective of both the projects and the districts.

Two-sided Fairness.

The need for fairness from the perspective of different stakeholders of a system is well studied. For instance, Patro et al. [30], Chakraborty et al. [31], and Suhr et al. [32] consider two-sided fairness in two-sided platforms², and Abdollahpouri et al. [34] and Burke et al. [35] describe desirable fairness properties for different categories of multi-sided platforms. However, this line of work focuses on multi-sided fairness in multi-sided platforms, which is technically different from an election. An election, roughly speaking, can be considered a “one-sided platform” with more than one stakeholder, as during an election, candidates do not make decisions that affect the voters. Hence, $\delta$-sided fairness in one-sided platforms is also needed, where $\delta$ is the number of distinct user groups on the platform. More generally, $\delta$-sided fairness in an $\eta$-sided platform warrants an analysis of $\delta\cdot\eta$ perspectives of fairness, i.e., the effect of fairness on each of the $\delta$ stakeholders for each of the $\eta$ fairness metrics being used. In elections, $\delta=2$ (candidates and voters) and $\eta=1$ (voting). Additionally, Aziz [36] summarized a line of work on diversity concerns in two-sided matching that focused on diversity with respect to one stakeholder only.

²A two-sided platform is an intermediary economic platform having two distinct user groups that provide each other with network benefits, such that the decisions of each user group affect the outcomes of the other [33]. For example, the credit card market consists of cardholders and merchants, and health maintenance organizations consist of patients and doctors.

Unconstrained Multiwinner Elections and Proportional Representation.

The complexity of unconstrained multiwinner elections has received considerable attention [5]. Selecting a committee using the Chamberlin–Courant (CC) rule [37] is NP-hard [38], and approximation algorithms have achieved the best known ratio of $1-\frac{1}{e}$ [39, 40]. Yang and Wang [41] studied its parameterized complexity. Another commonly studied rule, Monroe [11], is also NP-hard [42, 4]. Sonar et al. [43] showed that even checking whether a given committee is optimal under these two rules is hard. Finally, the hardness of problems involving restricted voter preferences and committee selection rules has been studied [44, 45], as has proportional representation in dynamic ranking [46].

Constrained Multiwinner Elections.

The complexity of using diversity constraints in elections has also received particular attention. Goalbase score functions, which specify an arbitrary set of logic constraints and let the score capture the number of constraints satisfied, could be used to ensure diversity [47]. Using diversity constraints over multiple attributes in single-winner elections is NP-hard [13]. Also, using diversity constraints over multiple attributes in multiwinner elections is NP-hard, which has led to approximation algorithms and matching hardness-of-approximation results by Bredereck et al. [8] and Celis et al. [9]. Finally, due to the hardness of using diversity constraints over multiple attributes in approval-based multiwinner elections [48], these problems have been formalized as integer linear programs (ILPs) [49]. In contrast, Skowron et al. [39] showed that ILP-based algorithms fail on real-world instances when using ranked voting-related proportional representation rules such as the Chamberlin–Courant and Monroe rules, even when there are no constraints.

Overall, the work by Bredereck et al. [8], Celis et al. [9], and Lang and Skowron [13] is closest to ours. However, we differ as we: (i) consider elections with predefined voter populations under one or more attributes, (ii) delineate voter and candidate attributes even when they coincide, and (iii) consider representation and diversity constraints. No previous work, to the best of our knowledge, has considered fairness from the perspective of voter attributes or has delineated candidate and voter attributes even when they coincide.

3 Preliminaries and Notation

Multiwinner Elections.

Let $E=(C,V)$ be an election consisting of a candidate set $C=\{c_1,\dots,c_m\}$ and a voter set $V=\{v_1,\dots,v_n\}$, where each voter $v\in V$ has a preference list $\succ_v$ over the $m$ candidates, ranking all of the candidates from the most to the least desired. $\operatorname{pos}_v(c)$ denotes the position of candidate $c\in C$ in the ranking of voter $v\in V$, where the most preferred candidate has position 1 and the least preferred has position $m$.

Given an election $E=(C,V)$ and a positive integer $k\in[m]$ (for $k\in\mathbb{N}$, we write $[k]=\{1,\dots,k\}$), a multiwinner election selects a $k$-sized subset of candidates (or a committee) $W$ using a multiwinner voting rule $\mathtt{f}$ (discussed later) such that the score of the committee, $\mathtt{f}(W)$, is the highest. Formally, given $E=(C,V)$ and $k$, $\mathtt{f}$ outputs the committee $W$ of exactly $k$ candidates with the highest score. We assume ties are broken using a pre-decided priority order over all candidates.

Candidate Groups.

The candidates have $\mu$ attributes, $A_1,\dots,A_\mu$, such that $\mu\in\mathbb{Z}$ and $\mu\geq 0$. Each attribute $A_i$, for all $i\in[\mu]$, partitions the candidates into $g_i\in[m]$ groups, $A_{(i,1)},\dots,A_{(i,g_i)}\subseteq C$. Formally, $A_{(i,j)}\cap A_{(i,j')}=\emptyset$ for all $j,j'\in[g_i]$ with $j\neq j'$. For example, candidates in Figure 1(a) have one attribute, gender ($\mu = 1$), with two disjoint groups, male and female ($g_1 = 2$). Overall, the set $\mathcal{G}$ of all such arbitrary and potentially non-disjoint groups is $A_{(1,1)},\dots,A_{(\mu,g_\mu)}\subseteq C$. Note that the number of groups a candidate belongs to is equal to the number of attributes $\mu$.

Voter Populations.

The voters have $\pi$ attributes, $A'_1,\dots,A'_\pi$, such that $\pi\in\mathbb{Z}$ and $\pi\geq 0$. The voter attributes may be different from the candidate attributes. Each attribute $A'_i$, for all $i\in[\pi]$, partitions the voters into $p_i\in[n]$ populations, $P_{(i,1)},\dots,P_{(i,p_i)}\subseteq V$. Formally, $P_{(i,j)}\cap P_{(i,j')}=\emptyset$ for all $j,j'\in[p_i]$ with $j\neq j'$. For example, voters in Figure 1(b) have one attribute, state ($\pi = 1$), which has two populations, California and Illinois ($p_1 = 2$). Overall, the set $\mathcal{P}$ of all such predefined and potentially non-disjoint populations is $P_{(1,1)},\dots,P_{(\pi,p_\pi)}\subseteq V$.

The number of populations a voter belongs to is equal to the number of attributes $\pi$. Additionally, we are given $W_P$, the winning committee of each population $P\in\mathcal{P}$. We note that a fine-grained accounting of representation is not possible in our model. This is because when a committee selection rule such as the Chamberlin–Courant rule is used to determine each population’s winning committee $W_P$, a complete ranking of each population’s collective preferences is not possible. Thus, we have designed our model to consider only each population’s winning committee $W_P$.
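The model takes each $W_P$ as given. As in Example 1, one way to obtain it is to apply a committee selection rule to the sub-profile of each population. The Python sketch below does this with $k$-Borda. The profile, the attribute key `state`, and the helper names are hypothetical, and ties are broken by the order in which candidates are listed, standing in for the pre-decided priority order.

```python
from collections import defaultdict

def k_borda(profile, candidates, k):
    """Top-k candidates by Borda score (m - i for the i-th position).
    Ties are broken by the order of `candidates`, used as the priority order."""
    m = len(candidates)
    score = {c: 0 for c in candidates}
    for ranking in profile:
        for i, c in enumerate(ranking, start=1):
            score[c] += m - i
    ranked = sorted(candidates, key=lambda c: (-score[c], candidates.index(c)))
    return set(ranked[:k])

def population_winners(voters, attribute, candidates, k):
    """Split voters by one attribute and compute W_P for every population P."""
    profiles = defaultdict(list)
    for attrs, ranking in voters:
        profiles[attrs[attribute]].append(ranking)
    return {P: k_borda(prof, candidates, k) for P, prof in profiles.items()}

# Hypothetical profile (not the one in Figure 1): three California voters, one Illinois voter.
candidates = ["c1", "c2", "c3", "c4"]
voters = [
    ({"state": "California"}, ["c1", "c2", "c3", "c4"]),
    ({"state": "California"}, ["c2", "c1", "c4", "c3"]),
    ({"state": "California"}, ["c1", "c3", "c2", "c4"]),
    ({"state": "Illinois"}, ["c4", "c2", "c3", "c1"]),
]
print(population_winners(voters, "state", candidates, k=2))
```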

Multiwinner Voting Rules.

There are multiple types of multiwinner voting rules, also called committee selection rules. In this paper, we focus on committee selection rules $\mathtt{f}$ that are based on single-winner positional voting rules and are monotone and submodular [8, 9], i.e., for all $A\subseteq B\subseteq C$, $\mathtt{f}(A)\leq\mathtt{f}(B)$ (monotonicity), and for all $A\subseteq B\subseteq C$ and $c\in C\setminus B$, $\mathtt{f}(A\cup\{c\})-\mathtt{f}(A)\geq\mathtt{f}(B\cup\{c\})-\mathtt{f}(B)$ (submodularity).

Definition 1.

Chamberlin–Courant (CC) rule: The CC rule [37] associates each voter with their most preferred candidate in the committee. The score of a committee is the sum of the scores that voters give to their associated candidates. Specifically, $\beta$-CC uses the Borda positional voting rule: each voter assigns a score of $m-i$ to their associated candidate, where $i$ is that candidate’s position in the voter’s ranking.
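A minimal Python sketch of the $\beta$-CC score, together with a brute-force check on a small hypothetical profile that it is monotone and submodular in the diminishing-returns sense ($\mathtt{f}(A\cup\{c\})-\mathtt{f}(A)\geq\mathtt{f}(B\cup\{c\})-\mathtt{f}(B)$ for $A\subseteq B$ and $c\notin B$). Scoring the empty committee as 0 is our convention, and the function names are illustrative.

```python
from itertools import combinations

def beta_cc(profile, committee):
    """beta-CC: each voter contributes m - i, where i is the (1-based) position of
    their highest-ranked committee member; the empty committee scores 0."""
    if not committee:
        return 0
    total = 0
    for ranking in profile:
        m = len(ranking)
        i = min(ranking.index(c) for c in committee) + 1
        total += m - i
    return total

def is_monotone_submodular(f, candidates):
    """Exhaustively test monotonicity and diminishing returns over all subset pairs."""
    subsets = [frozenset(s) for r in range(len(candidates) + 1)
               for s in combinations(candidates, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            if f(A) > f(B):
                return False
            for c in set(candidates) - B:
                if f(A | {c}) - f(A) < f(B | {c}) - f(B):
                    return False
    return True

# Hypothetical profile over four candidates.
profile = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c4", "c3"],
    ["c1", "c3", "c2", "c4"],
    ["c4", "c2", "c3", "c1"],
]
candidates = ["c1", "c2", "c3", "c4"]
print(beta_cc(profile, {"c1", "c2"}))                                     # 11
print(is_monotone_submodular(lambda W: beta_cc(profile, W), candidates))  # True
```

The exhaustive check is exponential in $m$ and is only meant to make the two properties concrete; a function such as $|W|^2$ fails it, since its marginal gains grow with the committee.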

Definition 2.

Monroe rule: The Monroe rule [11] divides the $n$ voters into $k$ subpopulations of equal size (assuming $k$ divides $n$) based on the cohesiveness of their preferences, and assigns each subpopulation a distinct member of the $k$-sized committee. Formally, the committee $W=\{c_1,\dots,c_k\}$ and a partition of $V$ into subpopulations $V_1,\dots,V_k$ of size $n/k$ are chosen jointly to maximize $\sum_{i=1}^{k}\mathtt{f}_{V_i}(c_i)$, where $\mathtt{f}_{V_i}(c_i)$ is the score (e.g., the Borda score) that the voters in $V_i$ give to $c_i$. In other words, each candidate in the committee represents an equal number of voters.
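For small instances, the Monroe score of a fixed committee can be computed by trying every balanced assignment of voters to committee members. The Python sketch below assumes Borda scores ($m-i$ for position $i$) and that $k$ divides $n$; it is exponential and only meant to make the definition concrete. On the hypothetical profile, it also shows that the equal-size requirement can lower a committee's score relative to $\beta$-CC, which lets every voter be represented by their favorite member.

```python
from itertools import permutations

def monroe_score(profile, committee):
    """Best total Borda score over assignments that give every committee member
    exactly n/k voters (brute force; assumes k divides n)."""
    n, k = len(profile), len(committee)
    assert n % k == 0, "this sketch assumes k divides n"
    m = len(profile[0])
    # Each member fills n/k slots; each distinct ordering of the slots is one assignment.
    slots = [c for c in sorted(committee) for _ in range(n // k)]
    return max(
        sum(m - (ranking.index(c) + 1) for ranking, c in zip(profile, assignment))
        for assignment in set(permutations(slots))
    )

def cc_score(profile, committee):
    """beta-CC score for comparison: no balance requirement on the assignment."""
    m = len(profile[0])
    return sum(m - min(r.index(c) + 1 for c in committee) for r in profile)

# Hypothetical profile over four candidates (n = 4, committee size k = 2).
profile = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c4", "c3"],
    ["c1", "c3", "c2", "c4"],
    ["c4", "c2", "c3", "c1"],
]
print(monroe_score(profile, {"c1", "c4"}))  # 10
print(cc_score(profile, {"c1", "c4"}))      # 11
```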

Separable functions are a special case of submodular functions: the score of a committee $W$ is the sum of the scores of the individual candidates in $W$. Formally, $\mathtt{f}$ is separable if it is submodular and $\mathtt{f}(W)=\sum_{c\in W}\mathtt{f}(c)$ [8]. Monotone and separable selection rules are natural and are considered good when the goal of an election is to shortlist a set of individually excellent candidates [5]:

Definition 3.

$k$-Borda rule: The $k$-Borda rule outputs committees of $k$ candidates with the highest Borda scores.
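Because $k$-Borda is separable, its winning committee can be found by sorting candidates by individual Borda score instead of searching over all $\binom{m}{k}$ committees. The Python sketch below, on a hypothetical profile, checks that the top-$k$ committee matches the best committee found by exhaustive search; breaking ties by list order stands in for the pre-decided priority order.

```python
from itertools import combinations

def borda_scores(profile):
    """Borda score of each candidate: m - i for the i-th position (1-based)."""
    m = len(profile[0])
    score = {c: 0 for c in profile[0]}
    for ranking in profile:
        for i, c in enumerate(ranking, start=1):
            score[c] += m - i
    return score

def k_borda(profile, k, priority):
    """Top-k candidates by Borda score, ties broken by the priority order."""
    score = borda_scores(profile)
    return sorted(score, key=lambda c: (-score[c], priority.index(c)))[:k]

# Hypothetical profile over four candidates.
profile = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c4", "c3"],
    ["c1", "c3", "c2", "c4"],
    ["c4", "c2", "c3", "c1"],
]
priority = ["c1", "c2", "c3", "c4"]
score = borda_scores(profile)
W = k_borda(profile, 2, priority)
# Separability: f(W) is a sum of individual scores, so no committee can beat the top k.
best = max(sum(score[c] for c in S) for S in combinations(priority, 2))
print(score, W)  # {'c1': 8, 'c2': 8, 'c3': 4, 'c4': 4} ['c1', 'c2']
assert sum(score[c] for c in W) == best
```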

4 DiRe Committee Model

In this section, we formally define a model to select a diverse and representative committee, namely the DiRe committee, and show its generality.

Definition 4.

Unconstrained Committee Winner Determination (UCWD): We are given a set $C$ of $m$ candidates, a set $V$ of $n$ voters such that each voter $v\in V$ has a preference list $\succ_v$ over the $m$ candidates, a committee selection rule $\mathtt{f}$, and a committee size $k\in[m]$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of UCWD is to select a committee $W\in\mathcal{W}$ that maximizes $\mathtt{f}(W)$.
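UCWD can be solved by exhaustive search for any rule $\mathtt{f}$, at the cost of $\binom{m}{k}$ evaluations of $\mathtt{f}$. In the Python sketch below, candidates are listed in priority order, and ties between committees are broken in favor of the committee that comes first lexicographically in that order. This lifting of the candidate priority order to committees is our assumption, since the text does not specify one; the score table is hypothetical.

```python
from itertools import combinations

def ucwd(candidates, k, f):
    """Return a size-k committee maximizing f by exhaustive search.

    `candidates` is given in priority order. itertools.combinations emits
    committees in lexicographic order of that list, and only a strictly
    better score replaces the incumbent, so ties go to the earliest committee."""
    best = None
    for W in combinations(candidates, k):
        if best is None or f(W) > f(best):
            best = W
    return best

# Hypothetical separable rule: sum of individual scores.
scores = {"a": 5, "b": 3, "c": 5, "d": 1}
f = lambda W: sum(scores[c] for c in W)
print(ucwd(["a", "b", "c", "d"], 2, f))  # ('a', 'c')
print(ucwd(["a", "b", "c", "d"], 1, f))  # ('a',)  tie with 'c' broken by priority
```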

We now discuss the diversity and representation constraints. The lowest possible value that these constraints can take is 1, which mirrors real-world scenarios. For instance, the United Nations charter guarantees each member country at least one representative in the United Nations General Assembly, independent of the country’s population. Similarly, each state of the United States of America is guaranteed at least one representative in the US House of Representatives. Hence, from a fairness perspective, each candidate group and voter population deserves at least one candidate in the committee. Theoretically, all results in this paper hold even if the lowest possible value that the constraints can take is 0.

Diversity Constraints,

denoted by $l^D_G\in[1,\min(k,|G|)]$ for each candidate group $G\in\mathcal{G}$, enforce at least $l^D_G$ candidates from group $G$ to be in the committee $W$. Formally, for all $G\in\mathcal{G}$, $|G\cap W|\geq l^D_G$. We note that we do not propose to use upper bounds, as they induce a quota system, which is not desirable from a social choice perspective.

Representation Constraints,

denoted by $l^R_P\in[1,k]$ for each voter population $P\in\mathcal{P}$, enforce at least $l^R_P$ candidates from population $P$’s winning committee $W_P$ to be in the committee $W$. Formally, for all $P\in\mathcal{P}$, $|W_P\cap W|\geq l^R_P$. Again, we do not propose to use upper bounds, as they induce the undesirable quota system.

Definition 5.

($\mu,\pi$)-DiRe Committee Feasibility (($\mu$, $\pi$)-DRCF): We are given an instance of election $E=(C,V)$, a committee size $k\in[m]$, a set of candidate groups $\mathcal{G}$ over $\mu$ attributes and their diversity constraints $l^D_G$ for all $G\in\mathcal{G}$, and a set of voter populations $\mathcal{P}$ over $\pi$ attributes, their representation constraints $l^R_P$, and their winning committees $W_P$ for all $P\in\mathcal{P}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of ($\mu$, $\pi$)-DRCF is to select committees $W\in\mathcal{W}$ that satisfy the diversity and representation constraints, i.e., $|G\cap W|\geq l^D_G$ for all $G\in\mathcal{G}$ and $|W_P\cap W|\geq l^R_P$ for all $P\in\mathcal{P}$. All committees that satisfy the constraints are called DiRe committees.
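For small instances, the DiRe committees can be listed by checking every size-$k$ committee against both families of constraints. The Python sketch below uses a hypothetical instance with two candidate attributes (gender and party, so $\mu = 2$) and one voter attribute with two populations whose winning committees are given; all names and data are illustrative. An empty result means the instance is infeasible.

```python
from itertools import combinations

def dire_committees(candidates, k, groups, l_div, winners, l_rep):
    """All size-k committees W with |G ∩ W| >= l_div[G] for every group G and
    |W_P ∩ W| >= l_rep[P] for every population P."""
    result = []
    for W in combinations(candidates, k):
        S = set(W)
        diverse = all(len(members & S) >= l_div[G] for G, members in groups.items())
        representative = all(len(W_P & S) >= l_rep[P] for P, W_P in winners.items())
        if diverse and representative:
            result.append(W)
    return result

# Hypothetical instance: mu = 2 candidate attributes, pi = 1 voter attribute.
candidates = ["c1", "c2", "c3", "c4"]
groups = {
    "male": {"c1", "c2"}, "female": {"c3", "c4"},      # attribute: gender
    "party X": {"c1", "c3"}, "party Y": {"c2", "c4"},  # attribute: party
}
winners = {"P1": {"c1", "c2"}, "P2": {"c2", "c4"}}
l_div = {G: 1 for G in groups}
l_rep = {P: 1 for P in winners}
print(dire_committees(candidates, 2, groups, l_div, winners, l_rep))
# [('c1', 'c4'), ('c2', 'c3')]
```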

If a committee selection rule $\mathtt{f}$ is also an input to the feasibility problem, we get the ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD problem:

Definition 6.

($\mu,\pi$, $\mathtt{f}$)-DiRe Committee Winner Determination (($\mu$, $\pi$, $\mathtt{f}$)-DRCWD): Given an instance of ($\mu$, $\pi$)-DRCF and a committee selection rule $\mathtt{f}$, let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is to select a committee $W\in\mathcal{W}$ that maximizes $\mathtt{f}(W)$ among all DiRe committees.
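A brute-force sketch of DRCWD enumerates size-$k$ committees, discards those that are not DiRe committees, and keeps the one with the highest $\mathtt{f}$. The Python code below plugs in $\beta$-CC as a submodular but non-separable $\mathtt{f}$. The instance (profile, gender groups, and population winning committees) is hypothetical, and ties go to the first committee in lexicographic order.

```python
from itertools import combinations

def beta_cc(profile, W):
    """beta-CC: each voter scores their highest-ranked member of W as m - i."""
    m = len(profile[0])
    return sum(m - min(r.index(c) + 1 for c in W) for r in profile)

def drcwd(candidates, k, f, groups, l_div, winners, l_rep):
    """Highest-scoring DiRe committee by exhaustive search, or None if infeasible."""
    best = None
    for W in combinations(candidates, k):
        S = set(W)
        diverse = all(len(members & S) >= l_div[G] for G, members in groups.items())
        representative = all(len(W_P & S) >= l_rep[P] for P, W_P in winners.items())
        if diverse and representative and (best is None or f(W) > f(best)):
            best = W
    return best

# Hypothetical instance: one candidate attribute (gender), one voter attribute (state).
profile = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c4", "c3"],
    ["c1", "c3", "c2", "c4"],
    ["c4", "c2", "c3", "c1"],
]
candidates = ["c1", "c2", "c3", "c4"]
groups = {"male": {"c1", "c2"}, "female": {"c3", "c4"}}
winners = {"California": {"c1", "c2"}, "Illinois": {"c2", "c4"}}
f = lambda W: beta_cc(profile, W)
W = drcwd(candidates, 2, f, groups, {G: 1 for G in groups}, winners, {P: 1 for P in winners})
print(W, f(W))  # ('c1', 'c4') 11
```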

We denote the possible values that $\mu$ and $\pi$ can take in parentheses. For example, ‘($\leq 2$, 0, $\mathtt{f}$)-DRCWD’ specifies the setting for all $\mu\in\mathbb{Z}$ with $0\leq\mu\leq 2$. Similarly, ‘($\geq 3$, 0, $\mathtt{f}$)-DRCWD’ specifies the setting for all $\mu\in\mathbb{Z}$ with $\mu\geq 3$. We use the same notation for $\pi$.

Observation 1.

($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a generalized version of ($\mu$, $\pi$)-DRCF and UCWD. Hence, if ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is polynomial-time computable, then so are the corresponding UCWD and ($\mu$, $\pi$)-DRCF problems. If either UCWD or ($\mu$, $\pi$)-DRCF is NP-hard, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard.

4.1 ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD and Related Models

Our model provides the flexibility to specify the diversity and representation constraints and to select the voting rule. Thus, in this section, we define the diverse committee problem [8, 9] and the apportionment problem [10, 50] as special cases of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD.

($\mu$, 0, $\mathtt{f}$)-DRCWD and Diverse Committee Problem.

We define the diverse committee problem in our model [8, 9]: In the diverse committee problem, we are given an instance of UCWD together with a set of candidate groups $\mathcal{G}$ and the corresponding diversity constraints, a lower bound $l^D_G$ and an upper bound $u^D_G$, for all $G\in\mathcal{G}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of the diverse committee problem is to select a committee $W\in\mathcal{W}$ that maximizes $\mathtt{f}(W)$ among the committees that satisfy the constraints.

It is clear that ($\mu$, 0, $\mathtt{f}$)-DRCWD, i.e., the setting without any voter attributes, is equivalent to the diverse committee problem. As we do not use upper bounds, the equivalence holds when, for all $G\in\mathcal{G}$, the upper bound $u^D_G$ in the diverse committee model equals the size of group $G$ and the lower bound is at least 1. This is in line with the approach used in Theorem 6 of Celis et al. [9]. Formally, $u^D_G=|G|$ and $l^D_G\geq 1$ for all $G\in\mathcal{G}$.

(0, 1, $\mathtt{f}$)-DRCWD and Apportionment Problem.

We define the apportionment problem in our model [10]: In the apportionment problem, we are given an instance of UCWD together with a set of disjoint voter populations $\mathcal{P}$ over one attribute and winning committees $W_P$ for all $P\in\mathcal{P}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of the apportionment problem is to select a committee $W\in\mathcal{W}$ that maximizes $\mathtt{f}(W)$ among all committees that satisfy the lower quota, i.e., $|W_P\cap W|\geq\left\lfloor\frac{|P|}{n}\cdot k\right\rfloor$ for all $P\in\mathcal{P}$.

It is easy to see that (0, 1, $\mathtt{f}$)-DRCWD, which has zero candidate attributes and one voter attribute, is the same as the apportionment problem if we set the representation constraint of each population equal to the lower quota of the apportionment problem. Formally, $l^R_P=\left\lfloor\frac{|P|}{n}\cdot k\right\rfloor$ for all $P\in\mathcal{P}$, realistically assuming that $\frac{|P|}{n}\geq\frac{1}{k}$ for all $P\in\mathcal{P}$.
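The reduction to (0, 1, $\mathtt{f}$)-DRCWD only needs the lower quotas as representation constraints. A minimal Python sketch computing $l^R_P=\left\lfloor\frac{|P|}{n}\cdot k\right\rfloor$ from hypothetical population sizes; integer floor division of $|P|\cdot k$ by $n$ is used to avoid floating-point rounding.

```python
def lower_quotas(pop_sizes, k):
    """Representation constraints l^R_P = floor(|P| / n * k) for populations that
    partition the n voters (one voter attribute)."""
    n = sum(pop_sizes.values())
    return {P: (size * k) // n for P, size in pop_sizes.items()}

# Hypothetical populations under one attribute, committee size k = 7.
sizes = {"A": 50, "B": 30, "C": 20}
quotas = lower_quotas(sizes, 7)
print(quotas)  # {'A': 3, 'B': 2, 'C': 1}
# Every population has |P| / n >= 1 / k here, so every quota is at least 1,
# and the quotas never sum to more than k.
assert all(q >= 1 for q in quotas.values()) and sum(quotas.values()) <= 7
```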

Finally, we note that our model can be adapted to accept approval votes as input; thus, if each population is completely cohesive within itself, then the representation constraints can be set to formulate known representation methods such as proportional justified representation [3] and extended justified representation [6] as ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD. We note, however, that such reformulations may not be as straightforward as the ones discussed above.

5 Complexity Results

| Instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD | Computational complexity |
|---|---|
| ($\leq 2$, 0, separable)-DRCWD | P (Lem. 1) |
| ($\geq 3$, 0, separable)-DRCWD | NP-hard (Thm. 3, Thm. 4) |
| ($\geq 0$, $\geq 1$, separable)-DRCWD | NP-hard (Thm. 5, Cor. 2) |
| ($\geq 0$, $\geq 0$, submodular)-DRCWD | NP-hard (Thm. 6, Cor. 3) |

Table 1: A summary of the complexity of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD (Theorem 1, Corollary 1). The conditions in parentheses for $\mu$ and $\pi$ denote that the results hold for all non-negative integers $\mu$ and $\pi$ that satisfy them. The results are under the assumption P $\neq$ NP. ‘Lem.’ denotes Lemma, ‘Thm.’ denotes Theorem, and ‘Cor.’ denotes Corollary.

In this section, we give a classification of the computational complexity³ of the ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD problem under different settings. Finding a committee using a submodular scoring function such as the utilitarian version of the Chamberlin–Courant rule is known to be NP-hard [38], and selecting a diverse committee when a candidate belongs to three groups is also known to be NP-hard [8, 9]. However, the proofs of these hardness results are fragmented over several papers and use reductions from several different well-known NP-hard problems. For instance, the hardness proof for Chamberlin–Courant uses a reduction from exact 3-cover [38], and the hardness proofs for computing a diverse committee use reductions from 3-dimensional matching [8] and 3-hypergraph matching [9]. Moreover, we are the first to introduce representation constraints, and the hardness due to their use is unknown. Hence, in this section, we provide a complete classification of the ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD problem by giving reductions from a single well-known NP-hard problem, namely, the vertex cover problem, inspired by a similar approach used in [51].

³The hardness, inapproximability, and parameterized complexity results throughout the paper are under the assumption P $\neq$ NP.

Finally, we note that since the following classification holds for every integer $\mu\geq 0$ (i.e., every whole number, as $\mu$ cannot be negative) and every integer $\pi\geq 0$, our reductions are designed for the same range of values.

Theorem 1.

Let $\mu,\pi\in\mathbb{Z}$ with $\mu,\pi\geq 0$, and let $\mathtt{f}$ be a committee selection rule; then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard.

Corollary 1.

Classification of Complexity of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD.

  1. If $\mu,\pi\in\mathbb{Z}$ with $\mu\geq 0$ and $\pi\geq 0$, and $\mathtt{f}$ is a monotone, submodular function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard.

  2. If $\mu\in\mathbb{Z}$ with $0\leq\mu\leq 2$, $\pi=0$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is in P.

  3. If $\mu\in\mathbb{Z}$ with $\mu\geq 3$, $\pi=0$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard.

  4. If $\mu,\pi\in\mathbb{Z}$ with $\mu\geq 0$ and $\pi\geq 1$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard.

5.1 Tractable Case

Theorem 2.

[Theorem 21, Corollary 22 in the full version of Celis et al. [9]] The diverse committee feasibility problem can be solved in polynomial time when $\mu = 2$.

Theorem 2 is stated for $\mu=2$, and instances with fewer candidate attributes are special cases of it. Hence, it holds for all $\mu\in\mathbb{Z}$ with $0\leq\mu\leq 2$. Therefore, based on the relationship between ($\mu$, 0, $\mathtt{f}$)-DRCWD and the diverse committee problem (Section 4.1), we prove the following lemma, which in turn proves the statement in Corollary 1(2):

Lemma 1.

If $\mu\in\mathbb{Z}$ with $0\leq\mu\leq 2$, $\pi=0$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is in P.

Proof.

When $\pi=0$, there are no voter attributes or representation constraints, and hence, the ($\mu$, 0, $\mathtt{f}$)-DRCWD problem is equivalent to the diverse committee problem. Moreover, when $\mathtt{f}$ is a monotone, separable function, the complexity of ($\mu$, 0, $\mathtt{f}$)-DRCWD is equivalent to the complexity of ($\mu$, 0)-DRCF. Thus, the polynomial-time result for the diverse committee feasibility problem when each candidate belongs to two groups, which in our model means that the number of candidate attributes is two ($\mu=2$), holds for our setting (Theorem 9 [8], Corollary 22 (full version) [9]).

More specifically, when $\mu=2$, we use the algorithm given in the proof of Theorem 21 by Celis et al. [9] and set the upper bound equal to the group size. Formally, $u^D_G=|G|$ for all $G\in\mathcal{G}$.

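Next, when $\mu=1$, a straightforward algorithm suffices: select the top $l^D_G$ scoring candidates from each group $G\in\mathcal{G}$, and fill the remaining $k-\sum_{G\in\mathcal{G}}l^D_G$ seats with the highest-scoring remaining candidates. Since the groups of a single attribute are disjoint and $\mathtt{f}$ is separable, this yields a highest-scoring DiRe committee, which satisfies the diversity constraints $|G\cap W|\geq l^D_G$, whenever $\sum_{G\in\mathcal{G}}l^D_G\leq k$; otherwise, no DiRe committee exists. Finally, when $\mu=0$, selecting the $k$ highest-scoring candidates suffices. ∎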
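The $\mu = 1$ case of the proof above can be sketched in a few lines of Python. The sketch assumes a separable $\mathtt{f}$ given as a table of individual scores, a single attribute whose groups are disjoint, and list order as the priority order for ties; the function name and instance are hypothetical. Passing an empty dictionary of lower bounds covers the $\mu = 0$ case.

```python
def diverse_committee_one_attribute(scores, group_of, lower, k, priority):
    """mu = 1, pi = 0, separable f: take the top lower[G] candidates of each group,
    then fill the remaining seats with the best remaining candidates.
    Returns None when the lower bounds cannot be met."""
    if sum(lower.values()) > k:
        return None
    key = lambda c: (-scores[c], priority.index(c))
    W = []
    for G, l in lower.items():
        members = sorted((c for c in priority if group_of[c] == G), key=key)
        if len(members) < l:
            return None
        W.extend(members[:l])
    rest = sorted((c for c in priority if c not in W), key=key)
    return W + rest[: k - len(W)]

# Hypothetical instance: groups X = {a, b, c} and Y = {d, e}, one seat each required.
priority = ["a", "b", "c", "d", "e"]
scores = {"a": 10, "b": 9, "c": 7, "d": 2, "e": 1}
group_of = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
print(diverse_committee_one_attribute(scores, group_of, {"X": 1, "Y": 1}, 3, priority))
# ['a', 'd', 'b']
```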

5.2 Hardness Results

NP-hard problem used.

As discussed earlier, the NP-hardness of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD when using representation constraints is unknown. Moreover, the known hardness results for submodular but non-separable scoring functions and for diverse committee selection were established via reductions from different NP-hard problems. We establish the NP-hardness of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD for various settings of $\mu$, $\pi$, and $\mathtt{f}$ via reductions from a single well-known NP-hard problem, namely, the vertex cover problem on 3-regular⁴, 2-uniform⁵ hypergraphs [52, 53].

⁴A 3-regular graph stipulates that each vertex is connected by an edge to exactly three other vertices, i.e., each vertex has degree 3. The VC problem on 3-regular graphs is NP-hard. We use 3-regular graphs to exploit their structure to prove the hardness of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD with respect to (w.r.t.) diversity constraints (Theorem 3 and Theorem 4). We note that the reductions used in the proofs of Theorem 5 and Theorem 6 do not need 3-regular graphs and hold for the VC problem on arbitrary graphs as well.

⁵The size of the hyperedges has implications for the hardness of approximation and parameterized complexity results, and hence we mention it here. For the complexity results, we use 2-uniform hypergraphs only.

Definition 7.

Vertex Cover (VC) problem: Let $H$ be a graph consisting of a set of $m$ vertices $X=\{x_{1},x_{2},\dots,x_{m}\}$ and a set of $n$ edges $E=\{e_{1},e_{2},\dots,e_{n}\}$. Each edge $e\in E$ connects two vertices in $X$, i.e., $e$ is a 2-element subset of $X$. A vertex cover of $H$ is a subset $S\subseteq X$ of vertices such that each edge contains at least one vertex from $S$ (i.e., $e\cap S\neq\emptyset$ for all $e\in E$). The vertex cover problem is to find a minimum vertex cover of $H$. Our reductions use its decision version: given $H$ and an integer $k$, decide whether $H$ has a vertex cover of size at most $k$.
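The decision version can be illustrated with a brute-force check in Python. It runs in exponential time, so it is only suitable for toy graphs, and the helper names are ours. $K_{4}$, the smallest 3-regular graph, has no vertex cover of size 2 but has one of size 3.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def is_vertex_cover(edges: Iterable[Edge], cover: FrozenSet[int]) -> bool:
    """True iff every edge e satisfies e ∩ S != ∅."""
    return all(u in cover or v in cover for u, v in edges)


def has_vertex_cover(num_vertices: int, edges: List[Edge], k: int) -> Optional[FrozenSet[int]]:
    """Decision version by exhaustive search: a cover of size at most k, or None."""
    for size in range(min(k, num_vertices) + 1):
        for subset in combinations(range(num_vertices), size):
            s = frozenset(subset)
            if is_vertex_cover(edges, s):
                return s
    return None


# K4: every vertex has degree 3.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(has_vertex_cover(4, K4, 2))  # None
print(has_vertex_cover(4, K4, 3))  # frozenset({0, 1, 2})
```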

5.2.1 ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD w.r.t. diversity constraints

When $\pi=0$, ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is related to the diverse committee selection problem. However, the hardness of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD for $\mu\geq 3$ does not follow directly from the hardness of diverse committee selection when a candidate can belong to 3 or more groups [8, 9]. This is because the reductions in these papers are specifically for the case $\mu=3$.

More specifically, Theorem 9 of Bredereck et al. [8] uses a reduction from 3-Dimensional Matching that only holds for instances in which a candidate can belong to exactly 3 groups. Also, they set both the lower and the upper bound to 1, which is mathematically different from our setting, where we only allow lower bounds.

On the other hand, Theorem 6 (“NP-hardness of feasibility: $\Delta\geq 3$”⁶) of Celis et al. [9] uses two reductions. The first reduction, from $\Delta$-hypergraph matching, is indeed for the case when a candidate can belong to 3 or more groups. However, it is limited to instances in which the lower bound is set to 0 and the upper bound to 1. This is a trivial case in our setting, as we only use lower bounds and do not allow upper bounds. Moreover, in principle, the reduction from $\Delta$-hypergraph matching uses a different problem for each $\Delta$: for $\Delta\neq\Delta^{\prime}$, $\Delta$-hypergraph matching and $\Delta^{\prime}$-hypergraph matching are separate problems. The second reduction, from 3-regular vertex cover, is for instances in which a candidate can belong to exactly 3 groups.

Footnote 6: In Celis et al. [9], $\Delta$ denotes “the maximum number of groups in which any candidate can be”.

Hence, in this section, we give a reduction from a single known NP-hard problem, namely the vertex cover problem. Our result holds for every integer $\mu\geq 3$, even when $l_{G}^{D}=1$ for all $G\in\mathcal{G}$. Also, the reductions are designed to conform to the following real-world stipulations:

(i) each candidate attribute $A_{i}$, $i\in[\mu]$, partitions all the $m$ candidates into two or more groups;

(ii) either no two attributes partition the candidates in the same way or, if they do, the lower bounds across the groups of the two attributes are not the same.

For stipulation (ii), suppose two attributes partition the candidates in the same way and the lower bounds across their groups are also the same. Then they are mathematically identical attributes and can be combined into one. The next two theorems help us prove the statement in Corollary 1(3).

Theorem 3.

If $\mu\geq 3$ is an odd integer, $\pi=0$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard, even when $l_{G}^{D}=1$ for all $G\in\mathcal{G}$.

Proof.

We reduce an instance of the vertex cover (VC) problem to an instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD. We have one candidate $c_{i}$ for each vertex $x_{i}\in X$. We also have $m\cdot(2\mu^{2}-7\mu+3)$ dummy candidates $d\in D$. Here, $m$ is the number of vertices in the graph $H$, and $\mu$ is a positive odd integer (the number of candidate attributes). Formally, we set $A=\{c_{1},\dots,c_{m}\}$ and the dummy candidate set $D=\{d_{1},\dots,d_{m\cdot(2\mu^{2}-7\mu+3)}\}$. Hence, the candidate set $C=A\cup D$ has $|C|=m+m\cdot(2\mu^{2}-7\mu+3)$ candidates. We set the target committee size to $k+m\mu^{2}-3m\mu$.

Next, we have $\mu$ candidate attributes. Each edge $e\in E$ that connects vertices $x_{i}$ and $x_{j}$ corresponds to a candidate group $G\in\mathcal{G}$ that contains the two candidates $c_{i}$ and $c_{j}$. As our reduction proceeds from a 3-regular graph, each vertex is incident to three edges. Hence, each candidate $c\in A$ has three attributes and thus belongs to three groups.

Next, for each of the $m$ candidates $c\in A$, we have $\mu-3$ blocks of dummy candidates, and each block contains $2\mu-1$ dummy candidates $d\in D$. Thus, we have a total of $m\cdot(\mu-3)\cdot(2\mu-1)=m\cdot(2\mu^{2}-7\mu+3)$ dummy candidates. Each block contains three sets of candidates: Set $T_{1}$ contains one candidate, and Sets $T_{2}$ and $T_{3}$ contain $\mu-1$ candidates each. Specifically, each of the $\mu-3$ blocks for each candidate $c\in A$ is constructed as follows:

  • Set $T_{1}$ consists of a single dummy candidate, $d_{1}^{T_{1}}\in T_{1}$.

  • Set $T_{2}$ consists of $\mu-1$ dummy candidates, $d_{i}^{T_{2}}\in T_{2}$ for all $i\in[1,\mu-1]$.

  • Set $T_{3}$ consists of $\mu-1$ dummy candidates, $d_{j}^{T_{3}}\in T_{3}$ for all $j\in[1,\mu-1]$.

Each candidate in the block has $\mu$ attributes, and the candidates are grouped as follows:

  • The dummy candidate $d_{1}^{T_{1}}\in T_{1}$ is in the same group as candidate $c\in A$. It is also in $\mu-1$ further groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{i}^{T_{2}}\in T_{2}$. Thus, the dummy candidate $d_{1}^{T_{1}}$ has $\mu$ attributes and is part of $\mu$ groups.

  • Each dummy candidate $d_{i}^{T_{2}}\in T_{2}$ is in the same group as $d_{1}^{T_{1}}$, as described in the previous point. It is also in $\mu-1$ further groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{j}^{T_{3}}\in T_{3}$. Thus, each dummy candidate $d_{i}^{T_{2}}$ has $\mu$ attributes and is part of $\mu$ groups.

  • Each dummy candidate $d_{j}^{T_{3}}\in T_{3}$ is in $\mu-1$ groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{i}^{T_{2}}\in T_{2}$, as described in the previous point. Since $\mu$ is odd, $\mu-1$ is even, so Set $T_{3}$ has an even number of candidates. We arbitrarily divide these $\mu-1$ candidates into two parts of equal size. We then create $\frac{\mu-1}{2}$ groups over one attribute. Each group contains two candidates from Set $T_{3}$, one drawn from each of the two parts without replacement. These groups are therefore pairwise disjoint. Each dummy candidate $d_{j}^{T_{3}}\in T_{3}$ belongs to exactly one of them, shared with exactly one other dummy candidate $d_{j^{\prime}}^{T_{3}}\in T_{3}$, where $j\neq j^{\prime}$. Overall, this construction adds one attribute and one group for each dummy candidate $d_{j}^{T_{3}}\in T_{3}$. Hence, each dummy candidate $d_{j}^{T_{3}}$ has $\mu$ attributes and is part of $\mu$ groups.

As a result of the grouping described above, each candidate $c\in A$ also has $\mu$ attributes and is part of $\mu$ groups. Each candidate $c\in A$ already had three attributes and was part of three groups due to our reduction from the vertex cover problem on 3-regular graphs. Additionally, we added $\mu-3$ blocks of dummy candidates. We grouped candidate $c$ with the candidate $d_{1}^{T_{1}}$ of each of these $\mu-3$ blocks. Hence, each candidate $c\in A$ has $3+(\mu-3)=\mu$ attributes and is part of $\mu$ groups. We set $l^{D}_{G}=1$ for all $G\in\mathcal{G}$. This corresponds to the requirement that each edge be covered by at least one vertex in the vertex cover.
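The Python sketch below makes the odd-$\mu$ block concrete by listing the groups of a single block. Every group has two members, and the candidate names are made up for illustration. The sketch checks that every dummy candidate lies in exactly $\mu$ groups. The vertex candidate $c$ gains exactly one group per block; its other groups come from its three edges and from its remaining $\mu-4$ blocks.

```python
from collections import Counter
from typing import List, Tuple

Group = Tuple[str, str]


def odd_block_groups(mu: int, c: str = "c", tag: str = "b") -> List[Group]:
    """Groups of one dummy block for odd mu >= 3 (names are illustrative).

    c is the vertex candidate; the dummies are t1, t2_i and t3_j, prefixed by a block tag.
    """
    assert mu >= 3 and mu % 2 == 1
    t1 = f"{tag}.t1"
    t2 = [f"{tag}.t2_{i}" for i in range(1, mu)]
    t3 = [f"{tag}.t3_{j}" for j in range(1, mu)]
    groups: List[Group] = [(c, t1)]               # c shares a group with T1
    groups += [(t1, x) for x in t2]               # T1 with each T2 candidate
    groups += [(x, y) for x in t2 for y in t3]    # every T2-T3 pair
    half = (mu - 1) // 2
    groups += list(zip(t3[:half], t3[half:]))     # disjoint pairs inside T3
    return groups


mu = 5
g = odd_block_groups(mu)
counts = Counter(x for pair in g for x in pair)
print(len(g))                                                        # 23
print(all(v == mu for name, v in counts.items() if name != "c"))     # True
```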

Finally, we introduce $m+m\cdot(2\mu^{2}-7\mu+3)$ voters, one for each candidate. For simplicity, let $c^{\prime}_{i}$ denote the $i^{\text{th}}$ candidate in set $C$, so that $|C|=2\mu^{2}m-7\mu m+4m$. The first voter ranks the candidates by their indices:

$$c^{\prime}_{1}\succ c^{\prime}_{2}\succ c^{\prime}_{3}\succ\dots\succ c^{\prime}_{2\mu^{2}m-7\mu m+4m}$$

The second voter moves each candidate up by one position and moves the first voter's top-ranked candidate to the last position:

$$c^{\prime}_{2}\succ c^{\prime}_{3}\succ\dots\succ c^{\prime}_{2\mu^{2}m-7\mu m+4m}\succ c^{\prime}_{1}$$

The third voter again moves each candidate up by one position relative to the second voter. It also moves the second voter's top-ranked candidate to the last position:

$$c^{\prime}_{3}\succ c^{\prime}_{4}\succ\dots\succ c^{\prime}_{2\mu^{2}m-7\mu m+4m}\succ c^{\prime}_{1}\succ c^{\prime}_{2}$$

All remaining voters rank the candidates in the same way, each rotating the previous ranking by one position. Hence, the last voter has the following ranking:

$$c^{\prime}_{2\mu^{2}m-7\mu m+4m}\succ c^{\prime}_{1}\succ c^{\prime}_{2}\succ\dots\succ c^{\prime}_{2\mu^{2}m-7\mu m+4m-1}$$

Finally, there are no voter attributes. Hence, $\pi=0$ and there are no representation constraints ($l^{R}_{P}=\emptyset$). This completes our construction, which takes time polynomial in $n$ and $m$. Note that we assume that the number of candidate attributes $\mu$ is less than the number of candidates $|C|$. More specifically, our reduction holds when $3\leq\mu\leq|C|-2$. This is a realistic assumption, as we expect $\mu$ to be very small in practice [9].

We first compute the score of the committee and then prove correctness. When $\mathtt{f}$ is a monotone, separable scoring function, we know that

$$\mathtt{f}(W)=\sum_{c\in W}\mathtt{f}(c)$$

Next, consider a scoring vector $\mathbf{s}=(s_{1},s_{2},\dots,s_{2\mu^{2}m-7\mu m+4m})$ with $s_{1}\geq s_{2}\geq\dots\geq s_{2\mu^{2}m-7\mu m+4m}$ and $s_{1}>s_{2\mu^{2}m-7\mu m+4m}$. Here, $s_{i}$ is the score that a candidate $c$ receives from a voter $v$ who ranks it at position $\operatorname{pos}_{v}(c)=i$. The score of each candidate $c\in C$ is

$$\mathtt{f}(c)=\sum_{v\in V}s_{\operatorname{pos}_{v}(c)}$$

Each candidate occupies each of the $m+m\cdot(2\mu^{2}-7\mu+3)$ positions exactly once across the voters. Hence, $\mathtt{f}(c)$ can be rewritten as

$$\mathtt{f}(c)=\sum_{i=1}^{|\mathbf{s}|}s_{i}$$
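The rotation argument can be sanity-checked in Python. This check is not part of the proof. The sketch builds the cyclic profile for a small number of candidates, using 0-based indices instead of $c^{\prime}_{1},c^{\prime}_{2},\dots$. It then confirms that every candidate receives $\sum_{i}s_{i}$ under an arbitrary non-increasing scoring vector. The function names are ours.

```python
from typing import List, Sequence


def cyclic_profile(num_candidates: int) -> List[List[int]]:
    """Voter r ranks candidates r, r+1, ... (mod n); each ranking rotates the previous one."""
    n = num_candidates
    return [[(r + p) % n for p in range(n)] for r in range(n)]


def separable_scores(profile: List[List[int]], s: Sequence[float]) -> List[float]:
    """f(c) = sum over voters of s[pos_v(c)], with 0-based positions."""
    n = len(profile[0])
    score = [0.0] * n
    for ranking in profile:
        for pos, cand in enumerate(ranking):
            score[cand] += s[pos]
    return score


profile = cyclic_profile(6)
s = [10, 7, 7, 3, 1, 0]            # any non-increasing vector with s_1 > s_last
print(separable_scores(profile, s))  # [28.0, 28.0, 28.0, 28.0, 28.0, 28.0]
```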

Hence, as all candidates $c\in C$ have the same score, every committee $W\in\mathcal{W}$ of size $k+m\mu^{2}-3m\mu$ attains the highest possible score, namely

$$\mathtt{f}(W)=\sum_{c\in W}\mathtt{f}(c)=\sum_{c\in W}\sum_{i=1}^{|\mathbf{s}|}s_{i}=(k+m\mu^{2}-3m\mu)\cdot\sum_{i=1}^{|\mathbf{s}|}s_{i}$$

Note that computing a highest-scoring committee under a monotone, separable function takes time polynomial in the size of the input.

To illustrate the score of the committee, consider the following example. If we assume that $\mathtt{f}$ is $k$-Borda, then $\mathbf{s}=(m+m\cdot(2\mu^{2}-7\mu+3)-1,\dots,1,0)$. Hence, all candidates $c\in C$ get the same Borda score $\mathtt{f}(c)$ of

$$\begin{aligned}
&(m+m\cdot(2\mu^{2}-7\mu+3)-1)+\dots+1+0\\
&=\frac{(m+m\cdot(2\mu^{2}-7\mu+3)-1)\cdot(m+m\cdot(2\mu^{2}-7\mu+3))}{2}\\
&=\frac{4\mu^{4}m^{2}-28\mu^{3}m^{2}+65\mu^{2}m^{2}-56\mu m^{2}+16m^{2}-2\mu^{2}m+7\mu m-4m}{2}
\end{aligned}$$

This is the sum of the first $m+m\cdot(2\mu^{2}-7\mu+3)-1$ positive integers, i.e., of all the scores in the scoring vector of the Borda rule. Therefore, every committee $W\in\mathcal{W}$ of size $k+m\mu^{2}-3m\mu$ is a highest-scoring committee, with

$$\mathtt{f}(W)=(k+m\mu^{2}-3m\mu)\cdot\frac{4\mu^{4}m^{2}-28\mu^{3}m^{2}+65\mu^{2}m^{2}-56\mu m^{2}+16m^{2}-2\mu^{2}m+7\mu m-4m}{2}$$
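The expanded polynomial can be double-checked numerically. The Python snippet below is a check we added, not part of the original argument. It compares $0+1+\dots+(|C|-1)$ with the closed form for a few odd values of $\mu$ and several values of $m$.

```python
def borda_closed_form_odd(mu: int, m: int) -> int:
    """The closed form above (numerator divided by 2)."""
    num = (4 * mu**4 * m**2 - 28 * mu**3 * m**2 + 65 * mu**2 * m**2
           - 56 * mu * m**2 + 16 * m**2 - 2 * mu**2 * m + 7 * mu * m - 4 * m)
    assert num % 2 == 0              # the numerator is N(N - 1), which is always even
    return num // 2


def borda_direct(num_candidates: int) -> int:
    """0 + 1 + ... + (N - 1): the Borda score of every candidate in the cyclic profile."""
    return sum(range(num_candidates))


for mu in (3, 5, 7, 9):
    for m in (4, 6, 8, 10):
        N = m + m * (2 * mu**2 - 7 * mu + 3)
        assert borda_direct(N) == borda_closed_form_odd(mu, m)
print("closed form matches")
```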

Hence, the NP-hardness of the problem stems from finding a feasible committee, i.e., one that satisfies $|G\cap W|\geq l^{D}_{G}$ for all $G\in\mathcal{G}$, where $l^{D}_{G}=1$. Therefore, for the proof of correctness, we show the following:

Claim 1.

We have a vertex cover $S$ of size at most $k$ that satisfies $e\cap S\neq\emptyset$ for all $e\in E$ if and only if we have a committee $W$ of size at most $k+m\mu^{2}-3m\mu$ that satisfies all the diversity constraints. These constraints require $|G\cap W|\geq l^{D}_{G}$ for all $G\in\mathcal{G}$, which amounts to $|G\cap W|\geq 1$ because $l^{D}_{G}=1$ for all $G\in\mathcal{G}$.

($\Rightarrow$) If the instance of the VC problem is a yes instance, then the corresponding instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Every candidate group will have at least one of its members in the winning committee $W$, i.e., $|G\cap W|\geq 1$ for all $G\in\mathcal{G}$. Recall that we have set $l^{D}_{G}=1$ for all $G\in\mathcal{G}$.

More specifically, for each block of candidates, we select the dummy candidate from Set $T_{1}$ and all $\mu-1$ dummy candidates from Set $T_{3}$. This satisfies the condition $|G\cap W|\geq 1$ for all candidate groups that contain at least one dummy candidate $d\in D$. The groups containing $d_{1}^{T_{1}}$ are covered by it, and every other group of the block contains a candidate from $T_{3}$. Overall, we select $\mu$ candidates from each of the $\mu-3$ blocks of each of the $m$ candidates in $A$. This results in $\mu\cdot(\mu-3)\cdot m=m\mu^{2}-3m\mu$ candidates in the committee.

Next, we cover the groups that do not contain any dummy candidates. For these, we select the $k$ candidates $c\in A$ that correspond to the $k$ vertices $x\in X$ that form the vertex cover. These $k$ candidates satisfy $|G\cap W|\geq 1$ for all such groups. Hence, we have a committee of size $k+m\mu^{2}-3m\mu$.

($\Leftarrow$) Suppose the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Then there is a committee $W$ with $k+m\mu^{2}-3m\mu$ candidates such that every group has at least one of its members in $W$, i.e., $|G\cap W|\geq 1$ for all $G\in\mathcal{G}$.

Each block needs at least $\mu$ of its own dummy candidates in $W$. The groups between $T_{2}$ and $T_{3}$ require all of $T_{2}$ or all of $T_{3}$. Either choice leaves at least one further group of the block that needs one more dummy candidate of the block: a group of $d_{1}^{T_{1}}$ with $T_{2}$, or a pair inside $T_{3}$.

Hence, at most $k$ members of $W$ lie in $A$, and they must cover all groups that do not contain any dummy candidates. The vertices $x\in X$ corresponding to these at most $k$ candidates therefore form a vertex cover. Then the corresponding instance of the VC problem is a yes instance as well. This completes the proof. ∎
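The counting step of the backward direction, that each block needs at least $\mu$ of its own dummy candidates, can be brute-forced in Python for small odd $\mu$. The sketch below considers only the groups inside one block, leaving out the group shared with $c$. It searches for the smallest set of block dummies meeting all of them. The candidate names and helper functions are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def block_internal_groups(mu: int) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Dummies of one odd-mu block and the groups among them (the c-T1 group is left out)."""
    t1 = "t1"
    t2 = [f"t2_{i}" for i in range(1, mu)]
    t3 = [f"t3_{j}" for j in range(1, mu)]
    half = (mu - 1) // 2
    groups = [(t1, x) for x in t2] + [(x, y) for x in t2 for y in t3]
    groups += list(zip(t3[:half], t3[half:]))      # disjoint pairs inside T3
    return [t1] + t2 + t3, groups


def min_block_cover(mu: int) -> int:
    """Smallest number of block dummies meeting every internal group (exhaustive search)."""
    dummies, groups = block_internal_groups(mu)
    for size in range(len(dummies) + 1):
        for chosen in combinations(dummies, size):
            picked = set(chosen)
            if all(a in picked or b in picked for a, b in groups):
                return size
    raise AssertionError("unreachable: choosing all dummies covers every group")


for mu in (5, 7):
    print(mu, min_block_cover(mu))   # prints "5 5", then "7 7"
```

The minimum is attained by $d_{1}^{T_{1}}$ together with all of $T_{3}$, which is exactly the choice made in the forward direction.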

Theorem 4.

If $\mu\geq 3$ is an even integer, $\pi=0$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard, even when $l_{G}^{D}=1$ for all $G\in\mathcal{G}$.

Proof.

We reduce an instance of the vertex cover (VC) problem to an instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD. We have two candidates $c_{i}$ and $c_{m+i}$ for each vertex $x_{i}\in X$. We also have $2m\cdot(2\mu^{2}-7\mu+3)$ dummy candidates $d\in D$. Here, $m$ is the number of vertices in the graph $H$, and $\mu$ is a positive even integer (the number of candidate attributes). Formally, we set $A=\{c_{1},\dots,c_{m}\}\cup\{c_{m+1},\dots,c_{2m}\}$ and the dummy candidate set $D=\{d_{1},\dots,d_{2m\cdot(2\mu^{2}-7\mu+3)}\}$. Hence, the candidate set $C=A\cup D$ has $|C|=2m+2m\cdot(2\mu^{2}-7\mu+3)$ candidates. We set the target committee size to $2k+2m\mu^{2}-6m\mu$.

Next, we have $\mu$ candidate attributes. Each edge $e\in E$ that connects vertices $x_{i}$ and $x_{j}$ corresponds to two candidate groups $G,G^{\prime}\in\mathcal{G}$. Group $G$ contains the two candidates $c_{i}$ and $c_{j}$, and group $G^{\prime}$ contains the two candidates $c_{m+i}$ and $c_{m+j}$; all four correspond to vertices $x_{i}$ and $x_{j}$. By having $2m$ candidates in $A$, we in fact duplicate the graph $H$. As our reduction proceeds from a 3-regular graph, each vertex is incident to three edges. Hence, each candidate $c\in A$ has three attributes and thus belongs to three groups.

Next, for each candidate $c\in A$, we have $\mu-3$ blocks of dummy candidates, each containing $2\mu-1$ dummy candidates $d\in D$. Thus, we have a total of $2m\cdot(\mu-3)\cdot(2\mu-1)=2m\cdot(2\mu^{2}-7\mu+3)$ dummy candidates. Each block contains three sets of candidates: Set $T_{1}$ contains one candidate, and Sets $T_{2}$ and $T_{3}$ contain $\mu-1$ candidates each. Specifically, in line with the construction in the proof of Theorem 3, each of the $\mu-3$ blocks for each candidate $c\in A$ is constructed as follows:

  • Set $T_{1}$ consists of a single dummy candidate, $d_{1}^{T_{1}}\in T_{1}$.

  • Set $T_{2}$ consists of $\mu-1$ dummy candidates, $d_{i}^{T_{2}}\in T_{2}$ for all $i\in[1,\mu-1]$.

  • Set $T_{3}$ consists of $\mu-1$ dummy candidates, $d_{j}^{T_{3}}\in T_{3}$ for all $j\in[1,\mu-1]$.

Each candidate in the block has $\mu$ attributes, and the candidates are grouped as follows:

  • The dummy candidate $d_{1}^{T_{1}}\in T_{1}$ is in the same group as candidate $c\in A$. It is also in $\mu-1$ further groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{i}^{T_{2}}\in T_{2}$. Thus, the dummy candidate $d_{1}^{T_{1}}$ has $\mu$ attributes and is part of $\mu$ groups.

  • Each dummy candidate $d_{i}^{T_{2}}\in T_{2}$ is in the same group as $d_{1}^{T_{1}}$, as described in the previous point. It is also in $\mu-1$ further groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{j}^{T_{3}}\in T_{3}$. Thus, each dummy candidate $d_{i}^{T_{2}}$ has $\mu$ attributes and is part of $\mu$ groups.

Note that the grouping of the candidates in Set $T_{3}$ differs significantly from the construction in the proof of Theorem 3:

  • Each dummy candidate $d_{j}^{T_{3}}\in T_{3}$ is in $\mu-1$ groups, each shared with exactly one of the $\mu-1$ dummy candidates $d_{i}^{T_{2}}\in T_{2}$, as described in the previous point. Since $\mu$ is even, $\mu-1$ is odd, so Set $T_{3}$ has an odd number of candidates. We set one candidate aside and arbitrarily divide the remaining $\mu-2$ candidates into two parts of equal size. We then create $\frac{\mu-2}{2}$ groups over one attribute. Each group contains two candidates from Set $T_{3}$, one drawn from each of the two parts without replacement. These groups are therefore pairwise disjoint. Each of these $\mu-2$ dummy candidates belongs to exactly one of them, shared with exactly one other dummy candidate $d_{j^{\prime}}^{T_{3}}\in T_{3}$, where $j\neq j^{\prime}$. Since $\frac{\mu-2}{2}$ groups hold exactly $\mu-2$ candidates, this adds one attribute and one group for all but one dummy candidate in $T_{3}$. These $\mu-2$ candidates thus have a total of $\mu$ attributes and $\mu$ groups, while the remaining candidate still has only $\mu-1$ attributes and $\mu-1$ groups.

    Suppose this block of dummy candidates belongs to candidate $c_{i}\in A$. Then the corresponding block of dummy candidates for candidate $c_{m+i}\in A$ also has one candidate $d_{z}^{T^{\prime}_{3}}\in T^{\prime}_{3}$ with $\mu-1$ attributes that is part of $\mu-1$ groups. We put these two candidates from separate blocks into one group, so the remaining candidate now also has $\mu$ attributes and is part of $\mu$ groups. Such cross-block grouping is always possible: there is always an even number of candidates in set $A$ ($|A|=2m$), and the blocks of $c_{i}$ and $c_{m+i}$ can be paired one-to-one among the $(\mu-3)\cdot 2m$ blocks, which is also an even number.
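The cross-block pairing for even $\mu$ can be checked with a short Python sketch. It builds two paired blocks, one for $c_{i}$ and one for $c_{m+i}$, and pairs up all but one $T_{3}$ candidate inside each block. It then joins the two spare candidates in a single cross-block group. The names `ci` and `cmi` are illustrative stand-ins for $c_{i}$ and $c_{m+i}$. Setting aside the last $T_{3}$ candidate is an arbitrary choice, matching the "set one candidate aside" step above.

```python
from collections import Counter
from typing import List, Tuple

Group = Tuple[str, str]


def even_block_pair_groups(mu: int) -> List[Group]:
    """Groups of the two paired dummy blocks of c_i and c_{m+i} for even mu >= 4."""
    assert mu >= 4 and mu % 2 == 0
    groups: List[Group] = []
    spares: List[str] = []
    for c, tag in (("ci", "A"), ("cmi", "B")):
        t1 = f"{tag}.t1"
        t2 = [f"{tag}.t2_{i}" for i in range(1, mu)]
        t3 = [f"{tag}.t3_{j}" for j in range(1, mu)]
        groups.append((c, t1))
        groups += [(t1, x) for x in t2]
        groups += [(x, y) for x in t2 for y in t3]
        paired, spare = t3[:-1], t3[-1]           # mu - 2 candidates are paired inside T3
        half = (mu - 2) // 2
        groups += list(zip(paired[:half], paired[half:]))
        spares.append(spare)
    groups.append((spares[0], spares[1]))         # the cross-block group
    return groups


mu = 6
counts = Counter(x for g in even_block_pair_groups(mu) for x in g)
print(sorted({v for name, v in counts.items() if name not in ("ci", "cmi")}))  # [6]
```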

As a result of the grouping described above, each candidate $c\in A$ also has $\mu$ attributes and is part of $\mu$ groups. Each candidate $c\in A$ already had three attributes and was part of three groups due to our reduction from the vertex cover problem on 3-regular graphs. Additionally, we added $\mu-3$ blocks of dummy candidates. We grouped candidate $c$ with the candidate $d_{1}^{T_{1}}$ of each of these $\mu-3$ blocks. Hence, each candidate $c\in A$ has $3+(\mu-3)=\mu$ attributes and is part of $\mu$ groups. We set $l^{D}_{G}=1$ for all $G\in\mathcal{G}$. This corresponds to the requirement that each edge be covered by at least one vertex in the vertex cover.

Finally, in line with our reduction in the proof of Theorem 3, we introduce $2m+2m\cdot(2\mu^{2}-7\mu+3)$ voters, one for each candidate. For simplicity, let $c^{\prime}_{i}$ denote the $i^{\text{th}}$ candidate in set $C$, so that $|C|=4\mu^{2}m-14\mu m+8m$. The first voter ranks the candidates by their indices:

$$c^{\prime}_{1}\succ c^{\prime}_{2}\succ c^{\prime}_{3}\succ\dots\succ c^{\prime}_{4\mu^{2}m-14\mu m+8m}$$

The second voter moves each candidate up by one position and moves the first voter's top-ranked candidate to the last position:

$$c^{\prime}_{2}\succ c^{\prime}_{3}\succ\dots\succ c^{\prime}_{4\mu^{2}m-14\mu m+8m}\succ c^{\prime}_{1}$$

All remaining voters rank the candidates in the same way, each rotating the previous ranking by one position. Hence, the last voter has the following ranking:

$$c^{\prime}_{4\mu^{2}m-14\mu m+8m}\succ c^{\prime}_{1}\succ c^{\prime}_{2}\succ\dots\succ c^{\prime}_{4\mu^{2}m-14\mu m+8m-1}$$

Finally, there are no voter attributes. Hence, $\pi=0$ and there are no representation constraints ($l^{R}_{P}=\emptyset$). This completes our construction, which takes time polynomial in $n$ and $m$. Note that we assume that the number of candidate attributes $\mu$ is less than the number of candidates $|C|$. More specifically, our reduction holds when $3\leq\mu\leq|C|-2$. This is a realistic assumption, as we expect $\mu$ to be very small in practice [9].

We first compute the score of the committee and then prove correctness. When $\mathtt{f}$ is a monotone, separable scoring function, we know that

$$\mathtt{f}(W)=\sum_{c\in W}\mathtt{f}(c)$$

Next, consider a scoring vector $\mathbf{s}=(s_{1},s_{2},\dots,s_{4\mu^{2}m-14\mu m+8m})$ with $s_{1}\geq s_{2}\geq\dots\geq s_{4\mu^{2}m-14\mu m+8m}$ and $s_{1}>s_{4\mu^{2}m-14\mu m+8m}$. Here, $s_{i}$ is the score that a candidate $c$ receives from a voter $v$ who ranks it at position $\operatorname{pos}_{v}(c)=i$. The score of each candidate $c\in C$ is

$$\mathtt{f}(c)=\sum_{v\in V}s_{\operatorname{pos}_{v}(c)}$$

Each candidate occupies each of the $2m+2m\cdot(2\mu^{2}-7\mu+3)$ positions exactly once across the voters. Hence, $\mathtt{f}(c)$ can be rewritten as

$$\mathtt{f}(c)=\sum_{i=1}^{|\mathbf{s}|}s_{i}$$

Hence, as all candidates $c\in C$ have the same score, every committee $W\in\mathcal{W}$ of size $2k+2m\mu^{2}-6m\mu$ attains the highest possible score, namely

$$\mathtt{f}(W)=\sum_{c\in W}\mathtt{f}(c)=\sum_{c\in W}\sum_{i=1}^{|\mathbf{s}|}s_{i}=(2k+2m\mu^{2}-6m\mu)\cdot\sum_{i=1}^{|\mathbf{s}|}s_{i}$$

Note that computing a highest-scoring committee under a monotone, separable function takes time polynomial in the size of the input.

To illustrate the score of the committee, consider the following example. If we assume that $\mathtt{f}$ is $k$-Borda, then $\mathbf{s}=(2m+2m\cdot(2\mu^{2}-7\mu+3)-1,\dots,1,0)$. Hence, all candidates $c\in C$ get the same Borda score $\mathtt{f}(c)$ of

$$\begin{aligned}
&(2m+2m\cdot(2\mu^{2}-7\mu+3)-1)+\dots+1+0\\
&=\frac{(2m+2m\cdot(2\mu^{2}-7\mu+3)-1)\cdot(2m+2m\cdot(2\mu^{2}-7\mu+3))}{2}\\
&=8\mu^{4}m^{2}-56\mu^{3}m^{2}+130\mu^{2}m^{2}-112\mu m^{2}+32m^{2}-2\mu^{2}m+7\mu m-4m
\end{aligned}$$

This is the sum of the first $2m+2m\cdot(2\mu^{2}-7\mu+3)-1$ positive integers, i.e., of all the scores in the scoring vector of the Borda rule. Therefore, every committee $W\in\mathcal{W}$ of size $2k+2m\mu^{2}-6m\mu$ is a highest-scoring committee, with

$$\mathtt{f}(W)=(2k+2m\mu^{2}-6m\mu)\cdot(8\mu^{4}m^{2}-56\mu^{3}m^{2}+130\mu^{2}m^{2}-112\mu m^{2}+32m^{2}-2\mu^{2}m+7\mu m-4m)$$
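As a numerical cross-check, the Python sketch below computes Borda scores directly from the cyclic profile. This check is ours and not part of the proof. It uses $\mu=4$ and $m=4$, so $|C|=64$, and compares the result with the closed form above. The helper names are illustrative.

```python
from typing import List


def closed_form_even(mu: int, m: int) -> int:
    """The closed form above for the even-mu construction."""
    return (8 * mu**4 * m**2 - 56 * mu**3 * m**2 + 130 * mu**2 * m**2
            - 112 * mu * m**2 + 32 * m**2 - 2 * mu**2 * m + 7 * mu * m - 4 * m)


def borda_scores_cyclic(n: int) -> List[int]:
    """Borda scores (n-1, ..., 1, 0) in the cyclic profile with n voters and n candidates."""
    scores = [0] * n
    for r in range(n):                 # voter r puts candidate r first
        for pos in range(n):
            scores[(r + pos) % n] += n - 1 - pos
    return scores


mu, m = 4, 4
n = 2 * m + 2 * m * (2 * mu**2 - 7 * mu + 3)   # |C| = 64
scores = borda_scores_cyclic(n)
print(n, set(scores), closed_form_even(mu, m))  # 64 {2016} 2016
```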

Hence, the NP-hardness of the problem stems from finding a feasible committee, i.e., one that satisfies $|G\cap W|\geq l^{D}_{G}$ for all $G\in\mathcal{G}$, where $l^{D}_{G}=1$. Therefore, for the proof of correctness, we show the following:

Claim 2.

We have a vertex cover $S$ of size at most $k$ that satisfies $e\cap S\neq\emptyset$ for all $e\in E$ if and only if we have a committee $W$ of size at most $2k+2m\mu^{2}-6m\mu$ that satisfies all the diversity constraints. These constraints require $|G\cap W|\geq l^{D}_{G}$ for all $G\in\mathcal{G}$, which amounts to $|G\cap W|\geq 1$ because $l^{D}_{G}=1$ for all $G\in\mathcal{G}$.

($\Rightarrow$) If the instance of the VC problem is a yes instance, then the corresponding instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Every candidate group will have at least one of its members in the winning committee $W$, i.e., $|G\cap W|\geq 1$ for all $G\in\mathcal{G}$. Recall that we have set $l^{D}_{G}=1$ for all $G\in\mathcal{G}$.

More specifically, for each block of candidates, we select the dummy candidate from Set $T_{1}$ and all $\mu-1$ dummy candidates from Set $T_{3}$. This satisfies the condition $|G\cap W|\geq 1$ for all candidate groups that contain at least one dummy candidate $d\in D$, including the cross-block groups, which contain only candidates from $T_{3}$. Overall, we select $\mu$ candidates from each of the $\mu-3$ blocks of each of the $2m$ candidates in $A$. This results in $\mu\cdot(\mu-3)\cdot 2m=2m\mu^{2}-6m\mu$ candidates in the committee.

Next, we cover the groups that do not contain any dummy candidates. For these, we select the $2k$ candidates $c_{i},c_{m+i}\in A$ that correspond to the $k$ vertices $x_{i}\in X$ that form the vertex cover. These $2k$ candidates satisfy $|G\cap W|\geq 1$ for all such groups. Hence, we have a committee of size $2k+2m\mu^{2}-6m\mu$.

($\Leftarrow$) Suppose the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Then there is a committee $W$ with $2k+2m\mu^{2}-6m\mu$ candidates such that every group has at least one of its members in $W$, i.e., $|G\cap W|\geq 1$ for all $G\in\mathcal{G}$.

Each block needs at least $\mu$ of its own dummy candidates in $W$. The groups between $T_{2}$ and $T_{3}$ require all of $T_{2}$ or all of $T_{3}$. Either choice leaves at least one further group of the block that needs one more dummy candidate of the block: a group of $d_{1}^{T_{1}}$ with $T_{2}$, or, since $\mu\geq 4$, a pair group inside $T_{3}$.

Hence, at most $2k$ members of $W$ lie in $A$, and they must cover all groups that do not contain any dummy candidates, i.e., the groups of both copies of $H$. One of the two copies, $\{c_{1},\dots,c_{m}\}$ or $\{c_{m+1},\dots,c_{2m}\}$, contains at most $k$ of these members. The vertices corresponding to them form a vertex cover of $H$ of size at most $k$. Then the corresponding instance of the VC problem is a yes instance as well.

Recall that we constructed $2m$ candidates in the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD that correspond to the $m$ vertices in the VC problem. This is why $2k$ candidates instead of $k$ are needed to satisfy the diversity constraints of the candidate groups that do not contain any dummy candidates. This completes the proof. ∎

5.2.2 ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD w.r.t. representation constraints

We now study the computational complexity of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD due to the presence of voter attributes. Note that the reduction is designed to conform to real-world stipulations that are analogous to the stipulations for the candidate attributes. The following theorem helps us prove the statement in Corollary 1(4).

Theorem 5.

If $\mu=0$, $\pi\geq 1$ is an integer, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard, even when $l_{P}^{R}=1$ for all $P\in\mathcal{P}$.

Proof.

We reduce an instance of the vertex cover (VC) problem to an instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD. We have one candidate $c_{i}$ for each vertex $x_{i}\in X$. We also have $n\cdot m$ dummy candidates $d\in D$, where $n$ is the number of edges and $m$ is the number of vertices in the graph $H$. Formally, we set $A=\{c_{1},\dots,c_{m}\}$ and the dummy candidate set $D=\{d_{1},\dots,d_{nm}\}$. Hence, the candidate set $C=A\cup D$ consists of $m+n\cdot m$ candidates. We set the target committee size to $k$.

We now introduce $n^{2}$ voters, $n$ voters for each edge $e\in E$. More specifically, let edge $e\in E$ connect vertices $x_{i}$ and $x_{j}$. Then, the corresponding $n$ voters $v\in V$ rank the candidates in four consecutive sets $\mathcal{T}=(T_{1},T_{2},T_{3},T_{4})$ such that $T_{1}\succ T_{2}\succ T_{3}\succ T_{4}$:

  • Set $T_{1}$: the candidates $c_{i}$ and $c_{j}$ that correspond to vertices $x_{i}$ and $x_{j}$ are ranked in the top two positions, ordered by their indices. For the voters of the $a^{\text{th}}$ edge, $a\in[n]$, we denote these candidates by $c_{i_{a}}$ and $c_{j_{a}}$.

  • Set $T_{2}$: $m$ of the $n\cdot m$ dummy candidates are ranked in the next $m$ positions, again ordered by their indices. These $m$ candidates are distinct for voters of different edges, as shown below. Hence, for all pairs of voters $v,v^{\prime}\in V$ that correspond to different edges, $T_{2}^{v}\cap T_{2}^{v^{\prime}}=\emptyset$.

  • Set $T_{3}$: the next $m-2$ positions are occupied by the remaining $m-2$ candidates in $A\setminus T_{1}$, which correspond to the other vertices of $H$, ordered by their indices.

  • Set $T_{4}$: the last $(n-1)\cdot m$ positions are occupied by the remaining $(n-1)\cdot m$ dummy candidates in $D\setminus T_{2}$, ordered by their indices.

More specifically, the voters rank the candidates as shown below:

$$\begin{array}{l|l}
\text{Voters} & T_{1}\succ T_{2}\succ T_{3}\succ T_{4}\\
\hline
v_{1}^{1},\dots,v_{1}^{n} & c_{i_{1}}\succ c_{j_{1}}\succ d_{1}\succ d_{2}\succ\dots\succ d_{m}\succ A\setminus\{c_{i_{1}},c_{j_{1}}\}\succ D\setminus\{d_{1},\dots,d_{m}\}\\
v_{2}^{1},\dots,v_{2}^{n} & c_{i_{2}}\succ c_{j_{2}}\succ d_{m+1}\succ d_{m+2}\succ\dots\succ d_{2m}\succ A\setminus\{c_{i_{2}},c_{j_{2}}\}\succ D\setminus\{d_{m+1},\dots,d_{2m}\}\\
v_{3}^{1},\dots,v_{3}^{n} & c_{i_{3}}\succ c_{j_{3}}\succ d_{2m+1}\succ d_{2m+2}\succ\dots\succ d_{3m}\succ A\setminus\{c_{i_{3}},c_{j_{3}}\}\succ D\setminus\{d_{2m+1},\dots,d_{3m}\}\\
\vdots & \vdots\\
v_{n}^{1},\dots,v_{n}^{n} & c_{i_{n}}\succ c_{j_{n}}\succ d_{(n-1)m+1}\succ d_{(n-1)m+2}\succ\dots\succ d_{nm}\succ A\setminus\{c_{i_{n}},c_{j_{n}}\}\succ D\setminus\{d_{(n-1)m+1},\dots,d_{nm}\}
\end{array}$$

Next, there are no candidate attributes. Hence, $\mu=0$ and there are no diversity constraints ($l^{D}_{G}=\emptyset$). The voters are divided into disjoint populations over $\pi\geq 1$ attributes, as follows. Let $x\in[\pi]$, $y\in[n]$, and $z\in[n]$. Under attribute $x$, voter $v_{y}^{z}\in V$ belongs to the population $P\in\mathcal{P}$ containing all voters $v_{y}^{z^{\prime}}$ with the same $y$ and with $z^{\prime}\bmod x=z\bmod x$. Thus, each voter is part of $\pi$ populations, and every population consists only of voters of a single edge. We set every representation constraint to 1, i.e., $l^{R}_{P}=1$ for all $P\in\mathcal{P}$.

The winning committee $W_{P}$ of each population $P\in\mathcal{P}$ always consists of the top $k$ candidates in the ranking of the voters in $P$. We may assume $k\leq m$, so these top $k$ positions lie within $T_{1}\cup T_{2}$. This means that $W_{P}$, for all $P\in\mathcal{P}$, cannot contain candidates from Set $T_{3}$ or Set $T_{4}$. This holds by construction, for two reasons:

(a) all voters within a population $P\in\mathcal{P}$ have the same ranking;

(b) the first $k$ candidates of that ranking are selected for one of two reasons:

  (i) they are indeed the highest-scoring candidates for the population; or

  (ii) in case of a tie, we break ties by the indices of the candidates, such that $c_{i}$ takes precedence over $c_{j}$ for all $i<j$.
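The Python sketch below builds the construction for a toy graph (a triangle, so $m=n=3$) with $\pi=2$. It forms the populations by the rule $z^{\prime}\bmod x=z\bmod x$ and reads off each $W_{P}$ as the top $k$ of the ranking shared by its voters. Taking the top $k$ directly assumes a strictly decreasing scoring vector or the index-based tie-breaking described above. The function names are ours, and the triangle is used only for illustration.

```python
from typing import Dict, List, Tuple

Voter = Tuple[int, int]   # (y, z) stands for voter v_y^z, 1-based


def build_profile(m: int, edges: List[Tuple[int, int]]) -> Dict[Voter, List[str]]:
    """Ranking T1 > T2 > T3 > T4 of every voter in the construction."""
    n = len(edges)
    A = [f"c{i}" for i in range(1, m + 1)]
    D = [f"d{i}" for i in range(1, n * m + 1)]
    profile: Dict[Voter, List[str]] = {}
    for y, (i, j) in enumerate(edges, start=1):
        t1 = [f"c{min(i, j)}", f"c{max(i, j)}"]
        t2 = D[(y - 1) * m: y * m]                 # m dummies private to edge y
        t3 = [c for c in A if c not in t1]
        t4 = [d for d in D if d not in t2]
        for z in range(1, n + 1):
            profile[(y, z)] = t1 + t2 + t3 + t4
    return profile


def populations(n: int, pi: int) -> List[List[Voter]]:
    """Under attribute x in [pi], v_y^z and v_y^z' share a population iff z % x == z' % x."""
    pops: List[List[Voter]] = []
    for x in range(1, pi + 1):
        for y in range(1, n + 1):
            for r in range(x):
                members = [(y, z) for z in range(1, n + 1) if z % x == r]
                if members:
                    pops.append(members)
    return pops


def population_winner(profile: Dict[Voter, List[str]], P: List[Voter], k: int) -> List[str]:
    """W_P: the top-k candidates of the ranking shared by all voters in P."""
    rankings = {tuple(profile[v]) for v in P}
    assert len(rankings) == 1, "voters in a population must rank identically"
    return list(next(iter(rankings)))[:k]


edges = [(1, 2), (2, 3), (1, 3)]
profile = build_profile(3, edges)
pops = populations(len(edges), pi=2)
winners = {tuple(population_winner(profile, P, 2)) for P in pops}
print(len(pops), sorted(winners))  # 9 [('c1', 'c2'), ('c1', 'c3'), ('c2', 'c3')]
```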

This completes our reduction, which runs in time polynomial in $n$ and $m$. For the proof of correctness, we show the following:

Claim 3.

We have a vertex cover $S$ of size at most $k$ that satisfies $e\cap S\neq\emptyset$ for all $e\in E$ if and only if we have at least one committee $W$ of size at most $k$ that satisfies all the representation constraints, which means that for all $P\in\mathcal{P}$, $|W_{P}\cap W|\geq l^{R}_{P}$, which equals $|W_{P}\cap W|\geq 1$ as $l^{R}_{P}=1$ for all $P\in\mathcal{P}$.

($\Rightarrow$) If the instance of the VC problem is a yes instance, then the corresponding instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Let $W$ consist of the candidates corresponding to the vertices in $S$. Every population $P\in\mathcal{P}$ corresponds to an edge $e=\{x_{i},x_{j}\}$, and its winning committee $W_{P}$ contains $c_{i}$ and $c_{j}$. As $S$ covers $e$, $W$ contains at least one of them, i.e., $|W_{P}\cap W|\geq 1$ for all $P\in\mathcal{P}$. Indeed, even if the winning committee of each population were of size 2 instead of $k$, the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD would be a yes instance, as the vertex cover corresponds to a committee that represents every population, i.e., $|W_{P}\cap W|\geq 1$ for all $P\in\mathcal{P}$.

($\Leftarrow$) Suppose the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance, i.e., there is a committee $W$ such that $|W_{P}\cap W|\geq 1$ for all $P\in\mathcal{P}$. Then the corresponding instance of the VC problem is a yes instance as well. More specifically, there are two cases in which the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD can be a yes instance:

  • Case 1 - When only candidates from Set $T_{1}$ are in the committee $W$: An instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD with $\mu=0$ and $\pi=1$ is a yes instance when every population has at least one representative in the committee, i.e., $|W_{P}\cap W|\geq 1$ for all $P\in\mathcal{P}$. We note that, for all $P\in\mathcal{P}$, each population's winning committee $W_{P}$ consists of two candidates from Set $T_{1}$ and the top $k-2$ candidates from Set $T_{2}$. Hence, if $W$ consists only of candidates from Set $T_{1}$ and the instance is a yes instance, then for every population, $W$ contains at least one of the two Set $T_{1}$ candidates that correspond to the endpoints of that population's edge. As the vertices $x\in S$ correspond to the candidates $c\in W$, there is a vertex cover of size at most $k$ that covers all the edges $e\in E$.

  • Case 2 - When candidates from Set $T_{1}$ and Set $T_{2}$ are in the committee $W$: In Case 1, we showed that if a candidate $c$ in the winning committee $W$ is from Set $T_{1}$, then it corresponds to a vertex in the vertex cover. Additionally, as each population's winning committee $W_{P}$, for all $P\in\mathcal{P}$, is of size $k$, an instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD can be a yes instance even if a dummy candidate from Set $T_{2}$ is in the winning committee $W$. More specifically, there are two sub-cases:

    • For some population $P\in\mathcal{P}$, dummy candidate $d$ from Set $T_{2}$ AND candidate $c$ from Set $T_{1}$ are in the committee $W$: if the population's candidate $c$ from Set $T_{1}$, who is also in $W_{P}$, is in $W$, then this sub-case is equivalent to Case 1, and hence, a corresponding vertex $x\in S$ in the vertex cover exists. We note that this sub-case does not allow any population to be represented in $W$ only by candidates from Set $T_{2}$, which is our next sub-case.

    • For some population $P\in\mathcal{P}$, only dummy candidate $d$ from Set $T_{2}$ is in the committee $W$: if, for a given population $P\in\mathcal{P}$, a committee $W$ represents the population only via a dummy candidate $d\in W_{P}$, then the representation constraint is satisfied, as $W_{P}\cap W=\{d\}$ and hence $|W_{P}\cap W|=1=l^{R}_{P}$. However, for any two voters $v,v^{\prime}\in V$ that correspond to different edges, we know that $T_{2}^{v}\cap T_{2}^{v^{\prime}}=\emptyset$. Hence, $d$ can only represent populations whose winning committee is $W_{P}$. We can therefore replace $d$ with a candidate $c\in W_{P}$ from Set $T_{1}$, which represents all of those populations as well. Formally, a winning committee $W$ is always tied (see footnote 7) to another winning committee $W^{\prime}=(W\setminus\{d\})\cup\{c\}$, where $\{c,d\}\subseteq W_{P}$ for some $P\in\mathcal{P}$. This is equivalent to replacing candidate $d$ from Set $T_{2}$ with a candidate $c$ from Set $T_{1}$ of the population $P$. Thus, in this sub-case, a yes instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD due to $W$, or due to the equivalent committee $W^{\prime}$, corresponds to a vertex cover $S$ that covers all the edges $e\in E$.

    Footnote 7: W.l.o.g., we make a subtle assumption that all $m+(n\cdot m)$ candidates bring the same utility to the committee $W$. The aim of this assumption is to show that the problem remains hard even under it. In other words, even finding a feasible committee that simply satisfies the constraints is NP-hard, even when $\pi=1$. The assumption does not change the composition of each population's winning committee $W_{P}$ for all $P\in\mathcal{P}$.

These cases complete the other direction of the proof of correctness.

Finally, we note that for this reduction and the proof of correctness, we assume that ties are broken using a predetermined order of candidates. We also note that as we are using a separable committee selection rule, computing the scores of candidates takes polynomial time. This completes the overall proof. ∎
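As a sanity check of Claim 3 on a toy instance, the sketch below builds each population's winning committee as in the reduction. Each committee holds the two Set $T_{1}$ candidates of an edge, followed by the top $k-2$ dummy candidates of that edge from Set $T_{2}$. The sketch then compares two brute-force minima: the smallest committee that meets every representation constraint, and the smallest vertex cover. The graph, the tuple labels `("c", i)` and `("d", e, t)`, and the helper names are hypothetical. Brute force is only meant for tiny inputs.

```python
from itertools import combinations


def min_hitting_size(universe, sets):
    """Smallest number of elements of `universe` intersecting every set (brute force)."""
    universe = sorted(universe)
    for size in range(len(universe) + 1):
        for chosen in combinations(universe, size):
            picked = set(chosen)
            if all(picked & s for s in sets):
                return size
    return None


def population_committees(edges, k):
    """W_P of the population of each edge: both endpoints, then k - 2 dummies unique to that edge."""
    committees = []
    for e, (i, j) in enumerate(edges):
        dummies = {("d", e, t) for t in range(k - 2)}
        committees.append({("c", i), ("c", j)} | dummies)
    return committees


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
k = 3
W_P = population_committees(edges, k)
candidates = set().union(*W_P)
vertex_sets = [{("c", i), ("c", j)} for i, j in edges]
vertices = set().union(*vertex_sets)

# Minimum committee meeting all l_P^R = 1 versus minimum vertex cover.
print(min_hitting_size(candidates, W_P), min_hitting_size(vertices, vertex_sets))  # 2 2
```

The two minima coincide here because each dummy lies in the winning committee of a single edge. Swapping it for an endpoint of that edge therefore never breaks feasibility.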

Corollary 2.

If $\forall\mu\in\mathbb{Z}:\mu\geq 0$, $\forall\pi\in\mathbb{Z}:\pi\geq 1$, and $\mathtt{f}$ is a monotone, separable function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard, even when $\forall G\in\mathcal{G}$, $l_{G}^{D}=1$ and $\forall P\in\mathcal{P}$, $l_{P}^{R}=1$.

The reduction in the proof of Theorem 5 holds for all $\pi\in\mathbb{Z}:\pi\geq 1$, as each voter in the reduction can belong to more than one population. Next, as the focus of this section is to understand the computational complexity with respect to representation constraints, we ease the stipulation that requires each candidate attribute to partition all candidates into more than two groups. Hence, for each candidate attribute $A_{i}$, $i\in[\mu]$, we simply create one group that consists of all the candidates and set $l_{G}^{D}=1$ for all $G\in\mathcal{G}$, and the problem remains NP-hard.

5.2.3 ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD w.r.t. submodular scoring function

The Chamberlin-Courant (CC) rule is a well-known monotone, submodular scoring function [9], which we use for our proof. The novelty of our reduction is that it holds for determining the winning committee using the CC rule with any positional scoring rule whose scoring vector $\mathbf{s}=(s_{1},\dots,s_{m})$ satisfies $s_{1}=s_{2}$, $s_{m}\geq 0$, and, for all $i\in[3,m-1]$, $s_{i}\in\mathbb{Z}$ with $s_{i}\geq s_{i+1}$ and $s_{2}>s_{i}$.

The following theorem and corollary prove the statement in Corollary 1(1).

Theorem 6.

If $\mathtt{f}$ is a monotone, submodular function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard even when $\mu=0$ and $\pi=0$.

Proof.

We reduce an instance of the vertex cover (VC) problem to an instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD. Each candidate $c_{i}\in C$ corresponds to a vertex $x_{i}\in X$. For each edge $e\in E$, we have a voter $v\in V$ whose complete linear order is as follows. The two most preferred candidates correspond to the two endpoints of $e$ and are ranked by their indices. The remaining $m-2$ candidates occupy the bottom $m-2$ positions, again ranked by their indices. We set the committee size to $k$. This reduction runs in time polynomial in $n$ and $m$.

For the proof of correctness, we note that there are no candidate or voter attributes, and thus, no diversity or representation constraints. Hence, we show the following:

Claim 4.

We have a vertex cover $S$ of size at most $k$ that satisfies $e\cap S\neq\emptyset$ for all $e\in E$ if and only if we have a committee $W$ of size at most $k$ with total misrepresentation of zero, which means that at least one of the top 2 ranked candidates of each voter is in the committee $W$.

($\Rightarrow$) If the instance of the VC problem is a yes instance, then the corresponding instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance. Taking $W$ to be the candidates corresponding to the vertices in $S$, every voter has at least one of their top two candidates in the committee. This results in a misrepresentation score of zero, as $s_{1}=s_{2}$ and $s_{i}<s_{2}$ for all $i\in[3,m]$.

($\Leftarrow$) If the instance of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is a yes instance, then the VC instance is also a yes instance. When a committee $W$ contains neither of a voter's top-2 candidates, that voter's best-ranked committee member is in position 3 or lower. The voter's dissatisfaction is then greater than zero, as $s_{i}<s_{1}$ for all $i\in[3,m]$. Hence, a committee with total misrepresentation of zero contains at least one endpoint of every edge $e\in E$, and the corresponding vertices form a vertex cover of size at most $k$. Therefore, ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard when $\mu=0$, $\pi=0$, and $\mathtt{f}$ is a monotone, submodular function. ∎
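The reduction in Theorem 6 can be replayed on a small graph. The sketch below builds one voter per edge and computes a Chamberlin-Courant misrepresentation. For illustration only, it assumes that a voter's misrepresentation is $s_{1}$ minus the score of their best-ranked committee member. The scoring vector `s = [2, 2, 1, 0]` is one hypothetical choice that satisfies $s_{1}=s_{2}>s_{3}\geq s_{4}\geq 0$. The function names are ours.

```python
def edge_voters(num_vertices, edges):
    """One voter per edge: the edge's endpoints first (by index), then the rest by index."""
    voters = []
    for i, j in edges:
        top = sorted((i, j))
        rest = [c for c in range(num_vertices) if c not in top]
        voters.append(top + rest)
    return voters


def cc_misrepresentation(voters, committee, s):
    """Sum over voters of s[0] minus the score of their best-ranked committee member
    (an assumed misrepresentation measure, used here for illustration)."""
    total = 0
    for ranking in voters:
        best = max(s[ranking.index(c)] for c in committee)
        total += s[0] - best
    return total


edges = [(0, 1), (1, 2), (2, 3)]
voters = edge_voters(4, edges)
s = [2, 2, 1, 0]  # s1 = s2 > s3 >= s4 >= 0

print(cc_misrepresentation(voters, {1, 2}, s))  # {1, 2} is a vertex cover
print(cc_misrepresentation(voters, {0, 3}, s))  # edge (1, 2) is left uncovered
```

The first committee covers every edge and has zero misrepresentation. The second leaves edge $(1, 2)$ uncovered, so that edge's voter incurs a positive loss.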

Corollary 3.

If $\forall\mu\in\mathbb{Z}:\mu\geq 0$, $\forall\pi\in\mathbb{Z}:\pi\geq 0$, and $\mathtt{f}$ is a monotone, submodular function, then ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard, even when $\forall G\in\mathcal{G}$, $l_{G}^{D}=1$ and $\forall P\in\mathcal{P}$, $l_{P}^{R}=1$.

The proof of Theorem 6 shows that when we use a submodular but not separable committee selection rule $\mathtt{f}$, ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard even when $\mu=0$ and $\pi=0$. Next, the focus of this section is to understand the computational complexity with respect to a monotone, submodular scoring rule. We therefore ease two stipulations: that each candidate attribute partitions all candidates into more than two groups, and that each voter attribute partitions all voters into more than two populations. The problem remains hard even when there are candidate attributes with all diversity constraints set to one and voter attributes with all representation constraints set to one. Specifically, for each candidate attribute, we create one group that contains all the candidates, and for each voter attribute, we create one population that contains all the voters. This is analogous to not having any candidate or voter attributes. Hence, even when $l^{D}_{G}=1$ for all $G\in\mathcal{G}$ and $l^{R}_{P}=1$ for all $P\in\mathcal{P}$, ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is NP-hard if $\mathtt{f}$ is a submodular committee selection rule.

6 Inapproximability and Parameterized Complexity

| Result | Parameter | $(\geq 3, 0)$ | $(0, \geq 1)$ | $(\geq 1, \geq 1)$ |
|---|---|---|---|---|
| Inapproximability | - | $(1-\varepsilon)\cdot(\ln\mu-\mathcal{O}(\ln\ln\mu))$ (Thm. 7) | $k-\varepsilon$ (Thm. 9; see footnote 8) | $(1-\varepsilon)\cdot\ln(\lvert\mathcal{G}\rvert+\lvert\mathcal{P}\rvert)$ (Thm. 8) |
| Parameterized complexity | $k$ is constant | $\mathcal{O}(m^{k}\cdot(\lvert\mathcal{G}\rvert+\lvert\mathcal{P}\rvert))$ (Obs. 3) | $\mathcal{O}(m^{k}\cdot(\lvert\mathcal{G}\rvert+\lvert\mathcal{P}\rvert))$ (Obs. 3) | $\mathcal{O}(m^{k}\cdot(\lvert\mathcal{G}\rvert+\lvert\mathcal{P}\rvert))$ (Obs. 3) |
| Parameterized complexity | $k\ll m$ | W[2]-hard (Cor. 4) | $\mathcal{O}(c^{k}+m)$ (Thm. 11) | W[2]-hard (Cor. 5) |

Table 2: A summary of the inapproximability and parameterized complexity of ($\mu$, $\pi$)-DRCF. The values in brackets in the header row are the values of $\mu$ and $\pi$, respectively. The results hold for all $\mu\in\mathbb{Z}$ and all $\pi\in\mathbb{Z}$ that satisfy the condition stated in the brackets. The results are under the assumption P $\neq$ NP. 'Thm.' denotes Theorem, 'Obs.' denotes Observation, and 'Cor.' denotes Corollary. $\varepsilon$ denotes an arbitrarily small constant $\varepsilon>0$, and the results hold for every such $\varepsilon$.

Footnote 8: For Theorem 9, we assume that the Unique Games Conjecture (UGC) [54] holds. Our reason is that the result showing that pseudorandom sets in the Grassmann graph have near-perfect expansion completed the proof of the 2-to-2 Games Conjecture [55], which is considered significant evidence towards the UGC. Moreover, GapUG($\frac{1}{2}$, $\varepsilon$) is known to be NP-hard, i.e., a weaker version of the UGC holds with completeness $\frac{1}{2}$ (see [56] and "Evidence towards the Unique Games Conjecture" in [55] for more details). Without the UGC assumption, the result for our problem when $\mu=0$ and $\pi\geq 1$ changes. For an arbitrarily small constant $\varepsilon>0$, the problem is inapproximable within a factor of $k-1-\varepsilon$ for every integer $k\geq 3$ [57] and within a factor of $\sqrt{2}-\varepsilon$ when $k=2$ [55, 58].

The hardness of ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD is mainly due to the hardness of ($\mu$, $\pi$)-DRCF. In other words, satisfying the diversity and representation constraints is computationally hard, even when all constraints are set to 1, i.e., even when $l^{D}_{G}=1$ for all $G\in\mathcal{G}$ and $l^{R}_{P}=1$ for all $P\in\mathcal{P}$. Hence, in this section, we study the hardness of approximation of ($\mu$, $\pi$)-DRCF, to understand the limits of how well it can be approximated, as well as its parameterized complexity.

It is natural to try to reformulate representation constraints as diversity constraints. However, this is not possible in our model. Each candidate attribute partitions all $m$ candidates into groups, and the lower bound is set such that $l_{G}^{D}\in[1,\min(k,|G|)]$ for all $G\in\mathcal{G}$. In contrast, for representation constraints, each $W_{P}$, for all $P\in\mathcal{P}$, contains only $k$ candidates, and the remaining $m-k$ candidates in $C\setminus W_{P}$ may never be selected. Hence, representation constraints cannot be easily reformulated as diversity constraints. Moreover, suppose we relax the lower bound of the diversity constraints to $l_{G}^{D}\in[0,\min(k,|G|)]$ instead of $l_{G}^{D}\in[1,\min(k,|G|)]$, for all $G\in\mathcal{G}$, to allow for such a reformulation. Even then, the following settings of ($\mu$, $\pi$)-DRCF and ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD are technically different, and we cannot reformulate one as another:

  • Using only diversity constraints

  • Using only representation constraints

  • Using both diversity and representation constraints

The above settings are technically different from each other, as the sizes of candidate groups and the sizes of the populations' winning committees have implications for how we can solve the problem. For instance, using both diversity and representation constraints and using only representation constraints are mathematically as different as the vertex cover problem on hypergraphs and the vertex cover problem on $k$-uniform hypergraphs, respectively. The difference between the hardness of approximation of these two problems is well known. Overall, reformulations such as converting representation constraints to diversity constraints do not affect the computational complexity of the problem, but they do affect the approximation and parameterized complexity results. Hence, we study the hardness of approximation and the parameterized complexity of each of the above settings of ($\mu$, $\pi$)-DRCF separately, without carrying out any reformulations between them.

Observation 2.

$\forall\mu\in\mathbb{Z}$ and $\forall\pi\in\mathbb{Z}$, the following settings of the ($\mu$, $\pi$)-DRCF problem are not equivalent: (i) $\mu=0$ and $\pi\geq 1$, (ii) $\mu\geq 3$ and $\pi=0$, and (iii) $\mu\geq 1$ and $\pi\geq 1$.

6.1 Inapproximability

In this subsection, we allow violations of the committee size rather than of the constraints. Deciding which constraints to violate is not straightforward, especially because the constraints are tied to groups of people. Hence, we define the size-optimization version of ($\mu$, $\pi$)-DRCF and study its inapproximability:

Definition 8.

($\mu$, $\pi$)-DRCF-size-optimization: In the ($\mu$, $\pi$)-DRCF-size-optimization problem, we are given the following input:

  • a set $C$ of $m$ candidates;
  • a set $V$ of $n$ voters such that each voter $v_{i}$ has a preference list $\succ_{v_{i}}$ over $m$ candidates;
  • a committee size $k\in[m]$;
  • a set of candidate groups $\mathcal{G}$ and the corresponding diversity constraints $l^{D}_{G}$ for all $G\in\mathcal{G}$;
  • a set of voter populations $\mathcal{P}$ and the corresponding representation constraints $l^{R}_{P}$ and winning committees $W_{P}$ for all $P\in\mathcal{P}$.

The task is to find a minimum-size committee $W\subseteq C$ that satisfies all the diversity and representation constraints, i.e., $|G\cap W|\geq l^{D}_{G}$ for all $G\in\mathcal{G}$ and $|W_{P}\cap W|\geq l^{R}_{P}$ for all $P\in\mathcal{P}$, respectively.

Theorem 7.

For $\varepsilon>0$, $\forall\mu\in\mathbb{Z}:\mu\geq 3$, and $\pi=0$, ($\mu$, $\pi$)-DRCF-size-optimization problem is inapproximable within $(1-\varepsilon)\cdot(\ln\mu-\mathcal{O}(\ln\ln\mu))$, even when $l^{D}_{G}=1$ $\forall G\in\mathcal{G}$.

Proof.

We reduce from the set multi-cover problem with sets of bounded size, a known NP-hard problem [52], to the ($\mu$, 0)-DRCF-size-optimization problem.

More specifically, the input is a set $X=\{v_{1},\dots,v_{|\mathcal{G}|}\}$ and a collection of $m$ sets $S_{i}\subseteq X$ such that $|S_{i}|\leq\mu$. The goal is to choose a minimum number of sets that together cover every element $v_{i}$.

Then, we construct a ($\mu$, 0)-DRCF-size-optimization instance. To do so, we have a corresponding candidate $c_{i}$ for each set $S_{i}$. For each element $v_{i}$, we have a corresponding group $G\in\mathcal{G}$ equal to $\{c_{j}:v_{i}\in S_{j}\}$. Hence, there are $m$ candidates and $|\mathcal{G}|$ candidate groups such that each candidate belongs to at most $\mu$ groups. The diversity constraints $l^{D}_{G}$ are set to 1, which corresponds to the requirement that each element is covered.

This is an approximation-preserving reduction for all $\mu\geq 3$ and $\pi=0$. Hence, the minimum number of sets needed to cover all elements is at most $k$ if and only if a feasible committee of size at most $k$ exists. Given that the set multi-cover problem is inapproximable within $(1-\varepsilon)\cdot(\ln\mu-\mathcal{O}(\ln\ln\mu))$ [59], so is our (3, 0)-DRCF-size-optimization problem. We note that this result holds for the ($\mu$, $\pi$)-DRCF-size-optimization problem for all $\mu\in\mathbb{Z}:\mu\geq 3$ and $\pi=0$. ∎
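The construction in the proof of Theorem 7 can be sketched as follows. The sketch assumes that sets are given as Python sets of element labels and that candidate $c_{j}$ is represented by the index `j`. The helper names and the toy sets are hypothetical.

```python
def groups_from_sets(sets):
    """Theorem 7 construction: candidate j stands for set S_j, and element v
    yields the candidate group {c_j : v in S_j}."""
    groups = {}
    for j, s in enumerate(sets):
        for v in s:
            groups.setdefault(v, set()).add(j)
    return groups


def satisfies_diversity(committee, groups, lower=1):
    """True if |G ∩ W| >= lower for every candidate group G."""
    return all(len(g & committee) >= lower for g in groups.values())


sets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]  # every |S_j| <= mu = 2
groups = groups_from_sets(sets)
print(sorted((v, sorted(g)) for v, g in groups.items()))
print(satisfies_diversity({0, 2}, groups), satisfies_diversity({0, 1}, groups))
```

Because $|S_{j}|\leq\mu$, each candidate index `j` appears in at most $\mu$ groups. A committee is feasible exactly when the chosen sets cover every element.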

While the above proof is similar in flavor to the one given in Theorem 7 (Hardness of feasibility with committee violations) of Celis et al. [9], our inapproximability ratio differs from their ratio of $(1-\varepsilon)\cdot\ln(|\mathcal{G}|)$. This is because our ratio exploits the candidate structure, in which the number of attributes $\mu$ bounds the number of groups each candidate can be a part of. Hence, our reduction is from the set cover problem with sets of bounded size.

For our next result, we first give a reduction from regular hitting set (HS) to (1, 1)-DRCF. Next, as the regular HS problem is equivalent to the minimum set cover problem [60], the latter's inapproximability [61] holds for our problem.

Theorem 8.

For $\varepsilon>0$, $\forall\mu\in\mathbb{Z}:\mu\geq 1$, and $\forall\pi\in\mathbb{Z}:\pi\geq 1$, ($\mu$, $\pi$)-DRCF-size-optimization problem is inapproximable within a factor of $(1-\varepsilon)\cdot\ln(|\mathcal{G}|+|\mathcal{P}|)$, even when $l^{D}_{G}=1$ $\forall G\in\mathcal{G}$ and $l^{R}_{P}=1$ $\forall P\in\mathcal{P}$.

Proof.

We reduce from regular hitting set (HS), a known NP-hard problem [52], to the (1, 1)-DRCF-size-optimization problem.

An instance of HS consists of a universe $U=\{x_{1},x_{2},\dots,x_{m}\}$ and a collection $\mathcal{Z}$ of subsets of $U$, each of size in $[1,m]$. The objective is to find a subset $S\subseteq U$ of size at most $k$ such that $|S\cap T|\geq 1$ for all $T\in\mathcal{Z}$.

We construct the (1, 1)-DRCF-size-optimization instance as follows. For each element $x$ in the universe $U$, we have a candidate $c$ in the candidate set $C$. For each subset $T$ in the collection $\mathcal{Z}$, we have either a candidate group $G\in\mathcal{G}$ or a winning committee $W_{P}$ of a population $P\in\mathcal{P}$. Note that $|\mathcal{G}|+|\mathcal{P}|=|\mathcal{Z}|$. We set $l^{D}_{G}=1$ for all $G\in\mathcal{G}$ and $l^{R}_{P}=1$ for all $P\in\mathcal{P}$, which means $|W\cap G|\geq 1$ and $|W\cap W_{P}|\geq 1$, respectively. This corresponds to the requirement that $|S\cap T|\geq 1$.

Hence, we have a subset $S$ of size at most $k$ that satisfies $|S\cap T|\geq 1$ if and only if we have a committee $W$ of size at most $k$ that satisfies $|W\cap G|\geq 1$ for all $G\in\mathcal{G}$ and $|W\cap W_{P}|\geq 1$ for all $P\in\mathcal{P}$.

We note that this is also an approximation-preserving reduction for all $\mu\geq 1$ and $\pi\geq 1$. The minimum set cover problem, which is equivalent to the hitting set problem, is inapproximable within $(1-\varepsilon)\cdot\ln(|\mathcal{G}|+|\mathcal{P}|)$ [61], and hence so is our (1, 1)-DRCF-size-optimization problem. We note that this result holds for the ($\mu$, $\pi$)-DRCF-size-optimization problem $\forall\mu\in\mathbb{Z}:\mu\geq 1$ and $\forall\pi\in\mathbb{Z}:\pi\geq 1$. ∎
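A minimal sketch of the construction in the proof of Theorem 8 follows. It splits the subsets of $\mathcal{Z}$ between candidate groups and populations' winning committees. With all lower bounds equal to 1, feasibility is then exactly the hitting-set condition. The split point `num_groups`, the helper names, and the toy collection are hypothetical.

```python
def split_collection(collection, num_groups):
    """Theorem 8 construction: the first `num_groups` subsets become candidate
    groups and the rest become populations' winning committees."""
    groups = [set(t) for t in collection[:num_groups]]
    committees = [set(t) for t in collection[num_groups:]]
    return groups, committees


def is_feasible(committee, groups, committees):
    """All lower bounds are 1: W must intersect every group and every W_P."""
    return all(committee & g for g in groups) and all(committee & w for w in committees)


collection = [{1, 2}, {2, 3}, {3, 4}, {1, 4}, {2, 4}]
groups, committees = split_collection(collection, num_groups=2)
print(len(groups) + len(committees) == len(collection))  # |G| + |P| = |Z|
print(is_feasible({2, 4}, groups, committees))  # {2, 4} hits every subset
print(is_feasible({1, 3}, groups, committees))  # misses the subset {2, 4}
```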

Assuming the Unique Games Conjecture [54], Bansal and Khot [62] showed that the vertex cover problem on $k$-uniform hypergraphs, for any integer $k\geq 2$, is inapproximable within $k-\varepsilon$, even when the $k$-uniform hypergraph is almost $k$-partite. We use this result for our next theorem.

Theorem 9.

For $\varepsilon>0$, $\mu=0$, and $\forall\pi\in\mathbb{Z}:\pi\geq 1$, ($\mu$, $\pi$)-DRCF-size-optimization problem, assuming the Unique Games Conjecture [54], is inapproximable within $k-\varepsilon$, even when $l^{R}_{P}=1$ $\forall P\in\mathcal{P}$.

Proof.

We give a reduction from the vertex cover problem on $k$-uniform hypergraphs to (0, 1)-DRCF.

An instance of the vertex cover problem on $k$-uniform hypergraphs consists of a set of vertices $X=\{x_{1},x_{2},\dots,x_{m}\}$ and a set $S$ of $n$ hyperedges, each connecting exactly $k$ vertices from $X$. A vertex cover $X^{\prime}\subseteq X$ is a subset of vertices such that each hyperedge contains at least one vertex from $X^{\prime}$ (i.e., $s\cap X^{\prime}\neq\emptyset$ for each hyperedge $s\in S$). The vertex cover problem on $k$-uniform hypergraphs is to find a vertex cover $X^{\prime}$ of size at most $d$.

We construct the (0, 1)-DRCF instance as follows. For each vertex $x\in X$, we have a candidate $c\in C$. For each hyperedge $s\in S$, we have a population $P\in\mathcal{P}$ whose winning committee $W_{P}$ of size $k$ consists of the candidates corresponding to the vertices of $s$. Note that $|\mathcal{P}|=|S|$. We set $l^{R}_{P}=1$ for all $P\in\mathcal{P}$, which means $|W\cap W_{P}|\geq 1$. This corresponds to the requirement that $s\cap X^{\prime}\neq\emptyset$.

Hence, we have a vertex cover $X^{\prime}$ of size at most $d$ if and only if we have a committee $W$ of size at most $d$ that satisfies $|W\cap W_{P}|\geq 1$ for all $P\in\mathcal{P}$.

This is an approximation-preserving reduction for $\mu=0$ and for all $\pi\geq 1$. Assuming the Unique Games Conjecture, the vertex cover problem on $k$-uniform hypergraphs is inapproximable within $k-\varepsilon$ [62], and hence so is our (0, 1)-DRCF-size-optimization problem. We note that this result holds for the ($\mu$, $\pi$)-DRCF-size-optimization problem for $\mu=0$ and all $\pi\in\mathbb{Z}:\pi\geq 1$. ∎

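In addition to this general inapproximability result, we informally conjecture that the ratio can be improved below $k$ for instances in which each candidate belongs to a bounded number of winning committees $W_{P}$. This bound depends on the cohesiveness of the voters' preferences.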

Conjecture 1.

[Informal] If $\mu=0$ and $\forall\pi\in\mathbb{Z},\pi\geq 1$, then ($\mu$, $\pi$)-DRCF-size-optimization problem can be approximated within a factor of at most $k-(1-o(1))\frac{k(k-1)\ln\ln g(\phi)}{\ln g(\phi)}$ using a polynomial time algorithm.

A proof of the above conjecture would imply that there exists a polynomial-time approximation algorithm for the ($\mu$, $\pi$)-DRCF-size-optimization problem ($\mu=0$ and $\pi\geq 1$) with approximation ratio at most $k-(1-o(1))\frac{k(k-1)\ln\ln g(\phi)}{\ln g(\phi)}$. Here, $g(\phi)$ is a function that maps the cohesiveness $\phi$ of the preferences to the maximum number of winning committees $W_{P}$ that a candidate can belong to. Specifically, if such a $g(\phi)$ exists and $\pi=1$, then the stated approximation ratio follows directly from Halperin [63].

6.2 Parameterized Complexity

In most real-world elections, the committee size $k$ is constant. Hence, our first result here is inspired by the parameterized complexity results in this field [38, 41].

Observation 3.

The ($\mu$, $\pi$)-DRCF problem can be solved in $\mathcal{O}(m^{k}\cdot(|\mathcal{G}|+|\mathcal{P}|))$ time. If $k$ is a constant, then this is a polynomial time algorithm.

We enumerate the set $\mathcal{W}$ of all committees of size $k$ and check, for each committee $W\in\mathcal{W}$, whether it satisfies the constraints. There are $\binom{m}{k}$ such committees, so $|\mathcal{W}|\leq m^{k}$. Checking whether a committee $W\in\mathcal{W}$ satisfies all the constraints takes $\mathcal{O}(|\mathcal{G}|+|\mathcal{P}|)$ time, as this is the total number of constraints to be checked. Hence, we can solve ($\mu$, $\pi$)-DRCF in time polynomial in $m$ and $n$, given that $k$ is constant.
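The enumeration behind Observation 3 can be sketched with `itertools.combinations`. The names are ours: `feasible_committees` and the toy groups and winning committees are hypothetical. Each intersection costs $\mathcal{O}(k)$, so the sketch runs in $\mathcal{O}(\binom{m}{k}\cdot k\cdot(|\mathcal{G}|+|\mathcal{P}|))$ time. This matches Observation 3 when $k$ is constant.

```python
from itertools import combinations


def feasible_committees(candidates, k, groups, l_div, committees, l_rep):
    """Enumerate all size-k committees (at most m^k of them) and keep those that
    meet every diversity bound |G ∩ W| >= l_G and representation bound |W_P ∩ W| >= l_P."""
    result = []
    for combo in combinations(sorted(candidates), k):
        w = set(combo)
        diverse = all(len(g & w) >= l for g, l in zip(groups, l_div))
        represented = all(len(wp & w) >= l for wp, l in zip(committees, l_rep))
        if diverse and represented:
            result.append(combo)
    return result


candidates = {"c1", "c2", "c3", "c4"}
groups = [{"c1", "c2"}, {"c3", "c4"}]  # one attribute with two groups
committees = [{"c1", "c3"}, {"c2", "c4"}]  # W_P of two populations
print(feasible_committees(candidates, 2, groups, [1, 1], committees, [1, 1]))
```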

Next, when the committee size $k$ is not a constant, the number of candidates to be elected may still grow much more slowly than the number of candidates ($k\ll m$).

Theorem 10.

[60, 64] The regular hitting set problem with unbounded subset size is W[2]-hard w.r.t. $k$.

Corollary 4.

If $\forall\mu\in\mathbb{Z}:\mu\geq 3$ and $\pi=0$, then ($\mu$, $\pi$)-DRCF problem is W[2]-hard w.r.t. $k$ and the hardness holds even when $l^{D}_{G}=1$ $\forall G\in\mathcal{G}$.

Proof.

In the proof of Theorem 7, we gave a reduction from the minimum set cover problem to the ($\mu$, $\pi$)-DRCF problem for all $\mu\in\mathbb{Z}:\mu\geq 3$ and $\pi=0$. Additionally, the minimum set cover problem has a well-known one-to-one correspondence with the hitting set problem with no restriction on the subset size [60, 65, 66]. Regular HS with unbounded subset size is W[2]-hard [60], so our result follows from this one-to-one correspondence between the regular HS problem and the minimum set cover problem. ∎

Corollary 5.

If $\forall\mu\in\mathbb{Z}:\mu\geq 1$ and $\forall\pi\in\mathbb{Z}:\pi\geq 1$, then ($\mu$, $\pi$)-DRCF problem is W[2]-hard w.r.t. $k$ and the hardness holds even when $l^{D}_{G}=1$ $\forall G\in\mathcal{G}$ and $l^{R}_{P}=1$ $\forall P\in\mathcal{P}$.

When $\mu\geq 1$ and $\pi\geq 1$, our problem is equivalent to regular HS (Theorem 8), so the W[2]-hardness of regular HS (Theorem 10) carries over.

Theorem 11.

If $\mu=0$, $\forall\pi\in\mathbb{Z}:\pi\geq 1$, and $l^{R}_{P}=1,\forall P\in\mathcal{P}$, then ($\mu$, $\pi$)-DRCF problem can be solved using an $\mathcal{O}(c^{k}+m)$ time algorithm where $c=d-1+\mathcal{O}(d^{-1})$ and $d=k$. If $k\ll m$, then it is a polynomial time algorithm.

Proof.

The proof of Theorem 9 shows that our problem is equivalent to $k$-HS when $\mu=0$ and $\pi\geq 1$. Hence, our algorithm here is motivated by the bounded search tree algorithm in Section 6 of [60]. There, the authors showed that when $k$ is small, the $d$-hitting set problem, in which every subset to be hit has cardinality at most $d$, can be solved using an $\mathcal{O}(c^{k}+m)$ time algorithm with $c=d-1+\mathcal{O}(d^{-1})$. In our case, $d=k$. We have modified the algorithm from [60] to return all committees that satisfy the representation constraints.

Algorithm 1 Parameterized Polynomial-time Algorithm for Representation Constraints

Input: $C$, $k$, and $W_{P}$ and $l^{R}_{P}$ for all $P\in\mathcal{P}$
Output: $W$ : $|W\cap W_{P}|\geq 1$ for all $P\in\mathcal{P}$

1: for all $x,y\in C$ with $x\neq y$: if every $W_{P}$ that contains $y$ also contains $x$, then $C=C\setminus\{y\}$
2: for each $W_{P}$ do
3:   Create $k$ branches, one for each $c_{i}\in W_{P}$
4:   In branch 1, put $c_{1}$ in the hitting set; in branch $i$, for each $i\in[2,k]$, exclude $c_{1}$ from the hitting set and put $c_{i}$ in it

In the above algorithm, steps 3 and 4 create $k$ branches in total. Hence, if the number of leaves in a branching tree is $b_{k}$, then the first branch has at most $b_{k-1}$ leaves. Next, let $b^{\prime}_{k}$ be the number of leaves in a branching tree in which there is at least one set of size $k-1$ or smaller. For each $i\in[2,k]$, there is some committee $W_{P}$ in the given collection such that $c_{1}\in W_{P}$ but $c_{i}\notin W_{P}$. Otherwise, $c_{i}$ would be in every $W_{P}$ that contains $c_{1}$, and step 1 would have removed $c_{1}$. Therefore, after excluding $c_{1}$ from and including $c_{i}$ in the committee $W$, this $W_{P}$ has at most $k-1$ remaining candidates. Altogether, we get $b_{k}\leq b_{k-1}+(k-1)b^{\prime}_{k-1}$.

If there is already a set with at most $k-1$ elements, we can apply the above steps to that set and get $b^{\prime}_{k}\leq b_{k-1}+(k-2)b^{\prime}_{k-1}$. The branching number of this recursion is $c$ from above, and it is always smaller than $k-1+\mathcal{O}(k^{-1})$. ∎
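The bounded search tree idea can be sketched in Python. This is a simplified sketch, not the algorithm of [60]. It applies the domination rule of step 1 of Algorithm 1. It then branches on every candidate of an unhit winning committee, which gives at most $k^{k}$ leaves instead of the refined $c^{k}$ bound. It also returns a single hitting set rather than all of them. The function names and the toy committees are hypothetical.

```python
def remove_dominated(candidates, committees):
    """Step 1 of Algorithm 1: drop y if another kept x lies in every W_P containing y."""
    kept = set(candidates)
    for y in sorted(candidates):
        containing_y = [w for w in committees if y in w]
        if any(x != y and all(x in w for w in containing_y) for x in kept):
            kept.discard(y)
    return kept, [w & kept for w in committees]


def hitting_set(committees, budget, chosen=frozenset()):
    """Return a set of at most `budget` candidates hitting every committee, or None."""
    unhit = next((w for w in committees if not (w & chosen)), None)
    if unhit is None:
        return set(chosen)  # every W_P is represented
    if budget == 0:
        return None
    for c in sorted(unhit):  # one branch per candidate of the unhit committee
        found = hitting_set(committees, budget - 1, chosen | {c})
        if found is not None:
            return found
    return None


committees = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
kept, reduced = remove_dominated({1, 2, 3, 4, 5, 6}, committees)
print(kept, reduced)
print(hitting_set(reduced, 2), hitting_set(reduced, 1))
```

Any hitting set found on the reduced committees also hits the original ones, because every removed candidate is dominated by a kept candidate.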

As a conclusion of our theoretical analyses, we make an interesting observation. When $\pi=0$, ($\mu$, $\pi$)-DRCF becomes NP-hard when $\mu=3$. On the other hand, when $\mu=0$, ($\mu$, $\pi$)-DRCF becomes NP-hard even when $\pi=1$. This means that introducing representation constraints makes the problem hard "faster" than introducing diversity constraints. In contrast, with respect to the parameter $k$, the former case is W[2]-hard and the latter is fixed-parameter tractable for all $\pi\in\mathbb{Z}:\pi\geq 1$. This reinforces our claim that we should not reformulate representation constraints as diversity constraints, even if doing so may seem natural. The sizes of candidate groups and of the voter populations' winning committees have implications for how one may solve the problem efficiently.

7 Heuristic Algorithm

In the previous sections, we saw that our model, which is useful from the social choice theory perspective for having fairer elections, is computationally hard. In several settings, it remains hard even when we parameterize the problem by the size of the committee. Hence, we take a pragmatic approach to evaluate whether our model is efficient in practice. We do so by developing a two-stage heuristic-based algorithm, motivated in part by the literature on distributed constraint satisfaction [67], which allows us to efficiently compute DiRe committees in practice.

We develop a heuristic-based algorithm because integer linear program (ILP) formulations of multiwinner elections are not efficient [39], especially when using the Monroe rule. Moreover, a heuristic approach has a known running-time advantage over a linear programming approach. In addition, our empirical evaluation shows that the algorithm returns an optimal solution (discussed later in Section 8.3.1), thus overcoming one of the biggest disadvantages of using a heuristic approach.

7.1 DiReGraphs

We represent an instance of the ($\mu$, $\pi$, $\mathtt{f}$)-DRCWD problem from Figure 1 as a DiReGraph (Figure 2). The constraints are represented by quadrilaterals and candidates by ellipses. The DiReGraph has four levels:

  • a global committee size constraint (Level A);
  • the candidates (Level B);
  • unary constraints that lower-bound the number of candidates required from each candidate group or population's winning committee (Level C);
  • the DiRe committee (Level D).

Edges connecting candidates (Level B) to unary constraints (Level C) depend on each candidate's membership in a candidate group or a population's winning committee. The idea behind the DiReGraph is to have a "network flow" from A to D such that all nodes on Level C are visited. More specifically, the aim is to select $k$ candidates (Level A) from the $m$ candidates (Level B) such that the in-flow at each unary constraint node (Level C) is at least the specified diversity or representation constraint. A node is said to have an in-flow of $\tau$ when $\tau$ candidates in the committee $W$ are part of the corresponding candidate group or population's winning committee. Formally, $\tau=|W\cap G|$ for each candidate group $G\in\mathcal{G}$ and $\tau=|W\cap W_{P}|$ for each population $P\in\mathcal{P}$. When every unary constraint node receives its required in-flow, the selected candidates form a DiRe committee (Level D).

Example 2.

Creating DiReGraph: Consider the election setup shown in Figure 1. The candidate $c_2$ is a male who is in the winning committees of both states, namely California and Illinois. Hence, $c_2$ in the DiReGraph (Figure 2) is connected to three constraints: one for male and one for each of the two states, CA (California) and IL (Illinois).

Figure 2: $(\mu, \pi)$-DRCF as a DiReGraph. (A) Global committee size constraint and (C) the diversity/representation constraints, connected by edges with (B) the candidates and (D) the DiRe committee.
Algorithm 2 DiRe Committee Feasibility Algorithm

Input:
variables $X = \{X_1, \dots, X_{|\mathcal{G}|+|\mathcal{P}|}\}$
domain $D = (D_1, \dots, D_{|\mathcal{G}|+|\mathcal{P}|})$: each $D_i$ is $G \in \mathcal{G}$ or $W_P : P \in \mathcal{P}$
unary constraints $S = \{S_1, \dots, S_{|\mathcal{G}|+|\mathcal{P}|}\}$: each $S_i$ is $l^D_G$ for each $G \in \mathcal{G}$ or $l^R_P$ for each $P \in \mathcal{P}$
Output:
set $\mathcal{W}$ of committees: $\forall W \in \mathcal{W}$, $|W \cap D_i| \geq S_i$

1: Create DiReGraph $DiReG$
2: $SG$ = subgraph of nodes on levels B & C of $DiReG$
3: $SCC$ = strongly connected components of $SG$
4: for each $comp_i$, $comp_j \in SCC$ with $i \neq j$ do
5:   for each $X_u = \{X_i, X_j\}$ : $X_i \in comp_i$ and $X_j \in comp_j$ do
6:     if !$\mathtt{pairwise\_feasible}$($X_u$, $D$, $S$) return false
7: for each $comp$ in $SCC$ do
8:   $X_{SCC}$ = list of $X_i$ for each $S_i$ at level C of $comp$
9:   if !$\mathtt{pairwise\_feasible}$($X_{SCC}$, $D$, $S$) return false
10: Recreate DiReGraph $DiReG$ using the reduced domain
11: return $\mathtt{heuristic\_backtrack}$({}, $DiReG$, $X$, $D$, $S$)

7.2 DiRe Committee Feasibility Algorithm

Algorithm 2 has two stages. (i) Preprocessing reduces the search space used to satisfy the constraints and efficiently detects infeasible instances. (ii) A heuristic-based search over candidates decreases the number of steps needed to either find a feasible committee or return infeasibility.

7.2.1 Create DiReGraph

The first step of the algorithm is to create the DiReGraph from the input. The variables $X = \{X_1, \dots, X_{|\mathcal{G}|+|\mathcal{P}|}\}$ are represented by the nodes on Level C. The domains $D = (D_1, \dots, D_{|\mathcal{G}|+|\mathcal{P}|})$ of these variables are represented by edges that connect each node on Level C to nodes (candidates) on Level B. Formally, for each $D_i \in D$, where $D_i$ is $G \in \mathcal{G}$ or $W_P : P \in \mathcal{P}$, there is an edge connecting each candidate in $D_i$ on Level B to the node $X_i$ on Level C. The constraints $S = \{S_1, \dots, S_{|\mathcal{G}|+|\mathcal{P}|}\}$ correspond to the diversity and representation constraints. Formally, each $S_i \in S$ is $l^D_G$ for each $G \in \mathcal{G}$ or $l^R_P$ for each $P \in \mathcal{P}$.

Example 3.

Input Variables: $X$ = {Male, Female, CA, IL}

$D$ = ({$c_1, c_2$}, {$c_3, c_4$}, {$c_1, c_2$}, {$c_2, c_4$})

$S$ = {1, 1, 1, 1}

Algorithm 3 Pairwise Feasibility Algorithm

function $\mathtt{pairwise\_feasible}$($X$, $D$, $S$) returns false if an inconsistency is found, or true

1: $queue$ = all pairs ($X_i$, $X_j$) : $X_i$, $X_j$ in $X$ and $X_i \neq X_j$
2: while $queue$ is not empty do
3:   ($X_i$, $X_j$) = $\mathtt{remove\_first}$($queue$)
4:   if $|D_i \cap D_j| < S_i + S_j - k$ return false
5:   if $\mathtt{domain\_reduce}$($X$, $D$, $S$, $X_i$, $X_j$) then
6:     if $|D_i| = 0$ return false
7:     for each $X_x \in X \setminus \{X_i, X_j\}$ do
8:       add ($X_x$, $X_i$) to $queue$
9: return true

function $\mathtt{domain\_reduce}$($X$, $D$, $S$, $X_i$, $X_j$) returns true iff the domain $D_i$ of $X_i$ is reduced

1: $domain\_reduced$ = false
2: for each $d \in D_i$ do
3:   if no $S_i$-sized combination from $D_i$ containing $d$ satisfies the pairwise constraints with any $S_j$-sized combination from $D_j$ then
4:     $D_i = D_i \setminus \{d\}$
5:     $domain\_reduced$ = true
6: return $domain\_reduced$
Algorithm 4 Heuristic Backtracking Algorithm

function $\mathtt{heuristic\_backtrack}$($solution$, $DiReG$, $X$, $D$, $S$) returns a solution or infeasibility

1: (on the first call) $X_i$.inFlow = 0 for each $X_i \in X$
2: if $|solution| \leq k$ AND each $X_i$.inFlow $\geq S_i$ return $solution$
3: $local\_X$ = $\mathtt{select\_unsatisfied\_variable}$($DiReG$, $X$, $D$, $S$)
4: for each $local\_cand$ in $\mathtt{sort\_candidates}$($local\_X$, $solution$, $DiReG$, $X$, $D$, $S$) do
5:   if $local\_cand$ is consistent with $solution$ then
6:     $tuple$ = {$local\_X$ = $local\_X$.append($local\_cand$)}
7:     $solution$ = $solution \cup tuple$
8:     $X_i$.inFlow = $X_i$.inFlow + 1 for each $X_i$ with $local\_cand \in D_i$
9:     $result$ = $\mathtt{heuristic\_backtrack}$($solution$, $DiReG$, $X$, $D$, $S$)
10:    if $result \neq$ infeasibility return $result$
11:    $tuple$ = {$local\_X$ = $local\_X$.remove($local\_cand$)}
12:    $solution$ = $solution \setminus tuple$
13:    $X_i$.inFlow = $X_i$.inFlow - 1 for each $X_i$ with $local\_cand \in D_i$
14: return infeasibility

function $\mathtt{select\_unsatisfied\_variable}$($DiReG$, $X$, $D$, $S$)

1: return the unsatisfied $X_i$ (i.e., $X_i$.inFlow $< S_i$) with the lowest ratio $|D_i| / \max(S_i - X_i.\text{inFlow}, 1)$

function $\mathtt{sort\_candidates}$($local\_X$, $solution$, $DiReG$, $X$, $D$, $S$)

1: return the candidates in the domain $D_i$ of $local\_X$, sorted in decreasing order of their out-degree in $DiReG$

7.2.2 Preprocessing

First, find the strongly connected components $SCC$ of $SG$ in time linear in $m$ and $|\mathcal{G}|+|\mathcal{P}|$ (equivalent to $m$ and $n$ in real-world settings). The next step is to check inter- and intra-component pairwise feasibility. We only perform pairwise feasibility tests because previous work has shown that three-way, four-way, or higher-order feasibility tests increase the computational time significantly without improving the chance of finding a group of variables whose combination guarantees an infeasible instance [67].
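The Python sketch below illustrates this preprocessing step. It assumes, as Example 5 implicitly does, that the edges between Levels B and C are treated as undirected. Under that assumption, the strongly connected components are simply connected components, which breadth-first search finds in time linear in the number of nodes and edges. The function name is hypothetical. Node labels are tagged so that a candidate and a variable with the same name cannot collide.

```python
from collections import defaultdict, deque


def variable_components(domains):
    """Group Level-C variables into connected components of the B-C subgraph SG.

    Each variable (Level C) is linked to the candidates (Level B) in its domain;
    edges are treated as undirected, so these are plain connected components.
    """
    adjacency = defaultdict(set)
    for var, members in domains.items():
        for cand in members:
            adjacency[("var", var)].add(("cand", cand))
            adjacency[("cand", cand)].add(("var", var))

    seen, components = set(), []
    for var in domains:
        start = ("var", var)
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search over SG
            node = queue.popleft()
            if node[0] == "var":
                component.add(node[1])  # keep only Level-C variables
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    example = {"Male": {"c1", "c2"}, "Female": {"c3", "c4"},
               "CA": {"c1", "c2"}, "IL": {"c2", "c4"}}
    print(variable_components(example))  # a single component, as in Example 5
```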

Inter-component pairwise feasibility:

Select two variables $X_i$ and $X_j$, corresponding to constraints $S_i$ and $S_j$ on Level C of the DiReGraph, from two different components of $SCC$. Perform a pairwise feasibility check for each such pair and return infeasibility if any pair of variables cannot be satisfied by a valid committee. The correctness and completeness of this step are straightforward. If the constraints require more candidates than are available, no feasible solution exists. Likewise, if a pair of constraints is infeasible, it remains infeasible when all constraints are considered together.

Intra-component pairwise feasibility:

Repeat the above procedure, but now for pairs of variables within the same component. This step also helps return infeasibility efficiently.

Reducing domain:

Empirical evidence from previous work in a setting similar to ours shows that pairwise infeasibility causes the majority of infeasible instances [67]. Hence, once an instance passes the pairwise checks and a committee may exist, we reduce the domain of each variable by removing candidates that cannot help find a feasible committee.

To do so, run a restricted version of the intra-component pairwise feasibility check. If the algorithm reaches this stage, all constraints are pairwise feasible, i.e., each pair of constraints admits at least one solution. We therefore reduce the domain by removing any candidate who, when included in the solution, always makes some pair of constraints infeasible. Specifically, fix a candidate $c$ from the domain of a variable $X_i$ and check pairwise feasibility against the domain of each other variable $X_j$ across all possible solutions that contain $c$. If every solution that contains $c$ is infeasible, remove $c$ from the domain of $X_i$.
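The following Python sketch combines Algorithm 3 with the domain reduction just described. It rests on two readings of our own that the text does not state. First, a pair of partial selections "satisfies the pairwise constraints" when their union fits in the $k$ seats. Second, a unary check $|D_i| \geq S_i$ is added up front. Function names mirror the pseudocode but are otherwise hypothetical. The enumeration of combinations is exponential in $S_i$ and $S_j$, so it is only reasonable for small lower bounds.

```python
from collections import deque
from itertools import combinations


def _supported(d, i, j, domains, lower, k):
    """Is there an S_i-subset of D_i containing d and an S_j-subset of D_j
    whose union fits into k seats? (Our reading of 'pairwise constraints'.)"""
    if lower[i] <= 0:
        return True
    others = sorted(domains[i] - {d})
    for extra in combinations(others, lower[i] - 1):
        chosen_i = set(extra) | {d}
        for chosen_j in combinations(sorted(domains[j]), lower[j]):
            if len(chosen_i | set(chosen_j)) <= k:
                return True
    return False


def domain_reduce(domains, lower, k, i, j):
    """Remove from D_i every candidate without pairwise support in D_j."""
    reduced = False
    for d in sorted(domains[i]):
        if not _supported(d, i, j, domains, lower, k):
            domains[i].discard(d)
            reduced = True
    return reduced


def pairwise_feasible(variables, domains, lower, k):
    """AC-3-style pass over ordered pairs of variables (cf. Algorithm 3).

    Returns (feasible, reduced_domains); the input domains are not modified.
    """
    domains = {v: set(domains[v]) for v in variables}
    if any(len(domains[v]) < lower[v] for v in variables):
        return False, domains  # added unary check: S_i cannot be met
    queue = deque((i, j) for i in variables for j in variables if i != j)
    while queue:
        i, j = queue.popleft()
        # Counting bound: both lower bounds must fit into the k seats.
        if len(domains[i] & domains[j]) < lower[i] + lower[j] - k:
            return False, domains
        if domain_reduce(domains, lower, k, i, j):
            if not domains[i]:
                return False, domains
            for x in variables:
                if x not in (i, j):
                    queue.append((x, i))  # re-check neighbours of X_i
    return True, domains
```

On the Example 3 instance with $k = 2$, every pair passes and no domain shrinks, consistent with Example 5. With $k = 1$, the Male/Female pair already fails the counting bound.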

7.2.3 Heuristic Backtracking

Use depth-first search with backtracking. Specifically, choose one variable $X_i$ at a time, and backtrack when $X_i$ has no legal values left to satisfy its constraint. This technique repeatedly chooses an unassigned variable and then tries the values in its domain in search of a solution. If infeasibility is returned, step back by one level and try another value.

Select unsatisfied variable:

Use the “minimum-remaining-values (MRV)” heuristic to choose the variable with the fewest legal values. Here this is measured as the ratio $|D_i| / \max(S_i - X_i.\text{inFlow}, 1)$ of its domain size to its unmet lower bound. This heuristic picks the variable that is most likely to cause a failure soon, thereby pruning the search tree. For example, if some variable $X_i$ has no legal values left, the MRV heuristic selects $X_i$ and infeasibility is returned immediately, avoiding additional search.

Sort most favorite candidates:

Use the “most-favorite-candidates (MFC)” heuristic to sort the candidates in domain $D_i$ so that the candidate on Level B of the DiReGraph who is connected to the most nodes on Level C (highest out-degree) is ranked highest. This heuristic tries to reduce the branching factor of future choices by selecting the candidate involved in the largest number of constraints.

Overall, the aim is to select the most favorite candidates into the committee, as they help satisfy the highest proportion of constraints. For completeness, and to obtain multiple DiRe committees, we apply a “shift-left” operation after the sorting step: the second candidate becomes the first, the first becomes the last, and so on.
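The following Python sketch implements Algorithm 4 with both heuristics. It makes two simplifications on our part. First, instead of incrementing and decrementing inFlow counters, it recomputes each in-flow $|solution \cap D_i|$ from the current partial committee. Second, ties in the MFC order are broken by candidate name rather than randomly, so that runs are reproducible. A returned set with fewer than $k$ members can be padded with arbitrary remaining candidates, since all constraints are lower bounds. Function names are hypothetical.

```python
from collections import Counter


def out_degrees(domains):
    """Out-degree of each Level-B candidate: number of Level-C nodes it feeds."""
    degree = Counter()
    for members in domains.values():
        degree.update(members)
    return degree


def select_unsatisfied_variable(domains, lower, flow):
    """MRV: unsatisfied variable with the lowest |D_i| / max(S_i - inFlow, 1)."""
    unsatisfied = [x for x in domains if flow[x] < lower[x]]
    return min(unsatisfied, key=lambda x: len(domains[x]) / max(lower[x] - flow[x], 1))


def heuristic_backtrack(domains, lower, k, solution=None, degree=None):
    """Depth-first search returning a set of at most k candidates that meets
    all lower bounds, or None if the search space holds no such set."""
    solution = set() if solution is None else solution
    degree = out_degrees(domains) if degree is None else degree
    flow = {x: len(solution & domains[x]) for x in domains}  # in-flow per node
    if len(solution) <= k and all(flow[x] >= lower[x] for x in domains):
        return set(solution)
    if len(solution) >= k:  # no seat left for another candidate
        return None
    var = select_unsatisfied_variable(domains, lower, flow)
    # MFC: highest out-degree first; ties broken by name for reproducibility.
    for cand in sorted(domains[var] - solution, key=lambda c: (-degree[c], c)):
        solution.add(cand)
        result = heuristic_backtrack(domains, lower, k, solution, degree)
        if result is not None:
            return result
        solution.remove(cand)  # undo the choice before trying the next one
    return None


if __name__ == "__main__":
    example = {"Male": {"c1", "c2"}, "Female": {"c3", "c4"},
               "CA": {"c1", "c2"}, "IL": {"c2", "c4"}}
    print(heuristic_backtrack(example, dict.fromkeys(example, 1), k=2))
```

On the Example 3 domains with $k = 2$, the search proceeds as follows. MRV first picks Male, since all ratios tie at 2. MFC then tries $c_2$, after which MRV picks Female. The search returns $\{c_2, c_4\}$.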

Example 4.

Sorting candidates: In Figure 2, the ordering of candidates is $c_2$, $c_1$, $c_4$, and $c_3$, as $c_2$ has an out-degree of 3, $c_1$ and $c_4$ have 2, and $c_3$ has 1. Ties are broken randomly.
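The ordering in this example and the shift-left rotation used to obtain multiple DiRe committees can be sketched in Python as follows. The domains are those of Example 3. Breaking ties by candidate name instead of randomly is our simplification, and it happens to reproduce the order given above. Both function names are hypothetical.

```python
from collections import Counter


def mfc_order(domains):
    """Candidates sorted by decreasing out-degree; ties broken by name here."""
    degree = Counter()
    for members in domains.values():
        degree.update(members)  # +1 per Level-C node the candidate feeds
    return sorted(degree, key=lambda c: (-degree[c], c))


def shift_left(order, times=1):
    """Second candidate becomes first, first becomes last (repeated `times`)."""
    if not order:
        return []
    t = times % len(order)
    return order[t:] + order[:t]


if __name__ == "__main__":
    example = {"Male": {"c1", "c2"}, "Female": {"c3", "c4"},
               "CA": {"c1", "c2"}, "IL": {"c2", "c4"}}
    order = mfc_order(example)
    for r in range(len(order)):
        print(shift_left(order, r))  # one seed ordering per rerun
```

Calling `shift_left(order, r)` for $r = 0, 1, 2, \dots$ yields the successive orderings that seed each rerun of the backtracking search.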

We now give an example to explain the entire algorithm.

Example 5.

Implementation of Algorithm: Consider the election setup shown in Figure 1, which consists of four candidates and four voters, each having one attribute.

The input to the algorithm consists of three parts. (i) The set of variables (candidate group names and voter population names) is {Male, Female, CA, IL}. (ii) The domain of each variable (the candidates in each candidate group and in the winning committee of each population) is ({$c_1, c_2$}, {$c_3, c_4$}, {$c_1, c_2$}, {$c_2, c_4$}). (iii) The constraint for each variable is (1, 1, 1, 1).

The first step of the algorithm is to create a DiReGraph as shown in Figure 2. Level A is set to 2 as $k=2$. Level B consists of four nodes, one per candidate. Level C consists of four nodes, one for each of the $|\mathcal{G}|+|\mathcal{P}|$ candidate groups and voter populations in the election. Level D holds the final output. Each node on Level B is connected to Level A, and each node on Level C is connected to Level D. The candidate $c_1$ is a male who is in the winning committee of California. Hence, $c_1$ is connected to two constraints in the DiReGraph, one for male and one for CA (California).

Next, a subgraph $SG$ is created, consisting of the eight nodes on Levels B and C and the edges that connect them.

As there is only one strongly connected component in $SG$, we directly check intra-component pairwise feasibility. Each pair of domains admits a feasible committee, so the algorithm continues. Moreover, none of the domains is reduced, as all constraints are set to one.

Had no committee existed that satisfied all the pairwise constraints, the algorithm would have terminated before reaching the final step.

In the final step, the $\mathtt{select\_unsatisfied\_variable}$ function selects a variable at random. All variables have the same ratio $|D_i| / (S_i - X_i.\text{inFlow}) = 2$, since $|D_i| = 2$ and $S_i = 1$ for all $D_i \in D$ and $S_i \in S$. Next, for each candidate in the sorted domain of the selected variable, we check whether adding that candidate violates the global constraint (the committee size on Level A of the DiReGraph). We keep backtracking until we either find a committee or exhaust the pruned search space. To get more than one committee, rerun the $\mathtt{heuristic\_backtrack}$ function, applying an additional “shift-left” operation to the output of the $\mathtt{sort\_candidates}$ function on each run. We note that this increases the time complexity of the algorithm linearly in the size of $\mathcal{G}$ and $\mathcal{P}$.

8 Empirical Analysis

We now empirically assess the efficiency of our heuristic-based algorithm using real and synthetic datasets. We also assess the effect of enforcing diversity and representation constraints on the feasibility and utility of the winning committee selected using different scoring rules.

8.1 Datasets

8.1.1 Real Datasets

RealData 1:

The Eurovision dataset [68] consists of 26 countries ranking the songs performed by each of the 10 finalist countries. We aim to select a 5-sized DiRe committee. Each candidate, a song performed by a country, has two attributes, the European region and the language of the song performed. Each voter has one attribute, the voter’s European region. Specifically for the European region attribute, Australia and Israel were labeled as “Others” as they are not a part of Europe.

RealData 2:

The United Nations Resolutions dataset [69] consists of 193 UN member countries voting for 81 resolutions presented in the UN General Assembly in 2014. We aim to select a 12-sized DiRe committee. Each candidate has two attributes, the topic of the resolution and whether a resolution was a significant vote or not. Each voter has one attribute, the continent.

8.1.2 Synthetic Datasets

SynData 1:

We set the committee size $k$ to 6 for 100 voters and 50 candidates. We generate complete preferences using RSM by setting the selection probability $\Pi_{i,j}$ to replicate the Mallows [70] model ($\phi = 0.5$, with a randomly chosen reference ranking $\sigma$ of size $m$) (Theorem 3, [71]) and the preference probability $p(i) = 1$ for all $i \in [m]$.
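The paper generates these profiles with RSM configured to replicate Mallows (Theorem 3, [71]). As a swapped-in alternative, the Python sketch below samples the standard Mallows model directly with the repeated insertion model (RIM). It does not reproduce the RSM construction, and how its $\phi$ maps onto the paper's RSM parameters is an assumption we do not check. In this standard parametrization, $\phi = 0$ gives every voter the reference ranking and $\phi = 1$ gives uniformly random rankings. Function names are hypothetical.

```python
import random


def sample_mallows(reference, phi, rng):
    """Draw one ranking from a Mallows model with dispersion phi in [0, 1]
    via RIM: the i-th reference item is inserted at 1-based position j <= i
    with probability proportional to phi ** (i - j)."""
    ranking = []
    for i, item in enumerate(reference, start=1):
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        position = rng.choices(range(i), weights=weights)[0]  # 0-based j - 1
        ranking.insert(position, item)
    return ranking


def mallows_profile(m, n, phi, seed=0):
    """n complete rankings over c1..cm around a random reference ranking."""
    rng = random.Random(seed)
    reference = [f"c{i}" for i in range(1, m + 1)]
    rng.shuffle(reference)  # randomly chosen reference ranking sigma
    return reference, [sample_mallows(reference, phi, rng) for _ in range(n)]


if __name__ == "__main__":
    sigma, votes = mallows_profile(m=50, n=100, phi=0.5)
    print(sigma[:5], votes[0][:5])
```

A SynData 1-sized profile is then `mallows_profile(50, 100, 0.5)`.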

Dividing Candidates into Groups and Voters into Populations: To assess the impact of enforcing constraints, we generate datasets with varying numbers of candidate and voter attributes by iterating over the combinations $(\mu, \pi)$ with $\mu, \pi \in \{0, 1, 2, 3, 4\}$. For each candidate attribute, we choose the number of non-empty partitions $q \in [2, k]$ uniformly at random. Then, to partition $C$, we randomly shuffle the candidates in $C$ and select $q-1$ positions from $[2, m]$ uniformly at random without replacement, with each position marking the start of a new partition. The partition a candidate falls in is the attribute group it belongs to. For each voter attribute, we repeat the above procedure, replacing $C$ with $V$ and choosing $q-1$ positions from $[2, n]$. For each combination of $(\mu, \pi)$, we generate five datasets. We limit the number of candidate groups and voter populations per attribute to $k$ to simulate a real-world division of candidates and voters.
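A Python sketch of this partitioning procedure follows. It is applied once per candidate attribute, over $C$ with positions from $[2, m]$, and once per voter attribute, over $V$ with positions from $[2, n]$. The function name is hypothetical, and the sketch assumes $k$ does not exceed the number of items being partitioned.

```python
import random


def random_partition(items, k, rng):
    """Split items into q non-empty groups, q drawn uniformly from [2, k]:
    shuffle, then pick q - 1 distinct 1-based start positions from [2, len]."""
    q = rng.randint(2, k)  # inclusive bounds, as in q in [2, k]
    order = list(items)
    rng.shuffle(order)
    starts = sorted(rng.sample(range(2, len(order) + 1), q - 1))  # no replacement
    bounds = [1] + starts + [len(order) + 1]
    # Group t runs from start position bounds[t] up to bounds[t + 1] - 1.
    return [order[b - 1:e - 1] for b, e in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    rng = random.Random(1)
    candidate_groups = random_partition([f"c{i}" for i in range(1, 51)], 6, rng)
    voter_populations = random_partition([f"v{i}" for i in range(1, 101)], 6, rng)
    print([len(g) for g in candidate_groups], [len(p) for p in voter_populations])
```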

SynData 2:

We use the same setting as SynData 1, except that we fix $\mu$ and $\pi$ to 2 each and vary the cohesiveness of voters by setting the selection probability $\Pi_{i,j}$ to replicate the Mallows [70] model with $\phi \in [0.1, 1]$, in increments of 0.1. We divide candidates into groups and voters into populations as in SynData 1.

8.2 Setup

System.

All our experiments were run on a personal machine without any commercial or paid tools. More specifically, we used a controlled virtual environment using Docker(R) on a 2.2 GHz 6-Core Intel(R) Core i7 MacBook Pro(R) with 16 GB of RAM running macOS Big Sur (v11.1). We used Python 3.7.

Constraints.

For each $G \in \mathcal{G}$, we choose $l_G^D \in [1, \min(k, |G|)]$ uniformly at random. For each $P \in \mathcal{P}$, we choose $l_P^R \in [1, k]$ uniformly at random.
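In Python, this constraint sampling amounts to two uniform integer draws. The sketch below (hypothetical names) uses `random.randint`, whose bounds are inclusive like the intervals above, and assumes every candidate group is non-empty.

```python
import random


def sample_lower_bounds(groups, populations, k, rng):
    """Draw l^D_G uniformly from [1, min(k, |G|)] and l^R_P uniformly from [1, k].

    `groups` maps each candidate group to its (non-empty) members and
    `populations` lists the voter populations.
    """
    diversity = {g: rng.randint(1, min(k, len(members))) for g, members in groups.items()}
    representation = {p: rng.randint(1, k) for p in populations}
    return diversity, representation


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_lower_bounds({"Male": ["c1", "c2"], "Female": ["c3"]}, ["CA", "IL"], 6, rng))
```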

Voting Rules.

We use the previously defined $k$-Borda, $\beta$-CC, and Monroe rules. More specifically, we use two rules from the submodular, monotone class of committee selection rules, because they differ inherently in how they compute committees, and one rule from the separable, monotone class. We deem these sufficient given our focus on the study of $(\mu, \pi)$-DRCF, as discussed later.

Figure 3: Using SynData 1, (a) proportion (in %) of instances that timed out at 2000 seconds and (b) mean running time of non-timed-out instances. Each combination of $\mu$ and $\pi$ has 10 instances, 5 each for the $k$-Borda and $\beta$-CC rules.
Figure 4: Using SynData 1, (a) proportion (in %) of instances that timed out at 2000 seconds and (b) mean running time of non-timed-out instances. Each combination of $\mu$ and $\pi$ has 5 instances for the Monroe rule.

8.3 Results

We now present the results of our empirical analyses of the efficiency of the heuristic algorithm, the feasibility of DiRe committees, and the cost of fairness.

8.3.1 Efficiency of Heuristic Algorithm

All experiments in this section combine instances of $k$-Borda and $\beta$-CC, as there was no significant pairwise difference in running time between the instances of these two scoring rules (Student's t-test, $p > 0.05$). We present the results for the Monroe rule separately.

Algorithm is efficient:

Our heuristic-based algorithm is efficient on the tested datasets (Figures 3, 4, and 5). Only 18.90% of 525 instances timed out at 2000 sec. Among the instances that did not time out, the average running time was 566.48 sec (standard deviation (sd) = 466.66) for $k$-Borda and $\beta$-CC, and 724.39 sec (sd = 575.31) for Monroe. In contrast, 93.71% of the 525 instances timed out at 2000 sec when using a brute-force algorithm.

Promisingly, using the DiReGraph made the algorithm more efficient on sparsely connected instances. The average running time for all $\mu$ when $\pi \leq 2$ was 281.47 sec (sd = 208.65) for $k$-Borda and $\beta$-CC, and 358.87 sec (sd = 265.82) for Monroe. Higher $\pi$ led to denser DiReGraphs.

Performance when compared to ILP:

The real-world applicability of ILP-based algorithms is very limited for the Chamberlin-Courant and Monroe rules [39]. More specifically, some instances of the ILP-based algorithm implementing the Monroe rule for $k=9$, $m=30$, and $n=100$ timed out after one hour. The running time increased exponentially with the number of voters: no instance of the ILP-based algorithm implementing the Monroe rule for $k=9$, $m=30$, and $n=200$ terminated even after one day [39]. Hence, our algorithm has a clear edge, as it (i) handles constraints and any committee selection rule and (ii) terminated in 724 sec on average. Promisingly, the first committee returned by the algorithm (in under 120 sec) was the winning DiRe committee in 63% of all instances. Moreover, our algorithm scales linearly with the number of voters.

Efficiency and cohesiveness:

Our algorithm was most efficient when the voters were either less cohesive ($\phi \leq 0.3$) or more cohesive ($\phi \geq 0.8$) (Figure 5). Among these two efficient sets of instances, the preprocessing stage returned infeasibility quickly for low $\phi$ (mean = 105.40 sec (sd = 4.16) for $k$-Borda and $\beta$-CC, and mean = 141.06 sec (sd = 8.08) for Monroe). The heuristic-based search stage returned a DiRe committee quickly for high $\phi$ (mean = 156.80 sec (sd = 2.86) for $k$-Borda and $\beta$-CC, and mean = 203.98 sec (sd = 12.66) for Monroe). This shows the efficiency of our algorithm in opposing scenarios. Preprocessing was efficient when $\phi$ was low, as it was easy to find a pair of constraints that are pairwise infeasible. Heuristic-based backtracking was efficient when $\phi$ was high, as it was easy to find a DiRe committee.

Figure 5: Using SynData 2, proportion (in %) of instances that timed out at 2000 seconds and mean running time of non-timed-out instances, for (a) $k$-Borda and $\beta$-CC and (b) Monroe. (a) Each $\phi$ has 10 instances (5 for $k$-Borda and 5 for $\beta$-CC). (b) Each $\phi$ has 5 instances for the Monroe rule.
Figure 6: Using SynData 2, proportion (in %) of instances that have an infeasible committee and the maximum proportion (in %) of constraints that are unsatisfiable per instance, for (a) $k$-Borda, (b) $\beta$-CC, and (c) Monroe; each $\phi$ has 5 instances.

8.3.2 Feasibility and Cost of Fairness

All experiments in this section consider instances of $k$-Borda and $\beta$-CC separately, as the two differed in unsatisfiability (Student's t-test, $p < 0.05$). We continue to analyze Monroe separately.

A higher number of attributes results in infeasible committees:

Figure 7 shows the proportion of instances with an infeasible committee for each combination of $\mu$ and $\pi$. As the number of attributes increases, the proportion of feasible instances decreases. However, Figure 8 shows that the mean proportion of constraints satisfied for each instance is $\geq 90\%$ (sd $\in [0, 5]$). Hence, from a computational perspective, these results show the real-world utility of breaking the $(\mu, \pi, \mathtt{f})$-DRCWD problem into two steps: (i) the $(\mu, \pi)$-DRCF problem, solved using our algorithm, followed by (ii) a utility maximization problem. As we expect only a constant number of committees to be feasible in the real world, we can overcome the intractability of using a submodular scoring function, notwithstanding the worst case in which all committees are feasible.

On the other hand, two findings are more promising. (i) When the sum of all the constraints was less than $\mu \cdot k$, a feasible committee existed on 85% of instances. (ii) More specifically, when the sum of the constraints over the groups of each candidate attribute was individually less than $k$, a feasible committee existed on all but one instance.
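To illustrate the two-step decomposition, the Python sketch below works on a tiny instance. Step (i) enumerates the feasible committees by brute force, standing in for our heuristic algorithm, which would be used in practice. Step (ii) then maximizes $k$-Borda utility over them. The domains are those of Example 3. The preference profile is hypothetical and is not the one in Figure 1. Function names are also hypothetical.

```python
from itertools import combinations

DOMAINS = {"Male": {"c1", "c2"}, "Female": {"c3", "c4"},
           "CA": {"c1", "c2"}, "IL": {"c2", "c4"}}
LOWER = dict.fromkeys(DOMAINS, 1)
PROFILE = [  # hypothetical rankings, NOT the profile of Figure 1
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c4", "c3"],
    ["c2", "c4", "c1", "c3"],
    ["c4", "c2", "c3", "c1"],
]


def borda_scores(profile):
    """Borda score of each candidate: m - 1 - position, summed over voters."""
    m = len(profile[0])
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for position, cand in enumerate(ranking):
            scores[cand] += m - 1 - position
    return scores


def feasible_committees(candidates, domains, lower, k):
    """Step (i), brute force: every k-subset meeting all lower bounds."""
    for committee in combinations(sorted(candidates), k):
        chosen = set(committee)
        if all(len(chosen & domains[x]) >= lower[x] for x in domains):
            yield chosen


def best_committee(profile, domains, lower, k):
    """Step (ii): the feasible committee with the highest k-Borda utility."""
    scores = borda_scores(profile)
    feasible = list(feasible_committees(scores, domains, lower, k))
    if not feasible:
        return None, 0
    best = max(feasible, key=lambda w: sum(scores[c] for c in w))
    return best, sum(scores[c] for c in best)


if __name__ == "__main__":
    print(best_committee(PROFILE, DOMAINS, LOWER, k=2))
```

On this toy profile, three of the six 2-sized committees are feasible. The $k$-Borda winner among them is $\{c_2, c_4\}$.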

Infeasibility and unsatisfiability depend on cohesiveness:

There was a negative correlation between the maximum proportion of unsatisfied constraints and $\phi$ for all three scoring rules (mean Pearson's $\rho = -0.95$, $p < 0.05$). It was easier to satisfy the constraints when cohesiveness ($\phi$) was high, which led to lower infeasibility for higher $\phi$ (Figure 6).

Note that the correlation is stated with the candidate groups and voter populations held constant. Only the preferences vary, and hence so does the winning committee $W_P$ of each population $P \in \mathcal{P}$. That is, higher cohesiveness among voters leads to higher cohesiveness among the $W_P$s. This makes the constraints easier to satisfy and, in turn, yields a higher proportion of feasible committees.

$\beta$-CC and Monroe satisfy a higher proportion of representation constraints:

The $\beta$-CC and Monroe rules are better at satisfying representation constraints than $k$-Borda (Figure 7), as they are designed to maximize voter representation and, in turn, population satisfaction. However, even when we used a committee selection rule that guarantees proportional representation, the representation constraints of the smaller population were violated disproportionately more often than those of the larger population. Hence, the price of diversity was paid more by smaller populations than by larger ones, which quantitatively reaffirms the need for DiRe committees.

Figure 7: Using SynData 1, proportion (in %) of instances that have an infeasible committee, for (a) $k$-Borda, (b) $\beta$-CC, and (c) Monroe. Each combination of $\mu$ (# candidate attributes) and $\pi$ (# voter attributes) has 5 instances.
Figure 8: Using SynData 1, the mean of the maximum proportion (in %) of constraints that are unsatisfiable per instance, for (a) $k$-Borda, (b) $\beta$-CC, and (c) Monroe. The maximum proportion is 0% if all constraints are satisfied, 100% if no constraint is satisfied, and so on. Each combination of $\mu$ (# candidate attributes) and $\pi$ (# voter attributes) has 5 instances.
Easier to satisfy constraints when the $k/(|\mathcal{G}|+|\mathcal{P}|)$ ratio is higher:

The proportion of constraints satisfied per instance changed from 100% to 49% (mean = 82%, sd = 12%) as the ratio changed from 1.00 to 0.25. This ratio captures the committee size relative to the number of constraints. Overall, it is easier to have a feasible instance with a larger committee size ($k$) or a smaller number of constraints ($|\mathcal{G}|+|\mathcal{P}|$).

Loss in utility grows with the number of attributes:

Among all feasible instances, we report the mean ratio of the utility of the constrained committee to that of the unconstrained committee. For $k$-Borda, it ranged from 0.99 (sd = 0.01; for $\mu=1$, $\pi=0$) to 0.43 (sd = 0.22; for $\mu=2$, $\pi=1$, the highest number of attributes with a feasible committee). For $\beta$-CC, it ranged from 1.00 (sd = 0.01; for $\mu=0$, $\pi=1$) to 0.49 (sd = 0.18; for $\mu=1$, $\pi=2$). For the Monroe rule, it ranged from 1.00 (sd = 0.01; for $\mu=0$, $\pi=1$) to 0.48 (sd = 0.15; for $\mu=2$, $\pi=2$).

Higher ratios of group size to lower constraint are easier to satisfy:

An important step of our heuristic algorithm is the “minimum-remaining-values” heuristic, which selects the unsatisfied variable. We quantified the need for this heuristic by systematically varying the ratio of group (and population) size to lower constraint, equivalent to $|D_i|/S_i$. We found that the utility ratio is highest when the average of this ratio across all groups and populations is highest. Also, the feasibility of an instance increases with this ratio. Hence, our heuristic, which prioritizes variables with a lower ratio, is efficient: it makes sense to first satisfy the groups or populations that are hardest to satisfy.

8.3.3 Real Datasets

For each dataset, we implemented our model using 3 sets of constraints: constraint 1 only, constraint 2 only, and constraints 1 & 2. For Eurovision, these were at least one from each “region”, at least one from each “language”, and both combined. The ratios of utilities of constrained to unconstrained committees were 0.97, 0.88, and 0.82, respectively. For the UN resolutions, the constraints were at least two from each “topic”, at least six from “significant vote”, and both combined. The ratio of utilities was 0.99 for each of the individual constraints. No feasible committee was found when the constraints were combined. Importantly, our algorithm always terminated in under 102 sec across all instances.

9 Conclusion and Future Work

Conclusion:

There is an understanding in the social sciences that organizations that answer the call for diversity to avoid legal trouble or being labeled “racist” may actually create animosity towards racial minorities, due to the imposing nature of such measures [72, 73, 74]. Similarly, when voters feel that diversity is mandatory and that it comes at the cost of their representation, it can do more harm than good. Hence, it is important to consider all actors of an election, namely candidates and voters, when designing fair algorithms. To this end, we first motivated the need for diversity and representation constraints in multiwinner elections and developed a model, $(\mu, \pi, \mathtt{f})$-DRCWD. The $(\mu, \pi, \mathtt{f})$-DRCWD model, which yields DiRe committees, is also needed because the call for diversity is becoming ubiquitous. However, in the context of elections, enforcing diversity alone can do more harm than good, as the price of diversity may be paid disproportionately by historically disadvantaged populations. Finally, we showed the importance of delineating candidate and voter attributes, as we observed that diversity does not imply representation and vice versa. This contrasts with the common understanding and hence requires further investigation. That is, having a female candidate on the committee is different from having a candidate on the committee who is preferred by the female voters, and who may or may not be female. These two are separate but equally important aims that need to be achieved simultaneously.

We note that $(\mu, \pi, \mathtt{f})$-DRCWD can satisfy many properties of multiwinner voting rules (e.g., monotonicity) [4] and can be used as a common framework to solve other problems. As our model is computationally hard (Tables 1 and 2), we developed a heuristic-based algorithm, which was efficient on the tested datasets. Finally, we conducted an empirical analysis of feasibility, utility trade-offs, and efficiency.

Future Work:

It remains open how the diversity and representation constraints should be set to obtain a “fair” outcome. The way these constraints are set can lead to unfairness, and hence new approaches are needed to ensure fairer outcomes. For instance, existing methods that guarantee representation fail when voters are divided into predefined populations over one or more attributes. The apportionment method is one way to set the representation constraints; however, it does not account for the cohesiveness of preferences within a population. Furthermore, future work could give mathematical guarantees about the existence of DiRe committees. Additionally, just as correlation does not imply causation, diversity does not imply representation and vice versa. Hence, a mathematical framework is needed that can answer the following question: when does diversity imply representation, or when does representation imply diversity? The implications of such a formal framework range from hiring to clinical trials.

Another open question is which candidates are used to satisfy the constraints. Even if the given constraints are acceptable to everyone, the candidates chosen to satisfy them can lead to unfair outcomes, especially for historically disadvantaged groups. For example, consider a $k$-sized committee election with $k=4$, where the accepted constraints are that the committee should have two male and two female candidates. Next, consider two cases. In Case 1, the top two scoring male candidates and the bottom two scoring female candidates are selected. In Case 2, the top and the bottom scoring male candidates and the top and the bottom scoring female candidates are selected. While both cases satisfy the given constraints, Case 1 is unfair to female candidates: their top-scoring candidate does not get a seat on the committee, while the male candidates get their top two. This inequality is distributed in Case 2, which seems naturally “fairer”.

The next open question pertains to classifying the complexity of $(\mu, \pi, \mathtt{f})$-DRCWD with respect to the committee selection rule $\mathtt{f}$. In this paper, $\mathtt{f}$ could take only two forms: either a monotone, submodular but not separable function, or a monotone, separable function. We established that determining the winning committee using the former is NP-hard. We did so by establishing the hardness of the Chamberlin-Courant rule with a positional scoring rule whose first two scoring-vector values are the same (Theorem 6). However, it remains open to classify the complexity of determining the winning committee using the Chamberlin-Courant rule, the Monroe rule, and other submodular but not separable scoring functions with respect to different families of positional scoring rules.

Continuing on the mathematical front, another future direction pertains to the relaxations made in grouping the candidates and the voters for Corollaries 2 and 3. More specifically, we showed that the hardness persists even when a candidate attribute groups all candidates into one group and a voter attribute groups all voters into one population. These instances are unrealistic, as real-world stipulations will require that each candidate attribute partitions the candidates into two or more groups and that each voter attribute partitions the voters into two or more populations. Mathematically, Corollaries 2 and 3 do not cover instances that satisfy this stipulation, and new proofs that conform to it are required.

Finally, restrictions on voter preferences are another direction for future research. The reduction used to prove Theorem 5 shows that finding a committee using our model is NP-hard even when each population has one voter. This can be generalized to say that the hardness persists even when each population is completely cohesive within itself, which is a very natural assumption to make. For example, all male voters may have the same preferences, all female voters may have the same preferences, and the preferences of one population may differ from those of the other. However, given a set of constraints, one can explore how the cohesiveness of voter preferences across populations affects the complexity of our model. In addition, it remains open whether the structure of voter preferences across populations affects the complexity. Moreover, the winning committees of the populations can also be cohesive or structured, independently of the structure and cohesiveness of the voter preferences. Hence, even when voters’ preferences are neither cohesive nor structured, cohesiveness or structure among the populations’ winning committees may make our model tractable. An immediate consequence of a positive result on this front may be a narrowing of the gap between the intractability of finding a proportionally representative committee (e.g., using the Chamberlin-Courant rule) and its tractable instances under structured preferences. If finding a winning committee is tractable under the assumption of cohesiveness and/or structure of the winning committees of the voter populations, then there is hope that finding a proportionally representative committee may also be tractable, even under a weaker assumption on the structure of the voter preferences.
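The observation that one-voter populations extend to completely cohesive ones can be sketched as follows. The sketch assumes a population is completely cohesive when all of its voters submit the identical ranking. Such a population behaves like a single voter whose weight is the population size. For committee scores that sum over voters, this weight scales every committee's score equally, so the population's own winning committee is unchanged. The helper names and voter data are hypothetical.

```python
def is_completely_cohesive(rankings):
    """A population is completely cohesive if every voter submits the same ranking."""
    return all(r == rankings[0] for r in rankings)


def compress_population(rankings):
    """Collapse a completely cohesive population into one weighted voter.

    Returns (ranking, weight). For committee scores that sum over voters,
    every committee's score is multiplied by `weight`, so the population's
    own winning committee is unchanged.
    """
    if not rankings or not is_completely_cohesive(rankings):
        raise ValueError("population must be non-empty and completely cohesive")
    return tuple(rankings[0]), len(rankings)


male_voters = [["a", "b", "c"]] * 4    # hypothetical cohesive population
female_voters = [["c", "b", "a"]] * 3  # hypothetical cohesive population
print(compress_population(male_voters))    # (('a', 'b', 'c'), 4)
print(compress_population(female_voters))  # (('c', 'b', 'a'), 3)
```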
On the other hand, if we know that the voters’ preferences are cohesive by a factor of $\phi$, where $0\leq\phi\leq 1$, then we conjecture that there exists a polynomial-time approximation algorithm for the $(\mu, \pi)$-DRCF problem ($\mu=0$ and $\pi\geq 1$) with approximation ratio at most $k-(1-o(1))\frac{k(k-1)\ln\ln g(\phi)}{\ln g(\phi)}$, where $g(\phi)$ is a function that maps the cohesiveness $\phi$ of the preferences to the maximum number of winning committees $W_{P}$ of the populations that a candidate can belong to. This approximation ratio would improve on the general inapproximability ratio of $k$ (Theorem 9) for $(\mu, \pi)$-DRCF when $\phi$ is not known.
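For intuition about the conjectured bound, the sketch below evaluates its leading term. It replaces the $(1-o(1))$ factor by 1, which is an assumption since the $o(1)$ term is unspecified. Because $\ln\ln g/\ln g \to 0$, the expression approaches the general ratio $k$ as $g(\phi)$ grows, and for $g(\phi) > e^{e}$ it increases monotonically toward $k$. For small $g(\phi)$ the leading term can even fall below 1, so the expression is best read as an asymptotic statement. The function name is hypothetical.

```python
import math


def conjectured_ratio_leading_term(k: int, g: float) -> float:
    """Leading term of k - (1 - o(1)) * k(k-1) ln ln g / ln g, with the
    unspecified (1 - o(1)) factor replaced by 1 (an assumption)."""
    if g <= math.e:
        raise ValueError("g must exceed e so that ln ln g > 0")
    return k - k * (k - 1) * math.log(math.log(g)) / math.log(g)


for g in (1e3, 1e6):
    print(g, round(conjectured_ratio_leading_term(3, g), 2))
```

For $k=3$, the leading term is about 1.32 at $g(\phi)=10^3$ and about 1.86 at $g(\phi)=10^6$. Both values are below the general ratio of 3.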

Acknowledgement

I am grateful to Julia Stoyanovich for her insights that made me think about DiRe committees. I am thankful to Phokion G. Kolaitis for many helpful discussions that led to the classification of complexity of $(\mu, \pi, \mathtt{f})$-DRCWD. I acknowledge the efforts of high-school students Raisa Bhuiyan and Rachel Rose in collecting the real-world datasets, and of Théo Delemazure for comments on empirical analysis. Finally, I thank anonymous reviewers for their comments on an earlier version of this paper.

References

  • [1] Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Committee scoring rules: Axiomatic classification and hierarchy. In IJCAI, pages 250–256, 2016.
  • [2] D Marc Kilgour. Approval balloting for multi-winner elections. In Handbook on approval voting, pages 105–124. Springer, 2010.
  • [3] Luis Sánchez-Fernández, Edith Elkind, Martin Lackner, Norberto Fernández, Jesús A Fisteus, Pablo Basanta Val, and Piotr Skowron. Proportional justified representation. In AAAI, 2017.
  • [4] Edith Elkind, Piotr Faliszewski, Piotr Skowron, and Arkadii Slinko. Properties of multiwinner voting rules. Social Choice and Welfare, 48(3):599–632, 2017.
  • [5] Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Multiwinner voting: A new challenge for social choice theory. Trends in computational social choice, 74:27–47, 2017.
  • [6] Haris Aziz, Markus Brill, Vincent Conitzer, Edith Elkind, Rupert Freeman, and Toby Walsh. Justified representation in approval-based committee voting. Social Choice and Welfare, 48(2):461–485, 2017.
  • [7] Martin Lackner and Piotr Skowron. Consistent approval-based multi-winner rules. Journal of Economic Theory, 192:105173, 2021.
  • [8] Robert Bredereck, Piotr Faliszewski, Ayumi Igarashi, Martin Lackner, and Piotr Skowron. Multiwinner elections with diversity constraints. In AAAI, 2018.
  • [9] L Elisa Celis, Lingxiao Huang, and Nisheeth K Vishnoi. Multiwinner voting with fairness constraints. In IJCAI, 2018.
  • [10] Markus Brill, Jean-François Laslier, and Piotr Skowron. Multiwinner approval rules as apportionment methods. Journal of Theoretical Politics, 30(3):358–382, 2018.
  • [11] Burt L Monroe. Fully proportional representation. American Political Science Review, 89(4):925–940, 1995.
  • [12] Yukio Koriyama, Antonin Macé, Rafael Treibich, and Jean-François Laslier. Optimal apportionment. Journal of Political Economy, 121(3):584–608, 2013.
  • [13] Jérôme Lang and Piotr Skowron. Multi-attribute proportional representation. Artificial Intelligence, 263:74–106, 2018.
  • [14] Ricardo Baeza-Yates. Data and algorithmic bias in the web. In Proceedings of the 8th ACM Conference on Web Science, pages 1–1. ACM, 2016.
  • [15] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943, 2018.
  • [16] L Elisa Celis, Damian Straszak, and Nisheeth K Vishnoi. Ranking with fairness constraints. 45th International Colloquium on Automata, Languages, and Programming, ICALP, 2018.
  • [17] David Danks and Alex John London. Algorithmic bias in autonomous systems. In IJCAI, pages 4691–4697, 2017.
  • [18] Sara Hajian, Francesco Bonchi, and Carlos Castillo. Algorithmic bias: From discrimination discovery to fairness-aware data mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2125–2126. ACM, 2016.
  • [19] Anja Lambrecht and Catherine Tucker. Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads. Management Science, 2019.
  • [20] Julia Stoyanovich, Ke Yang, and HV Jagadish. Online set selection with fairness and diversity constraints. In Proceedings of the EDBT Conference, 2018.
  • [21] Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, page 22. ACM, 2017.
  • [22] Ke Yang, Vasilis Gkatzelis, and Julia Stoyanovich. Balanced ranking with diversity constraints. In IJCAI, 2019.
  • [23] Caitlin Kuhlman and Elke Rundensteiner. Rank aggregation algorithms for fair consensus. Proceedings of the VLDB Endowment, 13(11):2706–2719, 2020.
  • [24] Xiaohui Bei, Shengxin Liu, Chung Keung Poon, and Hongao Wang. Candidate selections with proportional fairness constraints. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 150–158, 2020.
  • [25] Deval Patel, Arindam Khan, and Anand Louis. Group fairness for knapsack problems. arXiv preprint arXiv:2006.07832, 2020.
  • [26] Till Fluschnik, Piotr Skowron, Mervin Triphaus, and Kai Wilker. Fair knapsack. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1941–1948, 2019.
  • [27] D Ellis Hershkowitz, Anson Kahng, Dominik Peters, and Ariel D Procaccia. District-fair participatory budgeting. Proceedings of AAAI’21, 2021.
  • [28] Dominik Peters, Grzegorz Pierczyński, and Piotr Skowron. Proportional participatory budgeting with cardinal utilities. arXiv preprint arXiv:2008.13276, 2020.
  • [29] Martin Lackner, Jan Maly, and Simon Rey. Fairness in long-term participatory budgeting. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 1566–1568, 2021.
  • [30] Gourab K Patro, Arpita Biswas, Niloy Ganguly, Krishna P Gummadi, and Abhijnan Chakraborty. FairRec: Two-sided fairness for personalized recommendations in two-sided platforms. In Proceedings of The Web Conference 2020, pages 1194–1204, 2020.
  • [31] Abhijnan Chakraborty, Aniko Hannak, Asia J Biega, and Krishna Gummadi. Fair sharing for sharing economy platforms. In Fairness, Accountability and Transparency in Recommender Systems-Workshop on Responsible Recommendation, 2017.
  • [32] Tom Sühr, Asia J Biega, Meike Zehlike, Krishna P Gummadi, and Abhijnan Chakraborty. Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3082–3092, 2019.
  • [33] Marc Rysman. The economics of two-sided markets. Journal of Economic Perspectives, 23(3):125–143, 2009.
  • [34] Himan Abdollahpouri and Robin Burke. Multi-stakeholder recommendation and its connection to multi-sided fairness. arXiv preprint arXiv:1907.13158, 2019.
  • [35] Robin Burke. Multisided fairness for recommendation. arXiv preprint arXiv:1707.00093, 2017.
  • [36] Haris Aziz. Two-sided matching with diversity concerns: an annotated reading list. ACM SIGecom Exchanges, 19(1):15–17, 2021.
  • [37] John R Chamberlin and Paul N Courant. Representative deliberations and representative decisions: Proportional representation and the Borda rule. American Political Science Review, 77(3):718–733, 1983.
  • [38] Ariel D Procaccia, Jeffrey S Rosenschein, and Aviv Zohar. On the complexity of achieving proportional representation. Social Choice and Welfare, 30(3):353–362, 2008.
  • [39] Piotr Skowron, Piotr Faliszewski, and Arkadii Slinko. Achieving fully proportional representation: Approximability results. Artificial Intelligence, 222:67–103, 2015.
  • [40] Tyler Lu and Craig Boutilier. Budgeted social choice: From consensus to personalized decision making. In IJCAI, 2011.
  • [41] Yongjie Yang and Jianxin Wang. Parameterized complexity of multi-winner determination: More effort towards fixed-parameter tractability. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 2142–2144, 2018.
  • [42] Nadja Betzler, Arkadii Slinko, and Johannes Uhlmann. On the computation of fully proportional representation. JAIR, 47:475–519, 2013.
  • [43] Chinmay Sonar, Palash Dey, and Neeldhara Misra. On the complexity of winner verification and candidate winner for multiwinner voting rules. arXiv preprint arXiv:2004.13933, 2020.
  • [44] Edith Elkind and Martin Lackner. Structure in dichotomous preferences. In IJCAI, 2015.
  • [45] Dominik Peters and Martin Lackner. Preferences single-peaked on a circle. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [46] Jonas Israel and Markus Brill. Dynamic proportional rankings. arXiv preprint arXiv:2105.08043, 2021.
  • [47] Joel Uckelman, Yann Chevaleyre, Ulle Endriss, and Jérôme Lang. Representing utility functions via weighted goals. Mathematical Logic Quarterly, 55(4):341–361, 2009.
  • [48] Steven J Brams. Constrained approval voting: A voting system to elect a governing board. Interfaces, 20(5):67–80, 1990.
  • [49] R Potthoff. Use of linear programming for constrained approval voting. Interfaces, 20(5):79–80, 1990.
  • [50] Jonathan K Hodge and Richard E Klima. The mathematics of voting and elections: a hands-on approach, volume 30. American Mathematical Soc., 2018.
  • [51] Vishal Chakraborty and Phokion G Kolaitis. Classifying the complexity of the possible winner problem on partial chains. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 297–305, 2021.
  • [52] Michael R Garey and David S Johnson. Computers and intractability, volume 174. Freeman San Francisco, 1979.
  • [53] Paola Alimonti and Viggo Kann. Hardness of approximating problems on cubic graphs. In Italian Conference on Algorithms and Complexity, pages 288–298. Springer, 1997.
  • [54] Subhash Khot. On the power of unique 2-prover 1-round games. In STOC, pages 767–775, 2002.
  • [55] Subhash Khot, Dor Minzer, and Muli Safra. Pseudorandom sets in Grassmann graph have near-perfect expansion. In FOCS, pages 592–601. IEEE, 2018.
  • [56] Irit Dinur, Subhash Khot, Guy Kindler, Dor Minzer, and Muli Safra. Towards a proof of the 2-to-1 games conjecture? In STOC, pages 376–389, 2018.
  • [57] Irit Dinur, Venkatesan Guruswami, Subhash Khot, and Oded Regev. A new multilayered PCP and the hardness of hypergraph vertex cover. SIAM Journal on Computing, 34(5):1129–1146, 2005.
  • [58] Subhash Khot, Dor Minzer, and Muli Safra. On independent sets, 2-to-2 games, and Grassmann graphs. In STOC, pages 576–589, 2017.
  • [59] Luca Trevisan. Non-approximability results for optimization problems on bounded degree instances. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, pages 453–461, 2001.
  • [60] Rolf Niedermeier and Peter Rossmanith. An efficient fixed-parameter algorithm for 3-hitting set. Journal of Discrete Algorithms, 1(1):89–102, 2003.
  • [61] Irit Dinur and David Steurer. Analytical approach to parallel repetition. In STOC, pages 624–633, 2014.
  • [62] Nikhil Bansal and Subhash Khot. Inapproximability of hypergraph vertex cover and applications to scheduling problems. In International Colloquium on Automata, Languages, and Programming, pages 250–261. Springer, 2010.
  • [63] Eran Halperin. Improved approximation algorithms for the vertex cover problem in graphs and hypergraphs. SIAM Journal on Computing, 31(5):1608–1623, 2002.
  • [64] Rodney G Downey and Michael Ralph Fellows. Parameterized complexity. Springer Science & Business Media, 2012.
  • [65] Giorgio Ausiello, Alessandro D’Atri, and Marco Protasi. Structure preserving reductions among convex optimization problems. Journal of Computer and System Sciences, 21(1):136–153, 1980.
  • [66] Pierluigi Crescenzi, Viggo Kann, and M Halldórsson. A compendium of NP optimization problems, 1995.
  • [67] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Pearson Education Limited, 2002.
  • [68] Kaggle. Eurovision Song Contest 1975-2019, (accessed July 22, 2020), 2019.
  • [69] Erik Voeten. UN General Assembly Votes, 1946-2015, (accessed July 22, 2020), 2014.
  • [70] C. L. Mallows. Non-null ranking models. I. Biometrika, 44(1-2):114–130, June 1957.
  • [71] Vishal Chakraborty, Theo Delemazure, Benny Kimelfeld, Phokion G Kolaitis, Kunal Relia, and Julia Stoyanovich. Algorithmic techniques for necessary and possible winners. ACM/IMS Transactions on Data Science, 2(3):1–23, 2021.
  • [72] Frank Dobbin and Alexandra Kalev. Why diversity programs fail and what works better. Harvard Business Review, 94(7-8):52–60, 2016.
  • [73] Eduardo Bonilla-Silva. Racism without racists: Color-blind racism and the persistence of racial inequality in the United States. Rowman & Littlefield Publishers, 2006.
  • [74] Victor Ray. A theory of racialized organizations. American Sociological Review, 84(1):26–53, 2019.