1. email: elkind@cs.ox.ac.uk
2. AGH University of Science and Technology, Poland; email: faliszew@agh.edu.pl
3. National Institute of Informatics, Japan; email: ayumi_igarashi@nii.ac.jp
4. Google Research, USA; email: pasin@google.com
5. TU Berlin, Germany; email: u.schmidt-kraepelin@tu-berlin.de
6. National University of Singapore, Singapore; email: warut@comp.nus.edu.sg
Justifying Groups in Multiwinner Approval Voting
Abstract
Justified representation (JR) is a standard notion of representation in multiwinner approval voting. Not only does a JR committee always exist, but previous work has also shown through experiments that the JR condition can typically be fulfilled by groups of fewer than $k$ candidates, where $k$ is the target size of the committee. In this paper, we study such groups, known as $n/k$-justifying groups, both theoretically and empirically. First, we show that under the impartial culture model, $n/k$-justifying groups of size less than $k/2$ are likely to exist, which implies that the number of JR committees is usually large. We then present efficient approximation algorithms that compute a small $n/k$-justifying group for any given instance, and a polynomial-time exact algorithm when the instance admits a tree representation. In addition, we demonstrate that small $n/k$-justifying groups can often be useful for obtaining a gender-balanced JR committee even though the problem is NP-hard.
Keywords: Justified representation · Multiwinner voting · Computational social choice.
1 Introduction
Country X needs to select a set of singers to represent it in an international song festival. Not surprisingly, each member of the selection board has preferences over the singers, depending possibly on the singers’ ability and style or on the type of songs that they perform. How should the board aggregate the preferences of its members and decide on the group of singers to invite for the festival?
The problem of choosing a set of candidates based on the preferences of voters (be it singers for a song festival selected by the festival’s board, researchers selected by the conference’s program committee to give full talks, or places to include on the list of world heritage sites based on votes by Internet users) is formally studied under the name of multiwinner voting [10]. In many applications, the voters’ preferences are expressed in the form of approval ballots, wherein each voter either approves or disapproves each candidate; this is a simple yet expressive form of preference elicitation [4, 14]. When selecting a committee, an important consideration is that this committee adequately represents groups of voters who share similar preferences. A natural notion of representation, which was proposed by Aziz et al. [3] and has received significant interest since then, is justified representation (JR). Specifically, if there are $n$ voters and the goal is to select $k$ candidates, a committee is said to satisfy JR if for any group of at least $n/k$ voters all of whom approve a common candidate, at least one of these voters approves some candidate in the committee.
A committee satisfying JR always exists for any voter preferences, and can be found by several voting procedures [3]. In fact, Bredereck et al. [6] observed experimentally that when the preferences are generated according to a range of stochastic distributions, the number of JR committees is usually very high. This observation led them to introduce the notion of an $n/k$-justifying group, which is a group of candidates that already fulfills the JR requirement even though its size may be smaller than $k$. Bredereck et al. found that, in their experiments, small $n/k$-justifying groups (containing fewer than $k$ candidates) typically exist. This finding helps explain why there are often numerous JR committees: to obtain a JR committee, one can start with a small $n/k$-justifying group and then extend it with arbitrary candidates.
The goal of our work is to conduct an extensive study of $n/k$-justifying groups, primarily from a theoretical perspective but also through experiments. Additionally, we demonstrate that small $n/k$-justifying groups can be useful for obtaining JR committees with other desirable properties such as gender balance.
1.1 Our Contribution
In Section 3, we present results on $n/k$-justifying groups and JR committees for general instances. When the voters’ preferences are drawn according to the standard impartial culture (IC) model, in which each voter approves each candidate independently with probability $p$, we establish a sharp threshold on the group size: above this threshold, all groups are likely to be $n/k$-justifying, while below the threshold, no group is likely to be. In particular, the threshold is below $k/2$ for every value of $p$, thereby providing a theoretical explanation of Bredereck et al.’s findings [6]. Our result also implies that with high probability, the number of JR committees is very large, which means that the JR condition is not as stringent as it may seem. On the other hand, we show that, in the worst case, there may be very few JR committees: their number can be as small as $m-k+1$ (where $m$ denotes the number of candidates), and this is tight.
Next, in Section 4, we focus on the problem of computing a small $n/k$-justifying group for a given instance. While this problem is NP-hard to approximate to within a factor of $o(\log n)$ even in the case $n = k$ (since it is equivalent to the well-known Set Cover problem in that case) [Footnote 1: In Appendix 0.A, we extend this hardness beyond the case $n = k$.], we show that the simple GreedyCC algorithm [15, 17] returns an $n/k$-justifying group whose size is at most $O(\sqrt{n})$ times the optimal size; moreover, this factor is asymptotically tight. We then devise a new greedy algorithm, GreedyCandidate, with approximation ratio $O(\log(mn))$. There are several applications of multiwinner voting where the number of candidates $m$ is either smaller or not much larger than the number of voters $n$; for such applications, the approximation ratio of GreedyCandidate is much better than that of GreedyCC. Further, we show that if the voters’ preferences admit a tree representation, an optimal solution can be found in polynomial time. The tree representation condition is known to encompass several other preference restrictions [19]; interestingly, we show that it also generalizes a recently introduced class of restrictions called 1D-VCR [12].
While small $n/k$-justifying groups are interesting in their own right given that they offer a high degree of representation relative to their size, an important benefit of finding such a group is that one can complement it with other candidates to obtain a JR committee with properties that one desires: the smaller the group, the more freedom one has in choosing the remaining members of the committee. We illustrate this with a common consideration in committee selection: gender balance. [Footnote 2: Bredereck et al. [5] studied maximizing objective functions of committees subject to gender balance and other diversity constraints, but did not consider JR.] In Section 5, we show that although it is easy to find a JR committee with at least one member of each gender, computing or even approximating the smallest gender imbalance subject to JR is NP-hard. Nevertheless, in Section 6, we demonstrate through experiments that both GreedyCC and GreedyCandidate usually find an $n/k$-justifying group of size less than $k/2$; by extending such a group, we obtain a gender-balanced JR committee in polynomial time. In addition, we experimentally verify our result from Section 3 in the IC model, and perform analogous experiments in two Euclidean models.
2 Preliminaries
There is a finite set of candidates $C = \{c_1, \dots, c_m\}$ and a finite set of voters $N = [n]$, where we write $[t] := \{1, \dots, t\}$ for any positive integer $t$. Each voter $i \in N$ submits a non-empty ballot $A_i \subseteq C$, and the goal is to select a committee, which is a subset of $C$ of size $k$. Thus, an instance of our problem can be described by a set of candidates $C$, a list of ballots $A = (A_1, \dots, A_n)$, and a positive integer $k \le m$; we write $I = (C, A, k)$.
We are interested in representing the voters according to their ballots. Given an instance $I = (C, A, k)$ with $A = (A_1, \dots, A_n)$, we say that a group of voters $N' \subseteq N$ is cohesive if $\bigcap_{i \in N'} A_i \neq \emptyset$. Further, we say that a committee $W$ represents a group of voters $N'$ if $W \cap A_i \neq \emptyset$ for some $i \in N'$. If candidate $c$ is approved by voter $i$, we say that $c$ covers $i$. We are now ready to state the justified representation axiom of Aziz et al. [3].
Definition 1 (JR)
Given an instance $I = (C, A, k)$ with $A = (A_1, \dots, A_n)$, we say that a committee $W \subseteq C$ of size $k$ provides justified representation (JR) for $I$ if it represents every cohesive group of voters $N' \subseteq N$ such that $|N'| \ge n/k$. We refer to such a committee as a JR committee.
More generally, we can extend the JR condition to groups of fewer than $k$ candidates (the requirement that this group represents every cohesive group of at least $n/k$ voters is with respect to the original parameter $k$). Bredereck et al. [6] called such a group of candidates an $n/k$-justifying group.
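As a minimal sketch of the definition above, the check below decides whether a set of candidates is $n/k$-justifying. The function name `is_justifying` and the encoding of ballots as Python sets are assumptions made for illustration and do not come from the paper. The check relies on the observation that an unrepresented cohesive group of at least $n/k$ voters exists exactly when some candidate is approved by at least $n/k$ voters who approve no member of the group.

```python
def is_justifying(ballots, group, k):
    """Return True if `group` is n/k-justifying for the given approval ballots.

    ballots: list of sets of candidates (one set per voter); hypothetical encoding.
    group:   iterable of candidates (any size, possibly empty).
    k:       target committee size.
    """
    n = len(ballots)
    group = set(group)
    # Count, for every candidate, the voters who approve it but approve
    # no member of the group (the "uncovered" approvers).
    support = {}
    for ballot in ballots:
        if set(ballot) & group:
            continue  # this voter is already covered by the group
        for c in ballot:
            support[c] = support.get(c, 0) + 1
    # s >= n/k is written as s * k >= n to stay within integer arithmetic.
    return all(s * k < n for s in support.values())
```

Under this encoding, a committee of size $k$ provides JR exactly when the check succeeds for it.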
A simple yet important algorithm in this setting is GreedyCC [15, 17]. We consider a slight modification of this algorithm. Our algorithm starts with the empty committee and iteratively adds one candidate at a time. At each step, if there is still an unrepresented cohesive group of size at least $n/k$, the algorithm identifies a largest such group and adds a common approved candidate of the group to the committee. If no such group exists, the algorithm returns the current set of candidates, which is $n/k$-justifying by definition. It is not hard to verify that (our version of) GreedyCC runs in polynomial time and outputs an $n/k$-justifying group of size at most $k$. Sometimes we may let the algorithm continue by identifying a largest unrepresented cohesive group (of size smaller than $n/k$) and adding a common approved candidate of the group.
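The procedure just described can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the name `greedy_cc`, the set-based ballot encoding, and the alphabetical tie-breaking among equally supported candidates are all assumptions. A largest unrepresented cohesive group corresponds to a candidate with the most uncovered approvers, so the sketch tracks uncovered voters only.

```python
def greedy_cc(ballots, k):
    """Variant of GreedyCC that stops once the current group is n/k-justifying.

    ballots: list of sets of candidate names (hypothetical encoding).
    Returns the selected candidates in order of selection.
    """
    n = len(ballots)
    uncovered = [set(b) for b in ballots]  # ballots of voters not yet covered
    group = []
    while True:
        # Number of uncovered voters approving each candidate.
        support = {}
        for ballot in uncovered:
            for c in ballot:
                support[c] = support.get(c, 0) + 1
        if not support:
            return group  # every voter is covered
        # Candidate with the most uncovered approvers; ties are broken
        # alphabetically (an assumption, the text leaves this open).
        best = max(sorted(support), key=support.get)
        if support[best] * k < n:
            return group  # no unrepresented cohesive group of size >= n/k
        group.append(best)
        uncovered = [b for b in uncovered if best not in b]
```

Each selected candidate covers at least $n/k$ previously uncovered voters, which is why the returned group never has more than $k$ members.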
3 General Guarantees
In order to be $n/k$-justifying, a group may need to include $k$ candidates in the worst case: this happens, e.g., when $n$ is divisible by $k$, the first $k$ candidates are approved by disjoint sets of $n/k$ voters each, and the remaining candidates are only approved by one voter each. However, many instances admit much smaller $n/k$-justifying groups. Indeed, in the extreme case, if there is no cohesive group of at least $n/k$ voters, the empty group already suffices. It is therefore interesting to ask what happens in the average case. We focus on the well-studied impartial culture (IC) model, in which each voter approves each candidate independently with probability $p$. If $p = 0$, the empty group is already $n/k$-justifying, while if $p = 1$, any singleton group is sufficient. For each $p \in (0,1)$, we establish a sharp threshold on the group size: above this threshold, all groups are likely to be $n/k$-justifying, while below the threshold, it is unlikely that any group is $n/k$-justifying.
Theorem 3.1
Suppose that $m$ and $k$ are fixed, and let $p \in (0,1)$ be a real constant and $r \in \{0, \dots, k\}$ an integer constant. Assume that the votes are distributed according to the IC model with parameter $p$.
(a) If $p(1-p)^r < 1/k$, then with high probability as $n \to \infty$, every group of $r$ candidates is $n/k$-justifying.
(b) If $p(1-p)^r > 1/k$, then with high probability as $n \to \infty$, no group of $r$ candidates is $n/k$-justifying.
Here, “with high probability” means that the probability converges to $1$ as $n \to \infty$. To prove this result, we will make use of the following standard probabilistic bound.
Lemma 1 (Chernoff bound)
Let be independent random variables taking values in , and let . Then, for any ,
and
Proof (of Theorem 3.1)
(a) Let for some constant , and consider any group of size . We claim that for any candidate , with high probability as , the number of voters who approve but do not approve any of the candidates in is less than . Since is constant, once this claim is established, we can apply the union bound over all candidates outside to show that is likely to be -justifying. Then, we apply the union bound over all (constant number of) groups of size .
Fix a candidate . For each , let be an indicator random variable that indicates whether voter approves and none of the candidates in ; takes the value if so, and otherwise. Let . We have for each , and so . By Lemma 1, it follows that
where is constant. This probability converges to as , proving the claim.
(b) Let for some constant . First, suppose for contradiction that . The derivative of is , so attains its maximum at , where , a contradiction. Hence .
Consider any group of size . We claim that for any candidate (such a candidate exists because ), with high probability as , the number of voters who approve but do not approve any of the candidates in is greater than . When this is the case, is not -justifying. We then apply the union bound over all possible groups .
Fix a candidate , and define the random variables and as in part (a). We have for each , and so . By Lemma 1, it follows that
where is constant. This probability converges to as , proving the claim.
Theorem 3.1 implies that if , then the empty group is already -justifying with high probability, because there is unlikely to be a sufficiently large cohesive group of voters. On the other hand, when , the threshold for the required group size occurs when , i.e., . For , the maximum occurs at , where we have . This means that for every , an arbitrary group of size is likely to be -justifying. Interestingly, the threshold for never exceeds regardless of .
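To make the threshold concrete, the sketch below computes, for given $p$ and $k$, the smallest group size $r$ at which the expected fraction $p(1-p)^r$ of voters who approve a fixed outside candidate and no group member drops below $1/k$. The function name is hypothetical, and the boundary case $p(1-p)^r = 1/k$, which the threshold statement does not cover, is treated conservatively as not yet sufficient.

```python
def ic_group_size_threshold(p, k):
    """Smallest integer r >= 0 with p * (1 - p)**r < 1/k.

    Under the IC model with approval probability p, a fixed voter approves a
    fixed candidate outside a group of size r, and none of the r group
    members, with probability p * (1 - p)**r.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    r = 0
    # Written as k * p * (1 - p)**r >= 1 to avoid dividing by k.
    while k * p * (1.0 - p) ** r >= 1.0:
        r += 1
    return r


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, ic_group_size_threshold(p, k=10))
```

Since $p(1-p)^r$ decreases geometrically in $r$, the loop terminates after a number of steps that is logarithmic in $k$ for any fixed $p$.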
Proposition 1
Suppose that and are fixed, and let be a real constant and an integer constant. Assume that the votes are distributed according to the IC model with parameter . Then, with high probability as , every group of size is -justifying.
Proof
By part (a) of Theorem 3.1, it suffices to show that for all and integers . As in the analysis of part (b) of Theorem 3.1, the function attains its maximum at , where . By Bernoulli’s inequality, we have . It follows that
as desired.
We remark that the proposition would not hold if we were to replace by : indeed, for , the maximum occurs at , where we have .
An implication of Proposition 1 is that under the IC model, with high probability, every size-$k$ committee provides JR. This raises the question of whether the number of JR committees is large even in the worst case. The following example shows that the answer is negative: when $n$ is divisible by $k$, the number of JR committees can be as small as $m-k+1$.
Example 1
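Assume that $n$ is divisible by $k$. Consider an instance $I = (C, A, k)$ where
- $A_1 = \dots = A_{n/k} = \{c_1\}$;
- $A_{n/k+1} = \dots = A_{2n/k} = \{c_2\}$;
- $\vdots$
- $A_{(k-2)n/k+1} = \dots = A_{(k-1)n/k} = \{c_{k-1}\}$;
- $A_{(k-1)n/k+1} = \dots = A_n = \{c_k, c_{k+1}, \dots, c_m\}$.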
A JR committee must include $c_1, \dots, c_{k-1}$; for the last slot, any of the remaining $m-k+1$ candidates can be chosen. Hence, there are exactly $m-k+1$ JR committees.
We complement Example 1 by establishing that, as long as every candidate is approved by at least one voter, there are always at least $m-k+1$ JR committees. [Footnote 3: The condition that every candidate is approved by at least one voter is necessary. Indeed, if the last approval set in Example 1 is changed from $\{c_k, \dots, c_m\}$ to $\{c_k\}$, then there is only one JR committee: $\{c_1, \dots, c_k\}$.] This matches the upper bound in Example 1 and improves upon the bound by Bredereck et al. [6, Thm. 3]. Moreover, the bound holds regardless of whether $n$ is divisible by $k$.
Theorem 3.2
For every instance $I$ such that every candidate in $C$ is approved by some voter, at least $m-k+1$ committees of size $k$ provide JR.
Proof
We run GreedyCC for $k-1$ steps. If the resulting group (of size $k-1$) is already $n/k$-justifying, we can choose any of the remaining $m-k+1$ candidates as the final member of the committee. Hence, assume that the group after $k-1$ steps is not $n/k$-justifying. This means that each of the first $k-1$ candidates covers exactly $n/k$ voters (these sets of voters are disjoint), and the remaining $n/k$ voters are covered by another candidate. In particular, $n$ is divisible by $k$. Call these blocks of voters $B_1, \dots, B_k$, and assume without loss of generality that the corresponding candidates are $c_1, \dots, c_k$, respectively. For each of the remaining $m-k$ candidates, the candidate is approved by at least one voter, say in block $B_i$, so we can combine the candidate with $\{c_1, \dots, c_k\} \setminus \{c_i\}$ to form a JR committee. This yields $m-k$ distinct JR committees. Finally, the committee $\{c_1, \dots, c_k\}$ also provides JR and differs from all of the above committees. It follows that there are at least $m-k+1$ JR committees.
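The bound can be checked by brute force on small instances. The sketch below enumerates all size-$k$ committees and counts those providing JR. The helper names and the concrete instance (with $m = 5$, $k = 3$, $n = 6$, built in the spirit of the worst-case construction) are illustrative assumptions, not data from the paper.

```python
from itertools import combinations


def provides_jr(ballots, committee, k):
    """True if no candidate has at least n/k approvers left uncovered."""
    n = len(ballots)
    committee = set(committee)
    support = {}
    for ballot in ballots:
        if set(ballot) & committee:
            continue  # voter is covered by the committee
        for c in ballot:
            support[c] = support.get(c, 0) + 1
    return all(s * k < n for s in support.values())


def count_jr_committees(ballots, candidates, k):
    """Count size-k subsets of `candidates` that provide JR (exponential time)."""
    return sum(1 for w in combinations(candidates, k) if provides_jr(ballots, w, k))


# Hypothetical instance: two blocks of n/k = 2 voters approve only c1 and only
# c2, respectively; the last block approves all remaining candidates.
candidates = ["c1", "c2", "c3", "c4", "c5"]
ballots = [{"c1"}, {"c1"}, {"c2"}, {"c2"},
           {"c3", "c4", "c5"}, {"c3", "c4", "c5"}]

if __name__ == "__main__":
    print(count_jr_committees(ballots, candidates, k=3))
```

On this instance every JR committee must contain c1 and c2, leaving a free choice among the three remaining candidates, which agrees with the count $m-k+1 = 3$.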
4 Instance-Specific Optimization
As we have seen in Section 3, several instances admit an $n/k$-justifying group of size much smaller than the worst-case size $k$. However, the problem of computing a minimum-size $n/k$-justifying group is NP-hard to approximate to within a factor of $o(\log n)$ even when $n = k$ (see Section 1.1). In this section, we address the question of how well we can approximate such a group in polynomial time.
4.1 GreedyCC
A natural approach to computing a small $n/k$-justifying group is to simply run (our variant of) GreedyCC, stopping as soon as the current group is $n/k$-justifying. However, as the following example shows, the output of this algorithm may be $\Omega(\sqrt{n})$ times larger than the optimal solution.
Example 2
Let and , for some . Consider an instance where
-
;
-
;
-
-
;
-
;
-
;
-
-
;
-
;
-
;
-
-
.
Since are each approved by pairwise disjoint groups of voters, while is approved by voters, GreedyCC outputs the group . However, the singleton group is already -justifying. The ratio between the sizes of the two groups is .
It turns out that Example 2 is already a worst-case scenario for GreedyCC, up to a constant factor.
Theorem 4.1
For every instance $I$, GreedyCC outputs an $n/k$-justifying group at most $O(\sqrt{n})$ times larger than a smallest $n/k$-justifying group.
Proof
Assume without loss of generality that is a smallest -justifying group; our goal is to show that GreedyCC selects at most candidates. For , we say that a candidate (possibly ) chosen by GreedyCC crosses if is approved by some voter who approves and does not approve any candidate chosen by GreedyCC up to the point when is selected. Note that each candidate selected by GreedyCC must cross some candidate in —indeed, if not, the cohesive group of at least voters that forces GreedyCC to select the candidate would not be represented by , contradicting the assumption that is -justifying.
Now, we claim that for each , the candidate can be crossed by at most candidates in the GreedyCC solution; this suffices for the desired conclusion. The claim is immediate if is approved by at most voters, because each candidate selected by GreedyCC that crosses must cover a new voter who approves . Assume therefore that is approved by more than voters, and suppose for contradiction that is crossed by more than candidates in the GreedyCC solution. Denote these candidates by in the order that GreedyCC selects them, where . Notice that for each , when GreedyCC selects , it favors over , which would cover at least uncovered voters (i.e., the “crossing points” of with ). Hence, itself must cover at least uncovered voters. Moreover, covers at least one uncovered voter. This means that the total number of voters is at least , a contradiction.
4.2 GreedyCandidate
Next, we present a different greedy algorithm, which provides an approximation ratio of $O(\log(mn))$. Note that this ratio is asymptotically better than the ratio $O(\sqrt{n})$ of GreedyCC in the range $m \in 2^{o(\sqrt{n})}$; several practical elections fall under this range, since the number of candidates is typically smaller or, at worst, not much larger than the number of voters (e.g., when Internet users vote upon world heritage site candidates or students elect student council members).
To understand our new algorithm, recall that GreedyCC can be viewed as a greedy covering algorithm, where the goal is to pick candidates to cover the voters. Our new algorithm instead views the problem as “covering” the candidates. Specifically, for a set of candidates to be an -justifying group, all but at most of the voters who approve each candidate in must be “covered” by . In other words, each candidate must be “covered” at least times, where denotes the set of voters who approve and we use the notation as a shorthand for . Our algorithm greedily picks in each step a candidate whose selection would minimize the corresponding potential function, , where denotes the set of voters who approve but do not approve any candidate selected by the algorithm thus far. The pseudocode of the algorithm, which we call GreedyCandidate, is presented as Algorithm 1. One can check that GreedyCandidate runs in polynomial time.
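The following is a minimal sketch in the spirit of the description above, not a transcription of Algorithm 1. It assumes that a candidate may retain at most $\lceil n/k \rceil - 1$ uncovered approvers, that the potential is the total excess over this slack summed over all candidates, and that ties are broken by the order of the `candidates` list; the names `greedy_candidate` and `potential` are hypothetical.

```python
def greedy_candidate(ballots, candidates, k):
    """Greedy selection that minimizes a potential over the candidates.

    ballots:    list of sets of candidates (hypothetical encoding).
    candidates: list of all candidates; its order is used for tie-breaking.
    """
    n = len(ballots)
    # A candidate is harmless once fewer than n/k uncovered voters approve it,
    # i.e. at most ceil(n/k) - 1 of them (integer ceiling via floor division).
    slack = -(-n // k) - 1

    def potential(uncovered):
        # Total number of uncovered approvers in excess of the slack.
        total = 0
        for c in candidates:
            approvers = sum(1 for ballot in uncovered if c in ballot)
            total += max(0, approvers - slack)
        return total

    uncovered = [set(b) for b in ballots]
    group = []
    while potential(uncovered) > 0:
        # Pick the candidate whose selection leaves the smallest potential.
        best = min(
            candidates,
            key=lambda c: potential([b for b in uncovered if c not in b]),
        )
        group.append(best)
        uncovered = [b for b in uncovered if best not in b]
    return group
```

The loop terminates because, whenever the potential is positive, selecting a candidate with excess approvers strictly lowers it; a potential of zero means that no candidate has $n/k$ or more uncovered approvers, so the returned group is $n/k$-justifying.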
Theorem 4.2
For every instance $I$, GreedyCandidate outputs an $n/k$-justifying group that is at most $O(\log(mn))$ times larger than a smallest $n/k$-justifying group.
Proof
First, note that whenever for all , every unrepresented cohesive group has size at most , meaning that the output of our algorithm is indeed an -justifying group.
Next, let us bound the size of the output . Assume without loss of generality that is a smallest -justifying group. If , then the while-loop immediately terminates and the algorithm outputs . Thus, we may henceforth assume that .
For each , denote by the set after the -th iteration of the while-loop (so is simply the set of voters who approve ). Let denote the potential after the -th iteration (so is the potential at the beginning). We will show that this potential decreases by at least a factor of with each iteration; more formally,
(1) |
for each .
Before we prove (1), let us show how we can use it to bound . To this end, observe that when the potential is less than , the while-loop terminates. This means that . Furthermore, we have . Applying (1), we get
where for the last inequality we use the bound , which holds for any . Rearranging, we arrive at , as desired.
We now return to proving (1). Our assumption that is an -justifying group implies that for all . In each iteration, the algorithm replaces by for all , so we also have for all and . Fix any , let , and let be the candidate chosen in the -th iteration. From the definition of , we have
(2) |
Consider any . We claim that
(3) |
To see that (3) holds, consider the following two cases:
-
•
Case 1: for some . We may assume without loss of generality that . We have
-
•
Case 2: for all . This means that , and
where we use for the inequality.
Hence, in both cases, (3) holds. Plugging this back into (2), we get
This implies (1) and completes our proof.
Although we do not know whether an efficient -approximation algorithm exists, we show next that by combining Theorem 4.2 with a brute-force approach, we can arrive at a quasi-polynomial-time444Recall that a running time is said to be quasi-polynomial if it is of the form , where denotes the input size (in our case, ). algorithm that has an approximation ratio of for any constant .
Theorem 4.3
For any constant there exists an -time algorithm that, on input , outputs an -justifying group that is at most times larger than a smallest -justifying group.
Proof
If , then we may simply run GreedyCandidate, which runs in polynomial time and yields an approximation ratio of .
Otherwise, we iterate over all subsets . For each , we run GreedyCC on until all voters in are covered (i.e., we stop only when no voter in remains unrepresented); denote the resulting set of candidates by . Finally, among the sets, we output a smallest one that is -justifying with respect to . Notice that the running time of our algorithm is ; this follows from . To analyze the approximation guarantee, let denote a smallest -justifying group, and be the set of voters covered by , where denotes the set of voters who approve . When , we have that is an -justifying group and, from standard analyses of the greedy set cover algorithm,555See, for example, Chapter 2 in the book by Vazirani [18]. we have . In other words, the approximation ratio is at most , as claimed.
4.3 Tree Representation
Even though computing a smallest $n/k$-justifying group is NP-hard even to approximate, we show in this section that this problem becomes tractable if the instance admits a tree representation. An instance is said to admit a tree representation if its candidates can be arranged on a tree $T$ in such a way that the approved candidates of each voter form a subtree of $T$ (i.e., the subgraph of $T$ induced by each approval set is connected). While the tree representation condition is somewhat restrictive, we remark that it is general enough to capture a number of other preference restrictions [19, Fig. 4]. In particular, we show in Appendix 0.B that it encompasses a recently introduced class called 1-dimensional voter/candidate range model (1D-VCR) [12]. [Footnote 6: Together with the results of Yang [19] and Godziszewski et al. [12], this means that the tree representation also captures the candidate interval (CI) and voter interval (VI) domains, the two most commonly studied restrictions for the approval setting, introduced by Elkind and Lackner [9].]
Theorem 4.4
For every instance admitting a tree representation, a smallest $n/k$-justifying group can be computed in polynomial time.
Proof
Let be a tree representation of , i.e., for every the set induces a subtree of . Root at an arbitrary node, and define the depth of a node in as its distance from the root node (so the root node itself has depth ). For each subtree of , denote by the set of its nodes, and for each node , denote by the subtree of rooted at (i.e., contains all nodes in whose path towards the root of passes ). The algorithm sets and proceeds as follows:
-
1.
Select a node of maximum depth such that there exists a set of voters with the following two properties:
-
(a)
for all ;
-
(b)
.
If no such node exists, delete all candidates from , delete the remaining tree , and return .
-
(a)
-
2.
Add to , remove all voters such that from , and delete from and from . Go back to Step 1.
Except for the last round, the algorithm adds one candidate to the set in each round, so it runs for rounds, where we slightly abuse notation and use to refer to the final output from now on. Each round can be implemented in polynomial time—indeed, for each node , we can consider the sets that are contained in and check whether some node in appears in at least of these sets.
We now establish the correctness of the algorithm. For each round , we define to be the set of candidates selected by the algorithm up to and including round , and to be the remaining tree after round , where round refers to the point before the execution of the algorithm (so and ). We also define to be the set of candidates deleted up to and including round . See Figure 1 for an illustration.
Claim
After each round ,
(i) there exists a smallest -justifying group of the original instance such that , and
(ii) for each , at least one of the following three relations holds: , , or .
Proof (of Claim)
We prove the Claim by induction on . For , we have , so for all , and both (i) and (ii) hold trivially. Now consider any and assume that the Claim holds for .
Case : . Let be the candidate selected in this round and be the corresponding set of voters in the algorithm. Then, and . Let be a smallest -justifying group with as guaranteed by the induction hypothesis. If , then statement (i) of the Claim follows by choosing the same set . Assume therefore that . If , then does not represent the cohesive group of voters of size , a contradiction. Hence, there exists a candidate ; let be the parent of (possibly ). See Figure 1.
We will show that is a smallest -justifying group. Since its size is at most the size of , if it is an -justifying group, then it is also a smallest one. Assume for contradiction that the group is not -justifying. This means there exists a group of voters of size such that: (1) the voters in approve a common candidate ; (2) at least one voter in approves ; and (3) none of the voters in approves any candidate in .
Observe that for a voter who approves , we know that by statement (ii) of the Claim for . Moreover, since does not approve . Combining the previous two sentences, we have . As a result, as well, and by the same arguments as for we get that for all . However, this is a contradiction to the choice of in the algorithm.
Applying this argument (as we did between and ) repeatedly until we reach , we obtain that there exists a smallest -justifying group with . This proves statement (i) of the Claim.
For (ii), we argue that for each voter , at least one of the relations , , and holds. Consider a voter for whom neither nor holds. If , then (and therefore ) follows from the induction hypothesis—indeed, since , it must be that . Hence, we can assume that and , which means that . Since is not a subset of , this implies that , and therefore . This establishes (ii).
Case : . We will show that is an -justifying group; assume for contradiction that it is not. By the induction hypothesis, there exists an -justifying group such that . In particular, there exists a group of voters of size such that all of them approve a common candidate, at least one of them approves a candidate in , and none of them approves any candidate in . Similarly to Case 1, we can argue that for all , it holds that . It follows that the root node of satisfies both conditions (a) and (b) in Step 1 of the algorithm, contradicting the fact that the algorithm terminated. Hence, is an -justifying group, and we can take for statement (i) of the Claim.
For (ii), it suffices to observe that , which means that for all .
As statement (i) of the claim holds in particular for , in which case , this concludes the proof of the theorem.
5 Gender Balance
In the next two sections, we demonstrate that small $n/k$-justifying groups can be useful for obtaining JR committees with additional properties. For concreteness, we consider a common desideratum: gender balance. Indeed, in many candidate selection scenarios, it is preferable to have a balance with respect to the gender of the committee members. Formally, assume that each candidate in $C$ belongs to one of two types, male and female. For each committee $W \subseteq C$, we define the gender imbalance of $W$ as the absolute value of the difference between the number of male candidates and the number of female candidates in $W$. A committee is said to be gender-balanced if its gender imbalance is at most $1$.
The following example shows that gender balance can be at odds with justified representation.
Example 3
Suppose that is even, each voter only approves a male candidate , while voter approves female candidates . Any JR committee must contain all of , and can therefore contain at most one female candidate. But there exists a (non-JR) gender-balanced committee .
Example 3 is as bad as it gets: under very mild conditions, there always exists a JR committee with at least one representative of each gender.
Theorem 5.1
For every instance such that for each gender, some candidate of that gender is approved by at least one voter, there exists a JR committee with at least one member of each gender. Moreover, such a committee can be computed in polynomial time.
Proof
As in the proof of Theorem 3.2, we run GreedyCC for $k-1$ steps, and consider two cases. If the resulting group of size $k-1$ is already $n/k$-justifying, we can choose the last member to be of the missing gender if necessary. Else, we continue with the $k$-th step, and assume that the obtained committee is, say, all-female. In this case, like in the proof of Theorem 3.2, $n$ must be divisible by $k$, and there exist disjoint blocks of voters $B_1, \dots, B_k$ and candidates $c_1, \dots, c_k$ such that for each $i \in [k]$, all voters in block $B_i$ approve candidate $c_i$. Take a male candidate who is approved by some voter, say, in block $B_j$. Then, the committee consisting of this candidate together with $\{c_1, \dots, c_k\} \setminus \{c_j\}$ provides JR and contains members of both genders. Since the construction of the blocks can be done in polynomial time, computing the committee also takes polynomial time.
In light of Theorem 5.1, it is natural to ask for a JR committee with the lowest gender imbalance. Unfortunately, our next result shows that deciding whether there exists a gender-balanced committee that provides JR, or even obtaining a close approximation thereof, is computationally hard.
Theorem 5.2
Even when , there exists a constant such that distinguishing between the following two cases is NP-hard:
-
•
(YES) There exists a gender-balanced JR committee;
-
•
(NO) Every JR committee has gender imbalance .
It follows from Theorem 5.2 that one cannot hope to obtain any finite (multiplicative) approximation of the gender imbalance. To establish this hardness, we reduce from a special case of the Set Cover problem. Recall that in Set Cover, we are given a universe $U$ and a collection $\mathcal{S}$ of subsets of $U$. The goal is to select as few subsets as possible that together cover the universe; we use $\mathrm{OPT}(U, \mathcal{S})$ to denote the optimum of a Set Cover instance $(U, \mathcal{S})$. We consider a special case of Set Cover where $|S| = 3$ for every $S \in \mathcal{S}$; this problem is sometimes referred to as Exact Cover by 3-Sets (X3C). We will need the following known APX-hardness of X3C. [Footnote 7: This hardness follows from the standard NP-hardness reduction for the exact version of X3C [11] together with the PCP Theorem [1, 2]. For a more explicit statement of Lemma 2, see, e.g., Lemma 27 in the extended version of [13].]
Lemma 2
For some constant , the following problem is NP-hard: Given an X3C instance , distinguish between
- (YES) ;
- (NO) .
Proof (of Theorem 5.2)
Given an instance of X3C, we construct an instance of our problem as follows. First, we create one female candidate for each set , and one voter for each element of so that the voter approves all sets to which the element belongs. (Hence, each candidate is approved by exactly three voters.) Next, we create additional voters, each of whom approves a new female candidate; this candidate is distinct for distinct voters. Finally, we create one more voter who approves male candidates. Let , and note that the number of voters is .
In the YES case of X3C, we can choose original female candidates so that each original voter approves at least one of them. We then choose all new female candidates and all male candidates, and obtain a gender-balanced JR committee.
On the other hand, in the NO case, we must choose at least original female candidates in order for every original voter to approve at least one of them. Moreover, JR requires that we choose all new female candidates. Hence, every JR committee contains at least female candidates, and therefore has gender imbalance at least , which is at least for sufficiently large . Choosing yields the desired result.
In spite of this hardness result, Proposition 1 implies that under the IC model, with high probability, there exists an -justifying group of size at most . When this is the case, one can choose the remaining members so as to make the final committee of size gender-balanced. In the next section, we show empirically that under several probabilistic models, a small -justifying group can usually be found efficiently via the greedy algorithms from Section 4.
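The completion step described above can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `balanced_completion`, the gender labels "F" and "M", and the dictionary encoding are all hypothetical, and we assume that gender imbalance is measured by the difference between the numbers of female and male members. Since every superset of a justifying group still provides JR, the sketch only has to worry about balance.

```python
def balanced_completion(group, gender, k):
    """Extend `group` to k members, always adding the under-represented gender.

    `gender` maps each candidate to "F" or "M" (hypothetical encoding).
    """
    committee = set(group)
    pool = {g: [c for c in sorted(gender) if gender[c] == g and c not in committee]
            for g in ("F", "M")}
    count = {g: sum(gender[c] == g for c in committee) for g in ("F", "M")}
    while len(committee) < k:
        # Prefer the gender that currently has fewer members.
        g = "F" if count["F"] <= count["M"] else "M"
        if not pool[g]:
            g = "M" if g == "F" else "F"  # fall back if that gender is exhausted
        if not pool[g]:
            raise ValueError("not enough candidates to fill the committee")
        committee.add(pool[g].pop())
        count[g] += 1
    return committee


gender = {0: "F", 1: "F", 2: "F", 3: "M", 4: "M", 5: "M"}
committee = balanced_completion({0}, gender, k=4)
```

When the starting group has at most half of the target size and enough candidates of each gender are available, this procedure ends with equally many female and male members.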
6 Experiments
In this section, we conduct experiments to evaluate and complement our theoretical results. In the first experiment, we illustrate our probabilistic result for the impartial culture model (Theorem 3.1), and examine whether analogous results are likely to hold for two other random models. In our second experiment, we analyze how well GreedyCC and GreedyCandidate perform in finding small -justifying groups. The code for our experiments is available at http://github.com/Project-PRAGMA/Justifying-Groups-SAGT-2022.
6.1 Set-up
We consider three different models for generating approval instances, all of which have been previously studied in the literature [6]. (Footnote 8: In particular, we refer to the work of Elkind et al. [8] for motivation of the Euclidean models.) Each model takes as input the parameters (number of voters), (number of candidates), and one additional parameter, namely, either an approval probability or a radius .
- In the impartial culture (IC) model, each voter approves each of the candidates independently with probability . This model was already used in Theorem 3.1.
- In the 1D-Euclidean (1D) model, each voter/candidate is assigned a uniformly random point in the interval . For a voter and a candidate , let and be their respective assigned points. Then, approves if and only if . Observe that the resulting profile belongs to the 1D-VCR class discussed in Appendix 0.B.
- The 2D-Euclidean (2D) model is a natural generalization of the 1D model where each voter/candidate is assigned a uniformly random point in the unit square . Then, a voter approves a candidate if and only if the Euclidean distance between their points is at most .
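For illustration, the three models can be sampled as in the following sketch. It is not taken from the published code: the function names and the parameter names `n`, `m`, `p`, `r` are our own choices, and we assume that points are drawn from the unit interval in the 1D model and that a voter approves exactly the candidates within distance `r` in both Euclidean models.

```python
import math
import random


def impartial_culture(n, m, p, rng):
    """Each of n voters approves each of m candidates independently with probability p."""
    return [{c for c in range(m) if rng.random() < p} for _ in range(n)]


def euclidean(n, m, r, dim, rng):
    """Voters and candidates are uniform points in [0, 1]^dim (assumed domain).

    A voter approves every candidate at Euclidean distance at most r.
    """
    voters = [[rng.random() for _ in range(dim)] for _ in range(n)]
    cands = [[rng.random() for _ in range(dim)] for _ in range(m)]
    return [{c for c, y in enumerate(cands) if math.dist(x, y) <= r} for x in voters]


rng = random.Random(0)
ic_profile = impartial_culture(50, 10, 0.3, rng)
profile_1d = euclidean(50, 10, 0.1, 1, rng)
profile_2d = euclidean(50, 10, 0.2, 2, rng)
```

Each profile is a list with one approval set per voter, where candidates are identified by the integers 0 to m-1.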
The experiments were carried out on a system with a 1.4 GHz quad-core Intel Core i5 CPU, 8 GB of RAM, and macOS 11.2.3. The software was implemented in Python 3.8.8 using the libraries matplotlib 3.3.4, numpy 1.20.1, and pandas 1.2.4. Additionally, gurobi 9.1.2 was used to solve integer programs.



6.2 Empirical Evaluation of Theorem 3.1
For our first experiment, we focus on elections with parameters , , and . We chose a large number of voters as the statement of Theorem 3.1 concerns large values of . For each and (in increments of ), we generated elections using the IC model with parameter . We then sampled one group of size from each resulting election and checked whether it is -justifying.
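The sampling procedure of this experiment can be sketched as follows. The code is a simplified illustration under our own naming (`is_justifying`, `fraction_justifying`), not the published implementation. It relies on the observation that a group fails to be justifying exactly when some candidate is approved by at least n/k voters who approve nobody in the group.

```python
import random


def is_justifying(profile, group, k):
    """Check the JR condition for `group` with respect to committee size k."""
    n = len(profile)
    support = {}
    for ballot in profile:
        if ballot & group:
            continue  # this voter already approves a member of the group
        for c in ballot:
            support[c] = support.get(c, 0) + 1
    # A violation is a candidate approved by at least n/k uncovered voters.
    return all(count * k < n for count in support.values())


def fraction_justifying(n, m, k, p, size, trials, rng):
    """Fraction of IC elections in which one random group of `size` is justifying."""
    hits = 0
    for _ in range(trials):
        profile = [{c for c in range(m) if rng.random() < p} for _ in range(n)]
        group = set(rng.sample(range(m), size))
        hits += is_justifying(profile, group, k)
    return hits / trials


estimate = fraction_justifying(200, 20, 10, 0.2, 5, 20, random.Random(0))
```

The parameters in the final call are deliberately small and arbitrary; they are not the values used in the experiment.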
Figure 2(a) illustrates the fraction of generated elections for which this is the case. To make this plot comparable to analogous plots for the other two models, we label the -axis with the average number of approvals instead of ; this number is simply . For each , the area between the vertical dashed lines indicates the range of the interval for which Theorem 3.1 shows that the probability that no size- group is -justifying converges to as ; this corresponds to the range of such that . For , Theorem 3.1 implies that all size- groups are likely to be -justifying for any average number of approvals in (i.e., for any ) as . Hence, there are no vertical dashed lines for .
In Figure 2(a), we see that the empirical results match almost exactly the prediction of Theorem 3.1. Specifically, for , we observe a sharp fall and rise in the fraction of -justifying groups precisely at the predicted values of . For , the empirical curve falls slightly before and rises slightly after the predicted points marked by the dashed lines. This is likely because the function is very close to in the transition areas, so as defined in the proof of Theorem 3.1 is very small; thus a larger value of is needed in order for the transition to be sharp.
We carried out analogous experiments for the two Euclidean models. In particular, we iterated over and (for the 1D model) or (for the 2D model), again in increments of . To make the plots for different models comparable, we compute the average number of approvals induced by each value of and label this number on the -axis. The resulting plots, shown in Figures 2(b)–2(c), differ significantly from the plot for the IC model. In particular, while we see a sharp fall in the fraction of -justifying groups when the average number of approvals is around (for all ), there is no sharp rise as in the IC model. This suggests that a statement specifying a sharp threshold analogous to Theorem 3.1 for the IC model is unlikely to hold for either of the Euclidean models. Nevertheless, it remains an interesting question whether the fraction of -justifying groups can be described theoretically for these models.
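The conversion from a model parameter to the average number of approvals can be illustrated with a short sketch. The helper `average_approvals` and the concrete numbers below are hypothetical. For the IC model the empirical average should be close to the approval probability times the number of candidates, whereas for the Euclidean models we simply measure it on a sampled profile (assuming points in the unit interval and approval within distance `r`).

```python
import random


def average_approvals(profile):
    """Average size of an approval set in the profile."""
    return sum(len(ballot) for ballot in profile) / len(profile)


rng = random.Random(0)
n, m, p, r = 20000, 10, 0.3, 0.1

# IC model: each approval is an independent coin flip with probability p.
ic_profile = [{c for c in range(m) if rng.random() < p} for _ in range(n)]
ic_average = average_approvals(ic_profile)  # should be close to p * m

# 1D model (assumed rule): approve candidates within distance r on [0, 1].
cands = [rng.random() for _ in range(m)]
voters = [rng.random() for _ in range(n)]
profile_1d = [{c for c, y in enumerate(cands) if abs(x - y) <= r} for x in voters]
average_1d = average_approvals(profile_1d)
```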
6.3 Performance of GreedyCC and GreedyCandidate
For our second experiment, we consider elections with parameters and , and iterate over (for the IC model), (for the 1D model), and (for the 2D model), each in increments of . For each value of (or ), we generated elections and computed the minimum size of an -justifying group (via an integer program) and the size of the -justifying group returned by GreedyCC and GreedyCandidate, respectively. We aggregated these numbers across different elections by computing their average. As in the first experiment, to make the plots for different models comparable, we converted the values of and to the average number of approvals induced by these values. The results are shown in Figure 3.
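To make the comparison concrete, the following sketch contrasts a greedy heuristic with an exact search on a toy instance. It is an illustration under stated assumptions rather than the code used in the experiments: we assume that GreedyCC repeatedly adds a candidate approved by the largest number of uncovered voters and stops once the group is justifying, and we replace the integer program by a brute-force search that is only feasible for very small instances. GreedyCandidate is defined earlier in the paper and is not reproduced here.

```python
from itertools import combinations


def is_justifying(profile, group, k):
    """True if no candidate is approved by at least n/k voters uncovered by `group`."""
    n = len(profile)
    support = {}
    for ballot in profile:
        if not (ballot & group):
            for c in ballot:
                support[c] = support.get(c, 0) + 1
    return all(count * k < n for count in support.values())


def greedy_cc(profile, m, k):
    """Assumed GreedyCC rule: add the candidate covering the most uncovered voters."""
    group = set()
    while not is_justifying(profile, group, k):
        uncovered = [b for b in profile if not (b & group)]
        best = max(range(m), key=lambda c: sum(c in b for b in uncovered))
        group.add(best)
    return group


def smallest_justifying_group(profile, m, k):
    """Brute-force stand-in for the integer program (exponential time)."""
    for size in range(m + 1):
        for group in combinations(range(m), size):
            if is_justifying(profile, set(group), k):
                return set(group)


toy_profile = [{0}, {0}, {1}, {1}, {2}]
greedy_group = greedy_cc(toy_profile, m=3, k=3)
optimal_group = smallest_justifying_group(toy_profile, m=3, k=3)
```

The greedy loop terminates because, whenever the current group is not justifying, some candidate is approved by at least one uncovered voter, so every step covers a new voter.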



In general, we observe that both GreedyCC and GreedyCandidate provide decent approximations to the minimum size of an -justifying group. More precisely, the average difference between the size of a justifying group returned by GreedyCC and the minimum size is less than for all three models and parameters, while for GreedyCandidate this difference is at most . The standard deviation of the size of justifying groups returned by GreedyCC and GreedyCandidate is similar for the two Euclidean models and below for all tested parameters. For the IC model, GreedyCandidate induces a smaller variance than GreedyCC. Moreover, on average, both greedy algorithms found -justifying groups of size at most for almost all models and parameters—the only exception is the 1D model when the expected number of approvals is around . In absolute numbers, for this set of parameters, GreedyCC returned a justifying group of size larger than for of the instances and GreedyCandidate for of the instances. Interestingly, among all generated instances across all parameters, there was exactly one for which a smallest -justifying group was of size larger than . It is also worth noting that even though GreedyCandidate has a better worst-case guarantee than GreedyCC, this superiority is not reflected in the experiments. In particular, while GreedyCandidate performs marginally better than GreedyCC under the IC model, GreedyCC yields slightly better approximations under the Euclidean models.
We also repeated these experiments with ; the results are shown in Appendix 0.C. Notably, the plot for the IC model shows a clearer step function than Figure 3(a).
7 Conclusion and Future Work
We have investigated the notion of an -justifying group introduced by Bredereck et al. [6], which allows us to reason about the justified representation (JR) condition with respect to groups smaller than the target size . We showed that -justifying groups of size less than typically exist, which means that the number of committees of size satisfying JR is usually large. We also presented approximation algorithms for computing a small justifying group as well as an exact algorithm when the instance admits a tree representation. By starting with such a group, one can often efficiently find a committee of size fulfilling both JR and gender balance, even though the problem is NP-hard in the worst case.
Given the typically large number of JR committees, a natural direction is to impose desirable properties on the committee on top of JR. In addition to gender balance, several other properties have been studied by Bredereck et al. [5]. For instance, when organizing an academic workshop, one could require that at least a certain fraction of the invitees be junior researchers, or that the invitees come from a certain number of countries or continents. We expect that algorithms for computing small justifying groups will be useful for handling other diversity constraints as well. It would also be interesting to study analogs of -justifying groups for the more demanding representation notions of proportional justified representation (PJR) and extended justified representation (EJR), in particular to see whether these analogs yield qualitatively different results.
Acknowledgments.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 101002854), from the Deutsche Forschungsgemeinschaft under grant BR 4744/2-1, from JST PRESTO under grant number JPMJPR20C1, from the Singapore Ministry of Education under grant number MOE-T2EP20221-0001, and from an NUS Start-up Grant. We would like to thank the anonymous SAGT reviewers for their comments.
References
- [1] Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and hardness of approximation problems. Journal of the ACM 45(3), 501–555 (1998)
- [2] Arora, S., Safra, S.: Probabilistic checking of proofs; a new characterization of NP. Journal of the ACM 45(1), 70–122 (1998)
- [3] Aziz, H., Brill, M., Conitzer, V., Elkind, E., Freeman, R., Walsh, T.: Justified representation in approval-based committee voting. Social Choice and Welfare 48(2), 461–485 (2017)
- [4] Brams, S.J., Fishburn, P.C.: Approval Voting. Springer (2007)
- [5] Bredereck, R., Faliszewski, P., Igarashi, A., Lackner, M., Skowron, P.: Multiwinner elections with diversity constraints. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI). pp. 933–940 (2018)
- [6] Bredereck, R., Faliszewski, P., Kaczmarczyk, A., Niedermeier, R.: An experimental view on committees providing justified representation. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). pp. 109–115 (2019)
- [7] Dinur, I., Steurer, D.: Analytical approach to parallel repetition. In: Proceedings of the 46th ACM Symposium on Theory of Computing (STOC). pp. 624–633 (2014)
- [8] Elkind, E., Faliszewski, P., Laslier, J.F., Skowron, P., Slinko, A., Talmon, N.: What do multiwinner voting rules do? An experiment over the two-dimensional Euclidean domain. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI). pp. 494–501 (2017)
- [9] Elkind, E., Lackner, M.: Structure in dichotomous preferences. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI). pp. 2019–2025 (2015)
- [10] Faliszewski, P., Skowron, P., Slinko, A., Talmon, N.: Multiwinner voting: a new challenge for social choice theory. In: Endriss, U. (ed.) Trends in Computational Social Choice, chap. 2, pp. 27–47. AI Access (2017)
- [11] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman (1979)
- [12] Godziszewski, M., Batko, P., Skowron, P., Faliszewski, P.: An analysis of approval-based committee rules for 2D-Euclidean elections. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI). pp. 5448–5455 (2021)
- [13] Gupta, A., Lee, E., Li, J., Manurangsi, P., Wlodarczyk, M.: Losing treewidth by separating subsets. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). pp. 1731–1749 (2019), extended version available at arXiv:1804.01366
- [14] Kilgour, D.M.: Approval balloting for multi-winner elections. In: Laslier, J.F., Sanver, M.R. (eds.) Handbook on Approval Voting, chap. 6, pp. 105–124. Springer (2010)
- [15] Lu, T., Boutilier, C.: Budgeted social choice: From consensus to personalized decision making. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI). pp. 280–286 (2011)
- [16] Moshkovitz, D.: The Projection Games Conjecture and the NP-hardness of ln n-approximating set-cover. Theory of Computing 11, 221–235 (2015)
- [17] Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14(1), 265–294 (1978)
- [18] Vazirani, V.V.: Approximation Algorithms. Springer (2003)
- [19] Yang, Y.: On the tree representations of dichotomous preferences. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). pp. 644–650 (2019)
Appendix 0.A Hardness of Approximating -Justifying Group
As we observed in Section 1.1, when , the problem of computing a small -justifying group for a given instance is equivalent to the Set Cover problem, which is NP-hard to approximate to within a factor of . Below, we show that this hardness holds even when . We first consider the case where the ratio is constant.
Theorem 0.A.1
Let be a constant integer. For any constant , it is NP-hard to find an -justifying group that is at most times larger than a smallest -justifying group, even when .
To establish this hardness, we use a reduction from the Set Cover problem. Recall that in Set Cover, we are given a universe and a collection of subsets of . The goal is to select as few subsets as possible that together cover the universe; we use to denote the optimum of a Set Cover instance .
Proof (of Theorem 0.A.1)
Given an instance of Set Cover, we construct an instance of the -justifying group problem as follows. Let . First, we create voters, with voters corresponding to the Set Cover elements. For every subset , we create a candidate that is approved by exactly the voters in . In addition, for every , we construct a candidate that is approved by the voters .
To begin with, note that a smallest -justifying group of the constructed instance has size at most , because simply picking the candidates corresponding to the Set Cover solution yields an -justifying group.
Next, suppose that for some constant , there is a polynomial-time algorithm that can find an -justifying group of size at most times the size of a smallest -justifying group; choose large enough so that . Then this algorithm will find an -justifying group of size at most . Notice that we may assume that this group does not contain any of the candidates ’s, because we may replace a candidate by an arbitrary candidate such that contains . Therefore, we may assume that this group consists of candidates . Notice also that ; this is because, for every , voters approve a common candidate but do not approve any of the selected candidates. As a result, we have found a set cover of size in polynomial time. Note that
which means that the set cover that we have found in polynomial time has size at most , where . But from Theorem 0.A.2, this is NP-hard.
What happens if is larger than by a superconstant factor? Even in that case, the reduction in Theorem 0.A.1 can still be modified to yield a similar hardness result.
Theorem 0.A.3
Let be a constant integer. For any constant , it is NP-hard to find an -justifying group that is at most times larger than a smallest -justifying group, even when .
Proof
We use the same construction as in the proof of Theorem 0.A.1, choosing and . If for some constant there is a polynomial-time algorithm that can find an -justifying group of size at most times the size of a smallest -justifying group, then a similar argument as in the proof of Theorem 0.A.1 shows that we can use this algorithm to find a set cover of size at most in polynomial time. But from Theorem 0.A.2, this is NP-hard.
Appendix 0.B Preference Restrictions
We have shown in Section 4.3 that the problem of computing a smallest -justifying group can be solved efficiently for instances that admit a tree representation (TR). In order to better understand the class TR and related preference restriction classes, we explore the relationships between them in this section.
First, we exhibit that TR contains a recently introduced class called 1D voter/candidate range model (1D-VCR) [12]. For convenience, we state the definition of 1D-VCR here. For a 1D-VCR instance , each has a center of influence and a radius of influence . We denote by and the leftmost and the rightmost point of ’s range of influence, respectively. We call the interval of . A voter approves a candidate if and only if ’s interval and ’s interval have a non-empty intersection, i.e.,
Before establishing the containment, we prove some basic properties of 1D-VCR preferences. Our first observation is that if a candidate is more appealing than candidate , in the sense that the interval of is contained in that of , then every voter who approves also approves .
Lemma 3
Consider a 1D-VCR instance. If a voter approves candidate , then approves any candidate whose interval contains that of , i.e., .
Proof
Since voter approves candidate , we have . Since , it holds that . We conclude that approves .
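As a sanity check of Lemma 3, the following sketch tests the claim on random intervals. The encoding is our own: every voter and candidate is represented by a pair (left, right) describing a closed interval, and the helper names are hypothetical. A randomized test of this kind can only support the lemma, not prove it.

```python
import random


def approves(voter, cand):
    """1D-VCR approval: the two closed intervals have a non-empty intersection."""
    return max(voter[0], cand[0]) <= min(voter[1], cand[1])


def random_interval(rng):
    """Interval given by a random center and a random radius of influence."""
    center, radius = rng.random(), rng.random() / 4
    return (center - radius, center + radius)


rng = random.Random(1)
checked = 0
for _ in range(10000):
    voter, a, b = (random_interval(rng) for _ in range(3))
    b_contains_a = b[0] <= a[0] and a[1] <= b[1]
    if b_contains_a and approves(voter, a):
        # Lemma 3: approving a implies approving any b whose interval contains a's.
        assert approves(voter, b)
        checked += 1
```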
An interval is said to be nested in another interval if and , with at least one inequality being strict. The next lemma ensures that if a voter approves two candidates whose intervals are not nested in each other’s, then approves any “intermediate” candidate whose interval lies between the intervals of the two approved candidates.
Lemma 4
Consider a 1D-VCR instance. If a voter approves candidates and with and , then also approves any candidate such that and .
Proof
Since voter approves candidates and , we have and . Hence, and . This means that approves , as claimed.
We are now ready to show the containment relation between 1D-VCR and TR.
Proposition 2
Every 1D-VCR instance admits a TR. Moreover, such a TR can be computed in polynomial time.
Proof
Let be a 1D-VCR instance with voter set . We construct a tree as follows. Consider a maximal unnested subset of —call it —where a candidate is said to be nested if its interval is nested in another candidate’s interval. We reindex the candidates in so that and , where . Then, we add the path to , call these nodes the “level- nodes”, and define . For the remaining candidates in , we iteratively apply the following procedure: pick a candidate whose interval is not nested in the interval of any other candidate in , and make a child of a node in whose interval strictly contains (such a node exists by definition of ), breaking ties in favor of nodes with a higher level. Remove from , and define the level of as the level of its parent plus .
We claim that the following two statements hold:
-
(i)
Let for some , and be the level- ancestor of (possibly ). Then, all candidates on the path from to are in .
-
(ii)
Let for some and . Then, .
To prove (i), let be the path from to ; in particular, and . By construction of , it holds that . Since , Lemma 3 implies that . As for (ii), since and , Lemma 4 together with the assumption that imply that .
Clearly, can be constructed in polynomial time; we now show that it is a valid tree representation of the instance . To this end, fix for some voter . It suffices to show that there is a walk from to , possibly going through some nodes more than once, such that all candidates on this walk belong to . Consider a walk composed of the path from to its level- ancestor , the path from to the level- ancestor of , and the path from to . By (i), all nodes in the first and third paths are in ; by (ii), all nodes in the second path are in too. This means that the entire walk is contained in , completing the proof.
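The validity condition used in this proof, namely that any two candidates approved by a voter are joined by a walk inside that voter's approval set, amounts to requiring that each approval set induces a connected subgraph of the tree. A minimal checker, with hypothetical names and an adjacency-dictionary encoding of the tree, could look as follows. It does not verify that the input graph is itself a tree.

```python
def induces_connected_subgraph(tree, approved):
    """True if the candidates in `approved` are connected within `tree`.

    `tree` maps each candidate to the set of its neighbours.
    """
    if not approved:
        return True
    start = next(iter(approved))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in tree[node]:
            if neighbour in approved and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == approved


def is_tree_representation(tree, profile):
    """Check the connectivity condition for every voter's approval set."""
    return all(induces_connected_subgraph(tree, ballot) for ballot in profile)


# A path 0 - 1 - 2 and a star with center 0 and leaves 1, 2, 3.
path = {0: {1}, 1: {0, 2}, 2: {1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path_ok = is_tree_representation(path, [{0, 1}, {1, 2}, {0, 1, 2}])
path_bad = is_tree_representation(path, [{0, 2}])
star_ok = is_tree_representation(star, [{0, 1}, {0, 2}, {0, 3}, {0, 1, 2, 3}])
```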
In the chain of tree representation classes depicted by Yang [19, Fig. 4], the largest class contained in TR is the class of PTR. An instance admits a path-tree representation (PTR) if there exists a tree with vertex set corresponding to the candidate set such that the approval set of every voter induces a path in . Below, we present examples demonstrating that PTR and 1D-VCR do not contain each other—this further highlights the generality of TR and also shows that the inclusion of 1D-VCR in TR is strict.
Proposition 3
There exists a 1D-VCR instance that admits no PTR.
Proof
Consider the following instance: , , and
To see that this instance admits a 1D-VCR representation, consider the following intervals:
Now, assume for contradiction that the instance admits a path-tree representation. Then, since , all four candidates must lie on a path in the tree. But then has at most two neighbors, so one of the approval sets , , and is not a path, a contradiction.
Proposition 4
There exists an instance that is not 1D-VCR but admits a PTR.
Proof
Consider the following instance: , , and
This instance admits a PTR representation in which is the center of a star graph with three leaves. Now, assume for contradiction that the instance admits a 1D-VCR representation, and let , , and be the intervals of the candidates , , and . For every pair of candidates , there exists a voter who approves but not , so none of the intervals , , can be nested in another one. Hence, we may assume without loss of generality that and . Let be the interval of voter . As , we have , so either or . In the former case, , contradicting , while in the latter case, , contradicting .
Appendix 0.C Additional Experiments



We repeated our experiments from Figure 3 with an increased number of voters: we set while keeping the remaining parameters unchanged. More precisely, we created elections with parameters and , and iterated over (for the IC model), (for the 1D model), and (for the 2D model), each in increments of . For each value of (or ), we generated elections and computed the size of the -justifying group returned by GreedyCC and GreedyCandidate, respectively (unfortunately, due to the high number of voters, computing the size of a smallest -justifying group was infeasible). We aggregated these numbers across different elections by computing their average. As in the previous experiments, to make the plots for different models comparable, we converted the values of and to the average number of approvals induced by these values. The results are shown in Figure 4.
The plot for the IC model shows a clear step function. Considering the corresponding plot in Figure 3, it appears likely that this function also represents the size of a smallest -justifying group. It is worth noting that for large values of , Theorem 3.1 suggests that the size of a smallest -justifying group in the IC model can be predicted from the parameters and : specifically, it is . If all groups of size are -justifying while all smaller groups are not, then both GreedyCC and GreedyCandidate return a group of this size. A closer look at our data indicates that this behavior occurs for most values of , as the standard deviation of the size of the returned group is extremely small. In particular, it is for almost all values of with both algorithms, and below for all parameters.
By contrast, for the two Euclidean models, the plots are relatively far from step functions. It is also unclear what the corresponding plots for the size of a smallest -justifying group would look like. Moreover, the standard deviation of the size of the group returned by GreedyCC and GreedyCandidate is significantly larger than in the IC model. Specifically, the standard deviation is nonzero for a large fraction of values of , and can be as high as .