Justifying Groups in Multiwinner Approval Voting

Edith Elkind (1), Piotr Faliszewski (2), Ayumi Igarashi (3), Pasin Manurangsi (4), Ulrike Schmidt-Kraepelin (5), Warut Suksompong (6)

(1) University of Oxford, UK, elkind@cs.ox.ac.uk
(2) AGH University of Science and Technology, Poland, faliszew@agh.edu.pl
(3) National Institute of Informatics, Japan, ayumi_igarashi@nii.ac.jp
(4) Google Research, USA, pasin@google.com
(5) TU Berlin, Germany, u.schmidt-kraepelin@tu-berlin.de
(6) National University of Singapore, Singapore, warut@comp.nus.edu.sg

Abstract

Justified representation (JR) is a standard notion of representation in multiwinner approval voting. Not only does a JR committee always exist, but previous work has also shown through experiments that the JR condition can typically be fulfilled by groups of fewer than $k$ candidates, where $k$ is the target size of the committee. In this paper, we study such groups, known as $n/k$-justifying groups, both theoretically and empirically. First, we show that under the impartial culture model, $n/k$-justifying groups of size less than $k/2$ are likely to exist, which implies that the number of JR committees is usually large. We then present efficient approximation algorithms that compute a small $n/k$-justifying group for any given instance, and a polynomial-time exact algorithm when the instance admits a tree representation. In addition, we demonstrate that small $n/k$-justifying groups can often be useful for obtaining a gender-balanced JR committee even though the problem is NP-hard.

Keywords: Justified representation · Multiwinner voting · Computational social choice

1 Introduction

Country X needs to select a set of singers to represent it in an international song festival. Not surprisingly, each member of the selection board has preferences over the singers, depending possibly on the singers’ ability and style or on the type of songs that they perform. How should the board aggregate the preferences of its members and decide on the group of singers to invite for the festival?

The problem of choosing a set of candidates based on the preferences of voters (be it singers for a song festival selected by the festival’s board, researchers selected by the conference’s program committee to give full talks, or places to include on the list of world heritage sites based on votes by Internet users) is formally studied under the name of multiwinner voting [10]. In many applications, the voters’ preferences are expressed in the form of approval ballots, wherein each voter either approves or disapproves each candidate; this is a simple yet expressive form of preference elicitation [4, 14]. When selecting a committee, an important consideration is that this committee adequately represents groups of voters who share similar preferences. A natural notion of representation, which was proposed by Aziz et al. [3] and has received significant interest since then, is justified representation (JR). Specifically, if there are $n$ voters and the goal is to select $k$ candidates, a committee is said to satisfy JR if for any group of at least $n/k$ voters all of whom approve a common candidate, at least one of these voters approves some candidate in the committee.

A committee satisfying JR always exists for any voter preferences, and can be found by several voting procedures [3]. In fact, Bredereck et al. [6] observed experimentally that when the preferences are generated according to a range of stochastic distributions, the number of JR committees is usually very high. This observation led them to introduce the notion of an $n/k$-justifying group, which is a group of candidates that already fulfills the JR requirement even though its size may be smaller than $k$. Bredereck et al. found that, in their experiments, small $n/k$-justifying groups (containing fewer than $k/2$ candidates) typically exist. This finding helps explain why there are often numerous JR committees: indeed, to obtain a JR committee, one can start with a small $n/k$-justifying group and then extend it with arbitrary candidates.

The goal of our work is to conduct an extensive study of $n/k$-justifying groups, primarily from a theoretical perspective but also through experiments. Additionally, we demonstrate that small $n/k$-justifying groups can be useful for obtaining JR committees with other desirable properties such as gender balance.

1.1 Our Contribution

In Section 3, we present results on $n/k$-justifying groups and JR committees for general instances. When the voters’ preferences are drawn according to the standard impartial culture (IC) model, in which each voter approves each candidate independently with probability $p$, we establish a sharp threshold on the group size: above this threshold, all groups are likely to be $n/k$-justifying, while below the threshold, no group is likely to be. In particular, the threshold is below $k/2$ for every value of $p$, thereby providing a theoretical explanation of Bredereck et al.’s findings [6]. Our result also implies that with high probability, the number of JR committees is very large, which means that the JR condition is not as stringent as it may seem. On the other hand, we show that, in the worst case, there may be very few JR committees: their number can be as small as $m-k+1$ (where $m$ denotes the number of candidates), and this is tight.

Next, in Section 4, we focus on the problem of computing a small $n/k$-justifying group for a given instance. While this problem is NP-hard to approximate to within a factor of $o(\ln n)$ even in the case $n=k$ (since it is equivalent to the well-known Set Cover problem in that case; in Appendix A, we extend this hardness to the case $n/k>1$), we show that the simple GreedyCC algorithm [15, 17] returns an $n/k$-justifying group whose size is at most $O(\sqrt{n})$ times the optimal size; moreover, this factor is asymptotically tight. We then devise a new greedy algorithm, GreedyCandidate, with approximation ratio $O(\log(mn))$. There are several applications of multiwinner voting where the number of candidates $m$ is either smaller or not much larger than the number of voters $n$; for such applications, the approximation ratio of GreedyCandidate is much better than that of GreedyCC. Further, we show that if the voters’ preferences admit a tree representation, an optimal solution can be found in polynomial time. The tree representation condition is known to encompass several other preference restrictions [19]; interestingly, we show that it also generalizes a recently introduced class of restrictions called 1D-VCR [12].

While small $n/k$-justifying groups are interesting in their own right given that they offer a high degree of representation relative to their size, an important benefit of finding such a group is that one can complement it with other candidates to obtain a JR committee with properties that one desires; the smaller the group, the more freedom one has in choosing the remaining members of the committee. We illustrate this with a common consideration in committee selection: gender balance. (Bredereck et al. [5] studied maximizing objective functions of committees subject to gender balance and other diversity constraints, but did not consider JR.) In Section 5, we show that although it is easy to find a JR committee with at least one member of each gender, computing or even approximating the smallest gender imbalance subject to JR is NP-hard. Nevertheless, in Section 6, we demonstrate through experiments that both GreedyCC and GreedyCandidate usually find an $n/k$-justifying group of size less than $k/2$; by extending such a group, we obtain a gender-balanced JR committee in polynomial time. In addition, we experimentally verify our result from Section 3 in the IC model, and perform analogous experiments in two Euclidean models.

2 Preliminaries

There is a finite set of candidates $C=\{c_{1},\dots,c_{m}\}$ and a finite set of voters $N=[n]$, where we write $[t]:=\{1,\dots,t\}$ for any positive integer $t$. Each voter $i\in N$ submits a non-empty ballot $A_{i}\subseteq C$, and the goal is to select a committee, which is a subset of $C$ of size $k$. Thus, an instance $I$ of our problem can be described by a set of candidates $C$, a list of ballots $\mathcal{A}=(A_{1},\dots,A_{n})$, and a positive integer $k\leq m$; we write $I=(C,\mathcal{A},k)$.

We are interested in representing the voters according to their ballots. Given an instance $I=(C,\mathcal{A},k)$ with $\mathcal{A}=(A_{1},\dots,A_{n})$, we say that a group of voters $N'\subseteq N$ is cohesive if $\bigcap_{i\in N'}A_{i}\neq\emptyset$. Further, we say that a committee $W$ represents a group of voters $N'\subseteq N$ if $W\cap A_{i}\neq\emptyset$ for some $i\in N'$. If candidate $c_{j}\in W$ is approved by voter $i\in N$, we say that $c_{j}$ covers $i$. We are now ready to state the justified representation axiom of Aziz et al. [3].

Definition 1 (JR)

Given an instance $I=(C,\mathcal{A},k)$ with $\mathcal{A}=(A_{1},\dots,A_{n})$, we say that a committee $W\subseteq C$ of size $k$ provides justified representation (JR) for $I$ if it represents every cohesive group of voters $N'\subseteq N$ such that $|N'|\geq n/k$. We refer to such a committee as a JR committee.

More generally, we can extend the JR condition to groups of fewer than $k$ candidates (the requirement that this group represents every cohesive group of at least $n/k$ voters is with respect to the original parameter $k$). Bredereck et al. [6] called such a group of candidates an $n/k$-justifying group.
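As an illustration of the definition, the following is a minimal sketch of a check for whether a group of candidates is n/k-justifying. It assumes ballots are given as a list of Python sets; the function name `is_justifying` and the toy ballots are hypothetical and not taken from the paper. The check relies on the observation that, for each candidate c, the voters who approve c but approve nothing in the group form the largest unrepresented cohesive group having c in common.

```python
def is_justifying(group, ballots, k):
    """Return True if `group` represents every cohesive group of >= n/k voters.

    `ballots` is a list of sets (one approval set per voter); this input
    format is an assumption made for the sketch.
    """
    n = len(ballots)
    group = set(group)
    candidates = set().union(*ballots)
    for c in candidates:
        # Voters approving c and no member of `group`: the largest
        # unrepresented cohesive group with c as a common candidate.
        unrepresented = sum(1 for a in ballots if c in a and not (a & group))
        if unrepresented * k >= n:  # integer form of unrepresented >= n/k
            return False
    return True


# Toy instance (hypothetical): n = 6 voters, k = 3, so n/k = 2.
ballots = [{"a"}, {"a"}, {"a", "b"}, {"b"}, {"c"}, {"c"}]
print(is_justifying({"a"}, ballots, k=3))       # the two voters approving only c remain unrepresented
print(is_justifying({"a", "c"}, ballots, k=3))  # only one voter (approving b) remains unrepresented
```

Comparing `unrepresented * k >= n` avoids floating-point division when n is not divisible by k.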

A simple yet important algorithm in this setting is GreedyCC [15, 17]. We consider a slight modification of this algorithm. Our algorithm starts with the empty committee and iteratively adds one candidate at a time. At each step, if there is still an unrepresented cohesive group of size at least $n/k$, the algorithm identifies a largest such group and adds a common approved candidate of the group to the committee. If no such group exists, the algorithm returns the current set of candidates, which is $n/k$-justifying by definition. It is not hard to verify that (our version of) GreedyCC runs in polynomial time and outputs an $n/k$-justifying group of size at most $k$. Sometimes we may let the algorithm continue by identifying a largest unrepresented cohesive group (of size smaller than $n/k$) and adding a common approved candidate of the group.
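The modified GreedyCC described above can be sketched as follows. This is an illustrative reading of the prose, not the authors' code: the list-of-sets ballot format, the function name `greedy_cc`, and the tie-breaking rule (first candidate in sorted order) are all assumptions. Picking the candidate with the most uncovered approvers is equivalent to finding a largest unrepresented cohesive group and adding one of its common candidates.

```python
def greedy_cc(ballots, k):
    """Sketch of the modified GreedyCC; returns an n/k-justifying group as a list."""
    n = len(ballots)
    candidates = sorted(set().union(*ballots))
    uncovered = set(range(n))  # voters who approve no selected candidate yet
    group = []
    while candidates:
        def support(c):
            # Size of the largest unrepresented cohesive group around c.
            return sum(1 for i in uncovered if c in ballots[i])

        best = max(candidates, key=support)  # ties: first in sorted order (assumption)
        if support(best) * k < n:
            break  # no unrepresented cohesive group of size >= n/k remains
        group.append(best)
        uncovered = {i for i in uncovered if best not in ballots[i]}
    return group


# Toy instance (hypothetical): n = 6 voters, k = 3, so n/k = 2.
ballots = [{"a"}, {"a"}, {"a", "b"}, {"b"}, {"c"}, {"c"}]
print(greedy_cc(ballots, k=3))
```

Each selected candidate covers at least n/k previously uncovered voters, which is why the loop cannot run for more than k iterations.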

3 General Guarantees

In order to be $n/k$-justifying, a group may need to include $k$ candidates in the worst case: this happens, e.g., when $n$ is divisible by $k$, the first $k$ candidates are approved by disjoint sets of $n/k$ voters each, and the remaining $m-k$ candidates are only approved by one voter each. However, many instances admit much smaller $n/k$-justifying groups. Indeed, in the extreme case, if there is no cohesive group of voters, the empty group already suffices. It is therefore interesting to ask what happens in the average case. We focus on the well-studied impartial culture (IC) model, in which each voter approves each candidate independently with probability $p$. If $p=0$, the empty group is already $n/k$-justifying, while if $p=1$, any singleton group is sufficient. For each $p$, we establish a sharp threshold on the group size: above this threshold, all groups are likely to be $n/k$-justifying, while below the threshold, it is unlikely that any group is $n/k$-justifying.

Theorem 3.1

Suppose that $m$ and $k$ are fixed, and let $p\in(0,1)$ be a real constant and $s\in[0,k]$ an integer constant. Assume that the votes are distributed according to the IC model with parameter $p$.

  (a) If $p(1-p)^{s}<1/k$, then with high probability as $n\rightarrow\infty$, every group of $s$ candidates is $n/k$-justifying.

  (b) If $p(1-p)^{s}>1/k$, then with high probability as $n\rightarrow\infty$, no group of $s$ candidates is $n/k$-justifying.

Here, “with high probability” means that the probability converges to $1$ as $n\rightarrow\infty$. To prove this result, we will make use of the following standard probabilistic bound.

Lemma 1 (Chernoff bound)

Let $X_{1},\dots,X_{t}$ be independent random variables taking values in $[0,1]$, and let $S:=X_{1}+\cdots+X_{t}$. Then, for any $\delta\in[0,1]$,

$$\Pr[S\geq(1+\delta)\mathbb{E}[S]]\leq\exp\left(\frac{-\delta^{2}\mathbb{E}[S]}{3}\right)$$

and

$$\Pr[S\leq(1-\delta)\mathbb{E}[S]]\leq\exp\left(\frac{-\delta^{2}\mathbb{E}[S]}{2}\right).$$

Proof (of Theorem 3.1)

(a) Let $p(1-p)^{s}=1/k-\varepsilon$ for some constant $\varepsilon>0$, and consider any group $W\subseteq C$ of size $s$. We claim that for any candidate $c\not\in W$, with high probability as $n\rightarrow\infty$, the number of voters who approve $c$ but do not approve any of the candidates in $W$ is less than $n/k$. Since $m$ is constant, once this claim is established, we can apply the union bound over all candidates outside $W$ to show that $W$ is likely to be $n/k$-justifying. Then, we apply the union bound over all (constant number of) groups of size $s$.

Fix a candidate $c\not\in W$. For each $i\in[n]$, let $X_{i}$ be an indicator random variable that indicates whether voter $i$ approves $c$ and none of the candidates in $W$; $X_{i}$ takes the value $1$ if so, and $0$ otherwise. Let $X:=\sum_{i=1}^{n}X_{i}$. We have $\mathbb{E}[X_{i}]=p(1-p)^{s}=1/k-\varepsilon$ for each $i$, and so $\mathbb{E}[X]=n(1/k-\varepsilon)$. By Lemma 1, it follows that

$$\Pr\left[X\geq\frac{n}{k}\right]\leq\exp\left(-\frac{\delta^{2}n\left(\frac{1}{k}-\varepsilon\right)}{3}\right),$$

where $\delta:=\min\{1,k\varepsilon/(1-k\varepsilon)\}$ is constant. This probability converges to $0$ as $n\rightarrow\infty$, proving the claim.

(b) Let $p(1-p)^{s}=1/k+\varepsilon$ for some constant $\varepsilon>0$. First, suppose for contradiction that $s=k$. The derivative of $f(p):=p(1-p)^{k}$ is $f'(p)=(1-p)^{k-1}(1-p(k+1))$, so $f(p)$ attains its maximum at $p^{*}=\frac{1}{k+1}$, where $f(p^{*})=\frac{k^{k}}{(k+1)^{k+1}}<\frac{1}{k}$, a contradiction. Hence $s<k$.

Consider any group $W\subseteq C$ of size $s$. We claim that for any candidate $c\not\in W$ (such a candidate exists because $s<k$), with high probability as $n\rightarrow\infty$, the number of voters who approve $c$ but do not approve any of the candidates in $W$ is greater than $n/k$. When this is the case, $W$ is not $n/k$-justifying. We then apply the union bound over all possible groups $W$.

Fix a candidate $c\not\in W$, and define the random variables $X_{1},\dots,X_{n}$ and $X$ as in part (a). We have $\mathbb{E}[X_{i}]=p(1-p)^{s}=1/k+\varepsilon$ for each $i$, and so $\mathbb{E}[X]=n(1/k+\varepsilon)$. By Lemma 1, it follows that

$$\Pr\left[X\leq\frac{n}{k}\right]\leq\exp\left(-\frac{\delta^{2}n\left(\frac{1}{k}+\varepsilon\right)}{2}\right),$$

where $\delta:=k\varepsilon/(1+k\varepsilon)$ is constant. This probability converges to $0$ as $n\rightarrow\infty$, proving the claim. $\square$

Theorem 3.1 implies that if $p<1/k$, then the empty group is already $n/k$-justifying with high probability, because there is unlikely to be a sufficiently large cohesive group of voters. On the other hand, when $p>1/k$, the threshold for the required group size $s$ occurs when $p(1-p)^{s}=1/k$, i.e., $s=-\log_{1-p}(kp)$. For $k=10$, the maximum $s$ occurs at $p\approx 0.24$, where we have $s\approx 3.19$. This means that for every $p\in[0,1]$, an arbitrary group of size $4$ is likely to be $n/k$-justifying. Interestingly, the threshold for $s$ never exceeds $k/2$ regardless of $p$.
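The threshold values quoted above can be reproduced numerically. The sketch below evaluates s = -log_{1-p}(kp) and locates its maximum over p in (1/k, 1) by a plain grid search; the function names and the grid resolution are illustrative choices, not part of the paper.

```python
import math


def threshold(p, k):
    """Real-valued s solving p * (1 - p)**s == 1/k, i.e. s = -log_{1-p}(k * p)."""
    return -math.log(k * p) / math.log(1 - p)


def max_threshold(k, steps=10_000):
    """Grid search (an assumption of this sketch) for the p in (1/k, 1) maximizing s."""
    grid = (1 / k + (1 - 1 / k) * j / steps for j in range(1, steps))
    return max((threshold(p, k), p) for p in grid)


s10, p10 = max_threshold(10)
print(f"k=10: s is maximized near p={p10:.2f} with s={s10:.2f}")
s15, p15 = max_threshold(15)
print(f"k=15: s is maximized near p={p15:.2f} with s={s15:.2f}")
```

For k = 10 the search lands close to the values s ≈ 3.19 and p ≈ 0.24 stated in the text, and for k = 15 the maximum exceeds 5 = k/3.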

Proposition 1

Suppose that $m$ and $k$ are fixed, and let $p\in(0,1)$ be a real constant and $s\geq k/2$ an integer constant. Assume that the votes are distributed according to the IC model with parameter $p$. Then, with high probability as $n\rightarrow\infty$, every group of size $s$ is $n/k$-justifying.

Proof

By part (a) of Theorem 3.1, it suffices to show that $p(1-p)^{s}<1/k$ for all $p\in(0,1)$ and integers $s\geq k/2$. As in the analysis of part (b) of Theorem 3.1, the function $f(p):=p(1-p)^{s}$ attains its maximum at $p^{*}=\frac{1}{s+1}$, where $f(p^{*})=\frac{s^{s}}{(s+1)^{s+1}}=\frac{1}{s(1+1/s)^{s+1}}$. By Bernoulli’s inequality, we have $(1+1/s)^{s+1}\geq 1+(s+1)/s>2$. It follows that

$$p(1-p)^{s}\leq f(p^{*})<\frac{1}{2s}\leq\frac{1}{k},$$

as desired. $\square$

We remark that the proposition would not hold if we were to replace $k/2$ by $k/3$: indeed, for $k=15$, the maximum $s$ occurs at $p\approx 0.17$, where we have $s\approx 5.03>15/3$.

An implication of Proposition 1 is that under the IC model, with high probability, every size-$k$ committee provides JR. This raises the question of whether the number of JR committees is large even in the worst case. The following example shows that the answer is negative: when $n$ is divisible by $k$, the number of JR committees can be as small as $m-k+1$.

Example 1

Assume that $n$ is divisible by $k$. Consider an instance $I=(C,\mathcal{A},k)$ where

  • $A_{1}=\dots=A_{\frac{n}{k}}=\{c_{1}\}$;

  • $A_{\frac{n}{k}+1}=\dots=A_{\frac{2n}{k}}=\{c_{2}\}$;

  • $\vdots$

  • $A_{\frac{(k-2)n}{k}+1}=\dots=A_{\frac{(k-1)n}{k}}=\{c_{k-1}\}$;

  • $A_{\frac{(k-1)n}{k}+1}=\dots=A_{n}=\{c_{k},c_{k+1},\dots,c_{m}\}$.

A JR committee must include $c_{1},\dots,c_{k-1}$; for the last slot, any of the remaining $m-k+1$ candidates can be chosen. Hence, there are exactly $m-k+1$ JR committees.
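The count in Example 1 can be confirmed by brute force on a small instance. The sketch below builds the example with candidates numbered 1 to m and enumerates all size-k committees; the helper names (`provides_jr`, `example_1`, `count_jr_committees`) and the chosen parameters are hypothetical, and exhaustive enumeration is only feasible for small m.

```python
from itertools import combinations


def provides_jr(committee, ballots, k):
    """True if `committee` represents every cohesive group of >= n/k voters."""
    n = len(ballots)
    committee = set(committee)
    for c in set().union(*ballots):
        unrepresented = sum(1 for a in ballots if c in a and not (a & committee))
        if unrepresented * k >= n:
            return False
    return True


def example_1(n, m, k):
    """Ballots of Example 1; candidate c_j is encoded as the integer j."""
    assert n % k == 0 and m >= k
    block = n // k
    ballots = []
    for j in range(1, k):                        # blocks approving only c_j
        ballots += [{j}] * block
    ballots += [set(range(k, m + 1))] * block    # last block approves c_k..c_m
    return ballots


def count_jr_committees(ballots, m, k):
    return sum(provides_jr(w, ballots, k) for w in combinations(range(1, m + 1), k))


n, m, k = 12, 7, 3
print(count_jr_committees(example_1(n, m, k), m, k), "JR committees; m - k + 1 =", m - k + 1)
```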

We complement Example 1 by establishing that, as long as every candidate is approved by at least one voter, there are always at least $m-k+1$ JR committees. (The condition that every candidate is approved by at least one voter is necessary. Indeed, if the last approval set in Example 1 is changed from $\{c_{k},c_{k+1},\dots,c_{m}\}$ to $\{c_{k}\}$, then there is only one JR committee: $\{c_{1},c_{2},\dots,c_{k}\}$.) This matches the upper bound in Example 1 and improves upon the bound of $m/k$ by Bredereck et al. [6, Thm. 3]. Moreover, the bound holds regardless of whether $n$ is divisible by $k$.

Theorem 3.2

For every instance $I=(C,\mathcal{A},k)$ such that every candidate in $C$ is approved by some voter, at least $m-k+1$ committees of size $k$ provide JR.

Proof

We run GreedyCC for $k-1$ steps. If the resulting group (of size $k-1$) is already $n/k$-justifying, we can choose any of the remaining $m-k+1$ candidates as the final member of the committee. Hence, assume that the group after $k-1$ steps is not $n/k$-justifying. This means that each of the first $k-1$ candidates covers exactly $n/k$ voters (these sets of voters are disjoint), and the remaining $n/k$ voters are covered by another candidate. In particular, $n$ is divisible by $k$. Call these blocks of $n/k$ voters $B_{1},\dots,B_{k}$, and assume without loss of generality that the corresponding candidates are $c_{1},\dots,c_{k}$, respectively. For each of the remaining $m-k$ candidates, the candidate is approved by at least one voter, say in block $B_{i}$, so we can combine the candidate with $\{c_{1},\dots,c_{i-1},c_{i+1},\dots,c_{k}\}$ to form a JR committee. This yields $m-k$ distinct JR committees. Finally, the committee $\{c_{1},\dots,c_{k}\}$ also provides JR and differs from all of the above committees. It follows that there are at least $m-k+1$ JR committees. $\square$

4 Instance-Specific Optimization

As we have seen in Section 3, several instances admit an $n/k$-justifying group of size much smaller than the worst-case size $k$. However, the problem of computing a minimum-size $n/k$-justifying group is NP-hard to approximate to within a factor of $o(\ln n)$ even when $n=k$ (see Section 1.1). In this section, we address the question of how well we can approximate such a group in polynomial time.

4.1 GreedyCC

A natural approach to computing a small $n/k$-justifying group is to simply run (our variant of) GreedyCC, stopping as soon as the current group is $n/k$-justifying. However, as the following example shows, the output of this algorithm may be $\Theta(\sqrt{n})$ times larger than the optimal solution.

Example 2

Let $n=k^{2}$ and $m=2k$, for some $k\geq 3$. Consider an instance $I=(C,\mathcal{A},k)$ where

  • $A_{1}=\dots=A_{k-1}=\{c_{1}\}$;

  • $A_{(k-1)+1}=\dots=A_{2(k-1)}=\{c_{2}\}$;

  • $\vdots$

  • $A_{(k-2)(k-1)+1}=\dots=A_{(k-1)^{2}}=\{c_{k-1}\}$;

  • $A_{(k-1)^{2}+1}=\{c_{1},c_{k}\}$;

  • $A_{(k-1)^{2}+2}=\{c_{2},c_{k}\}$;

  • $\vdots$

  • $A_{k(k-1)}=\{c_{k-1},c_{k}\}$;

  • $A_{k(k-1)+1}=\{c_{k+1}\}$;

  • $A_{k(k-1)+2}=\{c_{k+2}\}$;

  • $\vdots$

  • $A_{k^{2}}=\{c_{2k}\}$.

Since $c_{1},\dots,c_{k-1}$ are each approved by pairwise disjoint groups of $k$ voters, while $c_{k}$ is approved by $k-1$ voters, GreedyCC outputs the group $\{c_{1},\dots,c_{k-1}\}$. However, the singleton group $\{c_{k}\}$ is already $n/k$-justifying. The ratio between the sizes of the two groups is $(k-1)\in\Theta(\sqrt{n})$.
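The gap in Example 2 can be observed on a concrete instance. The sketch below builds the example for a given k (candidate c_j encoded as the integer j), runs an illustrative GreedyCC, and checks that the singleton {c_k} is n/k-justifying. All function names are hypothetical, and the tie-breaking in `greedy_cc` (lowest index first) is an assumption; it does not affect the outcome here because c_k never has the most uncovered approvers.

```python
def is_justifying(group, ballots, k):
    n = len(ballots)
    group = set(group)
    for c in set().union(*ballots):
        unrepresented = sum(1 for a in ballots if c in a and not (a & group))
        if unrepresented * k >= n:
            return False
    return True


def greedy_cc(ballots, k):
    n = len(ballots)
    candidates = sorted(set().union(*ballots))
    uncovered = set(range(n))
    group = []
    while candidates:
        def support(c):
            return sum(1 for i in uncovered if c in ballots[i])

        best = max(candidates, key=support)
        if support(best) * k < n:   # nothing of size >= n/k left to represent
            break
        group.append(best)
        uncovered = {i for i in uncovered if best not in ballots[i]}
    return group


def example_2(k):
    """Ballots of Example 2 with n = k**2 voters and m = 2k candidates."""
    ballots = []
    for j in range(1, k):
        ballots += [{j}] * (k - 1)          # k-1 voters approving only c_j
    for j in range(1, k):
        ballots.append({j, k})              # one voter approving c_j and c_k
    for j in range(k + 1, 2 * k + 1):
        ballots.append({j})                 # singleton voters for c_{k+1}..c_{2k}
    return ballots


k = 5
ballots = example_2(k)
print("GreedyCC group:", greedy_cc(ballots, k))
print("{c_k} is n/k-justifying:", is_justifying({k}, ballots, k))
```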

It turns out that Example 2 is already a worst-case scenario for GreedyCC, up to a constant factor.

Theorem 4.1

For every instance $I=(C,\mathcal{A},k)$, GreedyCC outputs an $n/k$-justifying group at most $\sqrt{2n}$ times larger than a smallest $n/k$-justifying group.

Proof

Assume without loss of generality that $C_{\text{OPT}}:=\{c_{1},c_{2},\dots,c_{t}\}$ is a smallest $n/k$-justifying group; our goal is to show that GreedyCC selects at most $\sqrt{2n}\cdot t$ candidates. For $j\in[t]$, we say that a candidate $c_{r}\in C$ (possibly $r=j$) chosen by GreedyCC crosses $c_{j}$ if $c_{r}$ is approved by some voter $i$ who approves $c_{j}$ and does not approve any candidate chosen by GreedyCC up to the point when $c_{r}$ is selected. Note that each candidate selected by GreedyCC must cross some candidate in $C_{\text{OPT}}$: indeed, if not, the cohesive group of at least $n/k$ voters that forces GreedyCC to select the candidate would not be represented by $C_{\text{OPT}}$, contradicting the assumption that $C_{\text{OPT}}$ is $n/k$-justifying.

Now, we claim that for each $j\in[t]$, the candidate $c_{j}$ can be crossed by at most $\sqrt{2n}$ candidates in the GreedyCC solution; this suffices for the desired conclusion. The claim is immediate if $c_{j}$ is approved by at most $\sqrt{2n}$ voters, because each candidate selected by GreedyCC that crosses $c_{j}$ must cover a new voter who approves $c_{j}$. Assume therefore that $c_{j}$ is approved by more than $\sqrt{2n}$ voters, and suppose for contradiction that $c_{j}$ is crossed by more than $\sqrt{2n}$ candidates in the GreedyCC solution. Denote these candidates by $c_{\ell_{1}},\dots,c_{\ell_{s}}$ in the order that GreedyCC selects them, where $s>\sqrt{2n}$. Notice that for each $i\in[s-1]$, when GreedyCC selects $c_{\ell_{i}}$, it favors $c_{\ell_{i}}$ over $c_{j}$, which would cover at least $s-i+1$ uncovered voters (i.e., the “crossing points” of $c_{\ell_{i}},\dots,c_{\ell_{s}}$ with $c_{j}$). Hence, $c_{\ell_{i}}$ itself must cover at least $s-i+1$ uncovered voters. Moreover, $c_{\ell_{s}}$ covers at least one uncovered voter. This means that the total number of voters is at least $s+(s-1)+\dots+1=s(s+1)/2>n$, a contradiction. $\square$

4.2 GreedyCandidate

Next, we present a different greedy algorithm, which provides an approximation ratio of $\ln(mn)+1$. Note that this ratio is asymptotically better than the ratio of GreedyCC in the range $m\in 2^{o(\sqrt{n})}$; several practical elections fall under this range, since the number of candidates is typically smaller or, at worst, not much larger than the number of voters (e.g., when Internet users vote upon world heritage site candidates or students elect student council members).

To understand our new algorithm, recall that GreedyCC can be viewed as a greedy covering algorithm, where the goal is to pick candidates to cover the voters. Our new algorithm instead views the problem as “covering” the candidates. Specifically, for a set of candidates $W\subseteq C$ to be an $n/k$-justifying group, all but at most $\ell:=\lceil n/k\rceil-1$ of the voters who approve each candidate in $C$ must be “covered” by $W$. In other words, each candidate $c\in C$ must be “covered” at least $[|B_{c}^{0}|-\ell]_{+}$ times, where $B_{c}^{0}$ denotes the set of voters who approve $c$ and we use the notation $[x]_{+}$ as a shorthand for $\max\{x,0\}$. Our algorithm greedily picks in each step a candidate whose selection would minimize the corresponding potential function, $\sum_{c'\in C}[|B_{c'}|-\ell]_{+}$, where $B_{c'}$ denotes the set of voters who approve $c'$ but do not approve any candidate selected by the algorithm thus far. The pseudocode of the algorithm, which we call GreedyCandidate, is presented as Algorithm 1. One can check that GreedyCandidate runs in polynomial time.

Input: An instance $(C,\mathcal{A},k)$, where $\mathcal{A}=(A_{1},\ldots,A_{n})$
Output: An $n/k$-justifying group $W\subseteq C$
  $\ell\leftarrow\lceil n/k\rceil-1$
  $W\leftarrow\emptyset$
  for $c\in C$ do
      $B_{c}\leftarrow\{i\in[n]:c\in A_{i}\}$
  end for
  while there exists $c\in C$ such that $|B_{c}|>\ell$ do
      for $c\in C$ do
          $u_{c}\leftarrow\sum_{c'\in C}\left([|B_{c'}|-\ell]_{+}-[|B_{c'}\setminus B_{c}|-\ell]_{+}\right)$
      end for
      $c^{*}\leftarrow\arg\max_{c\in C}u_{c}$
      $W\leftarrow W\cup\{c^{*}\}$
      for $c\in C$ do
          $B_{c}\leftarrow B_{c}\setminus B_{c^{*}}$
      end for
  end while
  return $W$
Algorithm 1 GreedyCandidate

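Algorithm 1 translates fairly directly into Python. The following is a minimal sketch under stated assumptions rather than a reference implementation: ballots are a list of sets, ties in the arg max are broken by sorted candidate order, and the name `greedy_candidate` is hypothetical. The small instance at the end is Example 2 with k = 3.

```python
def greedy_candidate(ballots, k):
    """Sketch of Algorithm 1 (GreedyCandidate); returns the selected group as a list."""
    n = len(ballots)
    candidates = sorted(set().union(*ballots))
    ell = (n + k - 1) // k - 1                 # ceil(n/k) - 1 in integer arithmetic
    # B[c]: voters approving c who approve no candidate selected so far.
    B = {c: {i for i, a in enumerate(ballots) if c in a} for c in candidates}
    W = []

    def excess(voters):
        return max(len(voters) - ell, 0)       # the [x]_+ notation from the text

    while any(len(B[c]) > ell for c in candidates):
        def gain(c):
            # u_c: decrease in the potential if c were selected now.
            return sum(excess(B[d]) - excess(B[d] - B[c]) for d in candidates)

        best = max(candidates, key=gain)       # ties: first in sorted order (assumption)
        W.append(best)
        covered = B[best]                      # fixed before any B[c] is updated
        B = {c: B[c] - covered for c in candidates}
    return W


# Example 2 with k = 3: n = 9 voters, candidates encoded as integers 1..6.
ballots = [{1}, {1}, {2}, {2}, {1, 3}, {2, 3}, {4}, {5}, {6}]
print(greedy_candidate(ballots, k=3))
```

The set `covered` is stored before the update so that every B[c] is reduced by the same voter set, including the case c = c*; an in-place loop that empties B[c*] partway through would leave later sets unchanged.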
Theorem 4.2

For every instance $I=(C,\mathcal{A},k)$, GreedyCandidate outputs an $n/k$-justifying group that is at most $(\ln(mn)+1)$ times larger than a smallest $n/k$-justifying group.

Proof

First, note that whenever $|B_{c}|\leq\ell$ for all $c\in C$, every unrepresented cohesive group has size at most $\ell<n/k$, meaning that the output $W$ of our algorithm is indeed an $n/k$-justifying group.

Next, let us bound the size of the output $W$. Assume without loss of generality that $C_{\text{OPT}}:=\{c_{1},c_{2},\dots,c_{t}\}$ is a smallest $n/k$-justifying group. If $t=0$, then the while-loop immediately terminates and the algorithm outputs $W=\emptyset$. Thus, we may henceforth assume that $t\geq 1$.

For each $c\in C$, denote by $B^{i}_{c}$ the set $B_{c}$ after the $i$-th iteration of the while-loop (so $B^{0}_{c}$ is simply the set of voters who approve $c$). Let $\psi_{i}$ denote the potential $\sum_{c'\in C}[|B_{c'}|-\ell]_{+}$ after the $i$-th iteration (so $\psi_{0}$ is the potential at the beginning). We will show that this potential decreases by at least a factor of $1-1/t$ with each iteration; more formally,

$$\psi_{i}\leq\left(1-\frac{1}{t}\right)\cdot\psi_{i-1} \tag{1}$$

for each $i\geq 1$.

Before we prove (1), let us show how we can use it to bound $|W|$. To this end, observe that when the potential is less than $1$, the while-loop terminates. This means that $\psi_{|W|-1}\geq 1$. Furthermore, we have $\psi_{0}\leq\sum_{c'\in C}|B^{0}_{c'}|\leq mn$. Applying (1), we get

$$1\leq\psi_{|W|-1}\leq\cdots\leq\left(1-\frac{1}{t}\right)^{|W|-1}\cdot\psi_{0}\leq e^{-\frac{|W|-1}{t}}\cdot mn,$$

where for the last inequality we use the bound $1+x\leq e^{x}$, which holds for any $x\in\mathbb{R}$. Rearranging, we arrive at $|W|\leq 1+t\ln(mn)\leq t(\ln(mn)+1)$, as desired.

We now return to proving (1). Our assumption that $C_{\text{OPT}}$ is an $n/k$-justifying group implies that $\left|B^{0}_{c'}\setminus\left(\bigcup_{j=1}^{t}B^{0}_{c_{j}}\right)\right|\leq\ell$ for all $c'\in C$. In each iteration, the algorithm replaces $B_{c}$ by $B_{c}\setminus B_{c^{*}}$ for all $c$, so we also have $\left|B^{i}_{c'}\setminus\left(\bigcup_{j=1}^{t}B^{i}_{c_{j}}\right)\right|\leq\ell$ for all $c'\in C$ and $i$. Fix any $i\geq 1$, let $q:=i-1$, and let $c^{*}$ be the candidate chosen in the $i$-th iteration. From the definition of $c^{*}$, we have

$$\begin{aligned}
\psi_{i-1}-\psi_{i} &=\sum_{c'\in C}\left(\left[|B^{q}_{c'}|-\ell\right]_{+}-\left[|B^{q}_{c'}\setminus B^{q}_{c^{*}}|-\ell\right]_{+}\right)\\
&\geq\frac{1}{t}\sum_{j=1}^{t}\sum_{c'\in C}\left(\left[|B^{q}_{c'}|-\ell\right]_{+}-\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+}\right)\\
&=\psi_{i-1}-\sum_{c'\in C}\left(\frac{1}{t}\sum_{j=1}^{t}\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+}\right).
\end{aligned} \tag{2}$$

Consider any $c'\in C$. We claim that

$$\sum_{j=1}^{t}\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+}\leq(t-1)\cdot\left[|B^{q}_{c'}|-\ell\right]_{+}. \tag{3}$$

To see that (3) holds, consider the following two cases:

  • Case 1: $|B^{q}_{c'}\setminus B^{q}_{c_{j'}}|\leq\ell$ for some $c_{j'}\in C_{\text{OPT}}$. We may assume without loss of generality that $j'=t$. We have

    $$\begin{aligned}
    \sum_{j=1}^{t}\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+} &=\sum_{j=1}^{t-1}\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+}\\
    &\leq\sum_{j=1}^{t-1}\left[|B^{q}_{c'}|-\ell\right]_{+}=(t-1)\cdot\left[|B^{q}_{c'}|-\ell\right]_{+}.
    \end{aligned}$$

  • Case 2: $|B^{q}_{c'}\setminus B^{q}_{c_{j}}|>\ell$ for all $c_{j}\in C_{\text{OPT}}$. This means that $|B^{q}_{c'}|>\ell$, and

    $$\begin{aligned}
    \sum_{j=1}^{t}\left[|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right]_{+} &=\sum_{j=1}^{t}\left(|B^{q}_{c'}\setminus B^{q}_{c_{j}}|-\ell\right)\\
    &=t\left(|B^{q}_{c'}|-\ell\right)-\sum_{j=1}^{t}|B^{q}_{c'}\cap B^{q}_{c_{j}}|\\
    &\leq t\left(|B^{q}_{c'}|-\ell\right)-\left(|B^{q}_{c'}|-\ell\right)=(t-1)\cdot\left[|B^{q}_{c'}|-\ell\right]_{+},
    \end{aligned}$$

    where we use $\left|B^{q}_{c'}\setminus\left(\bigcup_{j=1}^{t}B^{q}_{c_{j}}\right)\right|\leq\ell$ for the inequality.

Hence, in both cases, (3) holds. Plugging this back into (2), we get

$$\psi_{i-1}-\psi_{i}\geq\psi_{i-1}-\frac{t-1}{t}\cdot\sum_{c'\in C}\left[|B^{q}_{c'}|-\ell\right]_{+}=\frac{1}{t}\cdot\psi_{i-1}.$$

This implies (1) and completes our proof. $\square$

Although we do not know whether an efficient $O(\ln(n))$-approximation algorithm exists, we show next that by combining Theorem 4.2 with a brute-force approach, we can arrive at a quasi-polynomial-time algorithm that has an approximation ratio of $O(n^{\delta})$ for any constant $\delta>0$. (Recall that a running time is said to be quasi-polynomial if it is of the form $\exp(\log^{O(1)}I)$, where $I$ denotes the input size; in our case, $I=(nm)^{O(1)}$.)

Theorem 4.3

For any constant $\delta\in(0,1)$, there exists an $\exp(\log^{O(1)}(nm))$-time algorithm that, on input $I=(C,\mathcal{A},k)$, outputs an $n/k$-justifying group that is at most $O(n^{\delta})$ times larger than a smallest $n/k$-justifying group.

Proof

If $m\leq 2^{n^{\delta}}$, then we may simply run GreedyCandidate, which runs in polynomial time and yields an approximation ratio of $\ln(nm)+1\in O(n^{\delta})$; here we use that $\ln m\leq n^{\delta}\ln 2$.

Otherwise, we iterate over all subsets $N'\subseteq N$. For each $N'$, we run GreedyCC on $N'$ until all voters in $N'$ are covered (i.e., we stop only when no voter in $N'$ remains unrepresented); denote the resulting set of candidates by $W_{N'}$. Finally, among the $2^{|N|}$ sets $W_{N'}$, we output a smallest one that is $n/k$-justifying with respect to $N$. Notice that the running time of our algorithm is $O(2^{n}\cdot(nm)^{O(1)})\in\exp(\log^{O(1/\delta)}m)$; this follows from $m>2^{n^{\delta}}$, which gives $n<(\log_{2}m)^{1/\delta}$. To analyze the approximation guarantee, let $C_{\text{OPT}}$ denote a smallest $n/k$-justifying group, and let $N^{*}:=\bigcup_{c\in C_{\text{OPT}}}B_{c}$ be the set of voters covered by $C_{\text{OPT}}$, where $B_{c}$ denotes the set of voters who approve $c$. When $N'=N^{*}$, we have that $W_{N'}$ is an $n/k$-justifying group and, from standard analyses of the greedy set cover algorithm (see, for example, Chapter 2 in the book by Vazirani [18]), we have $|W_{N'}|\leq(\ln n+1)\cdot|C_{\text{OPT}}|$. In other words, the approximation ratio is at most $\ln n+1\in O(n^{\delta})$, as claimed. $\square$
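The second branch of the proof can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: a profile is represented as a list of approval sets, the helper names (`is_justifying`, `greedy_cover`, `brute_force_branch`) are hypothetical, and the covering routine is the textbook greedy set cover restricted to $N'$. Voters with empty approval sets are skipped, since they can never belong to $N^{*}$. The enumeration is exponential in the number of voters, so the sketch is meant for tiny instances only.

```python
from itertools import combinations

def is_justifying(approvals, group, k):
    """True if no candidate is approved by >= n/k voters left unrepresented by `group`."""
    n = len(approvals)
    uncovered = [a for a in approvals if not (a & group)]
    candidates = set().union(*approvals)
    return all(sum(c in a for a in uncovered) * k < n for c in candidates)

def greedy_cover(approvals, voters):
    """Greedy set cover: repeatedly pick the candidate approved by most uncovered voters."""
    remaining, chosen = set(voters), set()
    while remaining:
        counts = {}
        for i in remaining:
            for c in approvals[i]:
                counts[c] = counts.get(c, 0) + 1
        best = max(sorted(counts), key=counts.get)  # ties broken by candidate order
        chosen.add(best)
        remaining = {i for i in remaining if best not in approvals[i]}
    return chosen

def brute_force_branch(approvals, k):
    """Try every subset N' of (coverable) voters and keep the smallest justifying cover."""
    coverable = [i for i, a in enumerate(approvals) if a]
    best = None
    for r in range(len(coverable) + 1):
        for subset in combinations(coverable, r):
            group = greedy_cover(approvals, subset)
            if is_justifying(approvals, group, k) and (best is None or len(group) < len(best)):
                best = group
    return best

# Toy profile with n = 6 voters and k = 3, so cohesive groups have size 2.
toy = [{0}, {0}, {1}, {1}, {2}, {3}]
result = brute_force_branch(toy, 3)
```

On this toy profile, candidates 0 and 1 are each approved by two voters, so any justifying group must contain both of them.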

4.3 Tree Representation

Even though computing a smallest $n/k$-justifying group is NP-hard even to approximate, we show in this section that this problem becomes tractable if the instance admits a tree representation. An instance is said to admit a tree representation if its candidates can be arranged on a tree $T$ in such a way that the approved candidates of each voter form a subtree of $T$ (i.e., the subgraph of $T$ induced by each approval set is connected). While the tree representation condition is somewhat restrictive, we remark that it is general enough to capture a number of other preference restrictions [19, Fig. 4]. In particular, we show in Appendix 0.B that it encompasses a recently introduced class called 1-dimensional voter/candidate range model (1D-VCR) [12]. (Together with the results of Yang [19] and Godziszewski et al. [12], this means that the tree representation also captures the candidate interval (CI) and voter interval (VI) domains, the two most commonly studied restrictions for the approval setting, introduced by Elkind and Lackner [9].)

Theorem 4.4

For every instance $I=(C,\mathcal{A},k)$ admitting a tree representation, a smallest $n/k$-justifying group can be computed in polynomial time.

Proof
Figure 1: Illustration for the proof of Theorem 4.4 (labels shown in the figure: $\overline{C_{t}}$, $T_{t}$, $v$, $w$, $z$).

Let $T$ be a tree representation of $I$, i.e., for every $i\in N$ the set $A_{i}$ induces a subtree of $T$. Root $T$ at an arbitrary node, and define the depth of a node in $T$ as its distance from the root node (so the root node itself has depth $0$). For each subtree $\widehat{T}$ of $T$, denote by $V(\widehat{T})$ the set of its nodes, and for each node $v\in V(\widehat{T})$, denote by $\widehat{T}^{v}$ the subtree of $\widehat{T}$ rooted at $v$ (i.e., $\widehat{T}^{v}$ contains all nodes in $\widehat{T}$ whose path towards the root of $\widehat{T}$ passes through $v$). The algorithm sets $W=\emptyset$ and proceeds as follows:

  1. Select a node $v$ of maximum depth such that there exists a set $S$ of $\lceil n/k\rceil$ voters with the following two properties:

    (a) $A_{i}\subseteq V(T^{v})$ for all $i\in S$;

    (b) $\bigcap_{i\in S}A_{i}\neq\emptyset$.

    If no such node exists, delete all candidates from $C$, delete the remaining tree $T$, and return $W$.

  2. Add $v$ to $W$, remove all voters $i$ such that $A_{i}\cap V(T^{v})\neq\emptyset$ from $\mathcal{A}$, and delete $V(T^{v})$ from $C$ and $T^{v}$ from $T$. Go back to Step 1.

Except for the last round, the algorithm adds one candidate to the set $W$ in each round, so it runs for $|W|+1$ rounds, where we slightly abuse notation and use $W$ to refer to the final output from now on. Each round can be implemented in polynomial time: indeed, for each node $v$, we can consider the sets $A_{i}$ that are contained in $V(T^{v})$ and check whether some node in $V(T^{v})$ appears in at least $\lceil n/k\rceil$ of these sets.
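The two steps of the algorithm can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `smallest_group_on_tree`, the edge-list input, and the representation of the profile as a list of approval sets are assumptions made here, and the sketch presumes that every approval set already induces a connected subtree. Voters with empty approval sets are ignored, since they can never be part of a cohesive group.

```python
from collections import deque

def smallest_group_on_tree(edges, root, approvals, k):
    n = len(approvals)
    quota = -(-n // k)  # ceil(n / k), the size of a cohesive group
    adj = {root: []}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    # Root the tree: record the parent and depth of every node via BFS.
    parent, depth, queue = {root: None}, {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                queue.append(w)
    alive = set(parent)                                  # nodes of the remaining tree
    active = {i for i, a in enumerate(approvals) if a}   # voters not yet removed
    group = set()

    def subtree(v):
        """Nodes of the remaining tree that lie in the subtree rooted at v."""
        nodes, stack = set(), [v]
        while stack:
            u = stack.pop()
            nodes.add(u)
            stack.extend(w for w in adj[u] if parent[w] == u and w in alive)
        return nodes

    while True:
        pick = None
        for v in sorted(alive, key=lambda u: -depth[u]):  # Step 1: deepest node first
            sub = subtree(v)
            inside = [approvals[i] for i in active if approvals[i] <= sub]
            if any(sum(c in a for a in inside) >= quota for c in sub):
                pick = (v, sub)
                break
        if pick is None:
            return group
        v, sub = pick                                     # Step 2: select v and prune
        group.add(v)
        active = {i for i in active if not (approvals[i] & sub)}
        alive -= sub

# Candidates 0-1-2-3-4 form a path rooted at 0; n = 6 voters, k = 3.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
path_profile = [{3, 4}, {4}, {1, 2}, {2}, {0}, {0, 1}]
path_group = smallest_group_on_tree(path_edges, 0, path_profile, 3)
```

In this example the algorithm first selects node 3 (the deepest node whose subtree contains two voters agreeing on candidate 4) and then node 1, after which no cohesive group of two unrepresented voters remains.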

We now establish the correctness of the algorithm. For each round $t\in\{0,1,\dots,|W|+1\}$, we define $W_{t}$ to be the set of candidates selected by the algorithm up to and including round $t$, and $T_{t}$ to be the remaining tree after round $t$, where round $0$ refers to the point before the execution of the algorithm (so $W_{0}=\emptyset$ and $T_{0}=T$). We also define $\overline{C_{t}}:=V(T)\setminus V(T_{t})$ to be the set of candidates deleted up to and including round $t$. See Figure 1 for an illustration.

Claim

After each round $t\in\{0,1,\dots,|W|+1\}$,

(i) there exists a smallest $n/k$-justifying group $W'$ of the original instance $I$ such that $\overline{C_{t}}\cap W'=W_{t}$, and

(ii) for each $i\in N$, at least one of the following three relations holds: $A_{i}\subseteq\overline{C_{t}}$, $A_{i}\cap\overline{C_{t}}=\emptyset$, or $A_{i}\cap W_{t}\neq\emptyset$.

Proof (of Claim)

We prove the Claim by induction on $t$. For $t=0$, we have $\overline{C_{0}}=W_{0}=\emptyset$, so $A_{i}\cap\overline{C_{0}}=\emptyset$ for all $i\in N$, and both (i) and (ii) hold trivially. Now consider any $t\in[|W|+1]$ and assume that the Claim holds for $t-1$.

Case 1: $t<|W|+1$. Let $v$ be the candidate selected in this round and $S$ be the corresponding set of voters in the algorithm. Then, $\overline{C_{t}}=\overline{C_{t-1}}\cup V(T^{v}_{t-1})$ and $W_{t}=W_{t-1}\cup\{v\}$. Let $W'$ be a smallest $n/k$-justifying group with $\overline{C_{t-1}}\cap W'=W_{t-1}$ as guaranteed by the induction hypothesis. If $V(T^{v}_{t-1})\cap W'=\{v\}$, then statement (i) of the Claim follows by choosing the same set $W'$. Assume therefore that $V(T^{v}_{t-1})\cap W'\neq\{v\}$. If $V(T^{v}_{t-1})\cap W'=\emptyset$, then $W'$ does not represent the cohesive group of voters $S$ of size $\lceil n/k\rceil$, a contradiction. Hence, there exists a candidate $w\in(V(T_{t-1}^{v})\cap W')\setminus\{v\}$; let $z$ be the parent of $w$ (possibly $z=v$). See Figure 1.

We will show that $(W'\setminus\{w\})\cup\{z\}$ is a smallest $n/k$-justifying group. Since its size is at most the size of $W'$, if it is an $n/k$-justifying group, then it is also a smallest one. Assume for contradiction that the group is not $n/k$-justifying. This means there exists a group of voters $S'$ of size $\lceil n/k\rceil$ such that: (1) the voters in $S'$ approve a common candidate $y$; (2) at least one voter in $S'$ approves $w$; and (3) none of the voters in $S'$ approves any candidate in $W_{t-1}\cup\{z\}$.

Observe that for a voter $j\in S'$ who approves $w$, we know that $A_{j}\cap\overline{C_{t-1}}=\emptyset$ by statement (ii) of the Claim for $t-1$. Moreover, $A_{j}\subseteq V(T_{t-1}^{w})\cup\overline{C_{t-1}}$ since $j$ does not approve $z$. Combining the previous two sentences, we have $A_{j}\subseteq V(T_{t-1}^{w})$. As a result, $y\in V(T_{t-1}^{w})$ as well, and by the same arguments as for $j$ we get that $A_{i}\subseteq V(T_{t-1}^{w})$ for all $i\in S'$. However, this is a contradiction to the choice of $v$ in the algorithm.

Applying this argument (as we did between $w$ and $z$) repeatedly until we reach $v$, we obtain that there exists a smallest $n/k$-justifying group $W'$ with $\overline{C_{t}}\cap W'=W_{t}$. This proves statement (i) of the Claim.

For (ii), we argue that for each voter $i\in N$, at least one of the relations $A_{i}\subseteq\overline{C_{t}}$, $A_{i}\cap\overline{C_{t}}=\emptyset$, and $A_{i}\cap W_{t}\neq\emptyset$ holds. Consider a voter $i$ for whom neither $A_{i}\subseteq\overline{C_{t}}$ nor $A_{i}\cap\overline{C_{t}}=\emptyset$ holds. If $A_{i}\cap\overline{C_{t-1}}\neq\emptyset$, then $A_{i}\cap W_{t-1}\neq\emptyset$ (and therefore $A_{i}\cap W_{t}\neq\emptyset$) follows from the induction hypothesis; indeed, since $A_{i}\not\subseteq\overline{C_{t}}$, it must be that $A_{i}\not\subseteq\overline{C_{t-1}}$. Hence, we can assume that $A_{i}\cap\overline{C_{t-1}}=\emptyset$ and $A_{i}\cap\overline{C_{t}}\neq\emptyset$, which means that $A_{i}\cap V(T_{t-1}^{v})\neq\emptyset$. Since $A_{i}$ is not a subset of $\overline{C_{t}}$, this implies that $v\in A_{i}$, and therefore $A_{i}\cap W_{t}\neq\emptyset$. This establishes (ii).

Case 2: $t=|W|+1$. We will show that $W_{t-1}$ is an $n/k$-justifying group; assume for contradiction that it is not. By the induction hypothesis, there exists an $n/k$-justifying group $W'$ such that $\overline{C_{t-1}}\cap W'=W_{t-1}$. In particular, there exists a group of voters $S'$ of size $\lceil n/k\rceil$ such that all of them approve a common candidate, at least one of them approves a candidate in $W'\setminus W_{t-1}$, and none of them approves any candidate in $W_{t-1}$. Similarly to Case 1, we can argue that for all $i\in S'$, it holds that $A_{i}\subseteq C\setminus\overline{C_{t-1}}$. It follows that the root node of $T$ satisfies both conditions (a) and (b) in Step 1 of the algorithm, contradicting the fact that the algorithm terminated. Hence, $W_{t-1}$ is an $n/k$-justifying group, and we can take $W'=W_{t-1}$ for statement (i) of the Claim.

For (ii), it suffices to observe that $\overline{C_{t}}=C$, which means that $A_{i}\subseteq\overline{C_{t}}$ for all $i\in N$. $\square$

As statement (i) of the Claim holds in particular for $t=|W|+1$, in which case $\overline{C_{t}}=C$, this concludes the proof of the theorem. $\square$

5 Gender Balance

In the next two sections, we demonstrate that small $n/k$-justifying groups can be useful for obtaining JR committees with additional properties. For concreteness, we consider a common desideratum: gender balance. Indeed, in many candidate selection scenarios, it is preferable to have a balance with respect to the gender of the committee members. Formally, assume that each candidate in $C$ belongs to one of two types, male and female. For each committee $W\subseteq C$, we define the gender imbalance of $W$ as the absolute value of the difference between the number of male candidates and the number of female candidates in $W$. A committee is said to be gender-balanced if its gender imbalance is $0$.

The following example shows that gender balance can be at odds with justified representation.

Example 3

Suppose that $n=k$ is even, each voter $i\in[n-1]$ only approves a male candidate $a_{i}$, while voter $n$ approves female candidates $b_{1},\dots,b_{n-1}$. Any JR committee must contain all of $a_{1},\dots,a_{n-1}$, and can therefore contain at most one female candidate. But there exists a (non-JR) gender-balanced committee $\{a_{1},\dots,a_{n/2},b_{1},\dots,b_{n/2}\}$.
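To make Example 3 concrete, the following sketch instantiates it for $n=k=4$ and enumerates all committees of size $k$. It is only an illustration: the helper names `is_jr` and `gender_imbalance` and the string labels for candidates are assumptions made here.

```python
from itertools import combinations

def is_jr(approvals, committee, k):
    """JR check: no candidate is approved by >= n/k voters who are all unrepresented."""
    n = len(approvals)
    unrepresented = [a for a in approvals if not (a & committee)]
    candidates = set().union(*approvals)
    return all(sum(c in a for a in unrepresented) * k < n for c in candidates)

def gender_imbalance(committee, gender):
    males = sum(gender[c] == "M" for c in committee)
    return abs(males - (len(committee) - males))

n = k = 4
males = [f"a{i}" for i in range(1, n)]       # a1, a2, a3
females = [f"b{i}" for i in range(1, n)]     # b1, b2, b3
approvals = [{a} for a in males] + [set(females)]
gender = {**{a: "M" for a in males}, **{b: "F" for b in females}}

jr_committees = [set(w) for w in combinations(males + females, k)
                 if is_jr(approvals, set(w), k)]
best_jr_imbalance = min(gender_imbalance(w, gender) for w in jr_committees)
balanced = set(males[: n // 2] + females[: n // 2])   # {a1, a2, b1, b2}
```

Every JR committee here consists of the three male candidates plus one female candidate, so its gender imbalance is 2, while the balanced committee leaves the voter who approves only $a_{3}$ unrepresented.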

Example 3 is as bad as it gets: under very mild conditions, there always exists a JR committee with at least one representative of each gender.

Theorem 5.1

For every instance $I=(C,\mathcal{A},k)$ such that for each gender, some candidate of that gender is approved by at least one voter, there exists a JR committee with at least one member of each gender. Moreover, such a committee can be computed in polynomial time.

Proof

As in the proof of Theorem 3.2, we run GreedyCC for $k-1$ steps, and consider two cases. If the resulting group of size $k-1$ is already $n/k$-justifying, we can choose the last member to be of the missing gender if necessary. Else, we continue with the $k$-th step, and assume that the obtained committee is, say, all-female. In this case, like in the proof of Theorem 3.2, $n$ must be divisible by $k$, and there exist disjoint blocks of $n/k$ voters $B_{1},\dots,B_{k}$ and candidates $c_{1},\dots,c_{k}$ such that for each $j\in[k]$, all voters in block $B_{j}$ approve candidate $c_{j}$. Take a male candidate who is approved by some voter, say, in block $B_{i}$. Then, the committee consisting of this candidate together with $c_{1},\dots,c_{i-1},c_{i+1},\dots,c_{k}$ provides JR and contains members of both genders. Since the construction of the blocks $B_{1},\dots,B_{k}$ can be done in polynomial time, computing the committee also takes polynomial time. $\square$

In light of Theorem 5.1, it is natural to ask for a JR committee with the lowest gender imbalance. Unfortunately, our next result shows that deciding whether there exists a gender-balanced committee that provides JR, or even obtaining a close approximation thereof, is computationally hard.

Theorem 5.2

Even when $n=k$, there exists a constant $\varepsilon>0$ such that distinguishing between the following two cases is NP-hard:

  • (YES) There exists a gender-balanced JR committee;

  • (NO) Every JR committee has gender imbalance $\geq\varepsilon k$.

It follows from Theorem 5.2 that one cannot hope to obtain any finite (multiplicative) approximation of the gender imbalance. To establish this hardness, we reduce from a special case of the Set Cover problem. Recall that in Set Cover, we are given a universe $[u]=\{1,\dots,u\}$ and a collection $\mathcal{S}=\{S_{1},\dots,S_{M}\}$ of subsets of $[u]$. The goal is to select as few subsets as possible that together cover the universe; we use $\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$ to denote the optimum of a Set Cover instance $(u,\mathcal{S})$. We consider a special case of Set Cover where $|S_{1}|=\dots=|S_{M}|=3$; this problem is sometimes referred to as Exact Cover by 3-Sets (X3C). We will need the following known APX-hardness of X3C. (This hardness follows from the standard NP-hardness reduction for the exact version of X3C [11] together with the PCP Theorem [1, 2]. For a more explicit statement of Lemma 2, see, e.g., Lemma 27 in the extended version of [13].)

Lemma 2

For some constant $\zeta\in(0,1/3)$, the following problem is NP-hard: Given an X3C instance $(u,\mathcal{S})$, distinguish between

  • (YES) $\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})=u/3$;

  • (NO) $\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})\geq u(1/3+\zeta)$.

Proof (of Theorem 5.2)

Given an instance of X3C, we construct an instance of our problem as follows. First, we create one female candidate for each set $S_{i}$, and one voter for each element of $[u]$ so that the voter approves all sets $S_{i}$ to which the element belongs. (Hence, each candidate is approved by exactly three voters.) Next, we create $u/3+1$ additional voters, each of whom approves a new female candidate; this candidate is distinct for distinct voters. Finally, we create one more voter who approves $2u/3+1$ male candidates. Let $k=4u/3+2$, and note that the number of voters is $n=k$.

In the YES case of X3C, we can choose $u/3$ original female candidates so that each original voter approves at least one of them. We then choose all $u/3+1$ new female candidates and all $2u/3+1$ male candidates, and obtain a gender-balanced JR committee.

On the other hand, in the NO case, we must choose at least $u(1/3+\zeta)$ original female candidates in order for every original voter to approve at least one of them. Moreover, JR requires that we choose all $u/3+1$ new female candidates. Hence, every JR committee contains at least $(2u/3+1)+\zeta u$ female candidates, and therefore has gender imbalance at least $2\zeta u=3\zeta(k-2)/2$, which is at least $\zeta k$ for sufficiently large $k$. Choosing $\varepsilon=\zeta$ yields the desired result. $\square$
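The construction in the proof can be written out as a short Python sketch. The function name `x3c_to_gender_instance` and the tuple labels used for candidates are hypothetical choices made for illustration; the sketch only builds the instance and checks the YES-case committee on a small example.

```python
def x3c_to_gender_instance(u, sets):
    """Build the approval profile, gender map, and committee size k from an X3C instance."""
    assert u % 3 == 0 and all(len(s) == 3 for s in sets)
    approvals, gender = [], {}
    for idx in range(len(sets)):              # one female candidate per set S_i
        gender[("set", idx)] = "F"
    for element in range(1, u + 1):           # one voter per element of [u]
        approvals.append({("set", idx) for idx, s in enumerate(sets) if element in s})
    for j in range(u // 3 + 1):               # u/3 + 1 voters, each with a private female candidate
        gender[("new", j)] = "F"
        approvals.append({("new", j)})
    male_candidates = {("male", j) for j in range(2 * u // 3 + 1)}
    gender.update({c: "M" for c in male_candidates})
    approvals.append(male_candidates)         # one voter approving 2u/3 + 1 male candidates
    k = 4 * u // 3 + 2
    return approvals, gender, k

# Small YES instance: {1,2,3} and {4,5,6} form an exact cover of [6].
u, sets = 6, [{1, 2, 3}, {4, 5, 6}, {1, 4, 5}]
approvals, gender, k = x3c_to_gender_instance(u, sets)
committee = {("set", 0), ("set", 1)} | {c for c in gender if c[0] in ("new", "male")}
num_female = sum(gender[c] == "F" for c in committee)
```

Since $n=k$ in the constructed instance, a committee provides JR exactly when every voter is represented, which is what the check on `committee` below verifies.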

In spite of this hardness result, Proposition 1 implies that under the IC model, with high probability, there exists an $n/k$-justifying group of size at most $k/2$. When this is the case, one can choose the remaining members so as to make the final committee of size $k$ gender-balanced. In the next section, we show empirically that under several probabilistic models, a small $n/k$-justifying group can usually be found efficiently via the greedy algorithms from Section 4.

6 Experiments

In this section, we conduct experiments to evaluate and complement our theoretical results. In the first experiment, we illustrate our probabilistic result for the impartial culture model (Theorem 3.1), and examine whether analogous results are likely to hold for two other random models. In our second experiment, we analyze how well GreedyCC and GreedyCandidate perform in finding small $n/k$-justifying groups. The code for our experiments is available at http://github.com/Project-PRAGMA/Justifying-Groups-SAGT-2022.

6.1 Set-up

We consider three different models for generating approval instances, all of which have been previously studied in the literature [6]. (In particular, we refer to the work of Elkind et al. [8] for motivation of the Euclidean models.) Each model takes as input the parameters $n$ (number of voters), $m$ (number of candidates), and one additional parameter, namely, either an approval probability $p$ or a radius $r$.

  • In the impartial culture (IC) model, each voter approves each of the $m$ candidates independently with probability $p$. This model was already used in Theorem 3.1.

  • In the 1D-Euclidean (1D) model, each voter/candidate is assigned a uniformly random point in the interval $[0,1]$. For a voter $v$ and a candidate $c$, let $x_{v}$ and $x_{c}$ be their respective assigned points. Then, $v$ approves $c$ if and only if $|x_{v}-x_{c}|\leq r$. Observe that the resulting profile belongs to the 1D-VCR class discussed in Appendix 0.B.

  • The 2D-Euclidean (2D) model is a natural generalization of the 1D model where each voter/candidate is assigned a uniformly random point in the unit square $[0,1]\times[0,1]$. Then, a voter $v$ approves a candidate $c$ if and only if the Euclidean distance between their points is at most $r$.
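The three models can be sampled with a few lines of Python. The sketch below uses only the standard library (`math.dist` requires Python 3.8 or later); it is not the authors' experiment code, and the function names `impartial_culture` and `euclidean` as well as the list-of-sets profile format are assumptions made for illustration.

```python
import math
import random

def impartial_culture(n, m, p, rng):
    """IC model: each voter approves each candidate independently with probability p."""
    return [{c for c in range(m) if rng.random() < p} for _ in range(n)]

def euclidean(n, m, r, dim, rng):
    """Euclidean model: uniform points in [0,1]^dim; v approves c iff their distance is <= r."""
    voters = [[rng.random() for _ in range(dim)] for _ in range(n)]
    cands = [[rng.random() for _ in range(dim)] for _ in range(m)]
    return [{c for c, y in enumerate(cands) if math.dist(x, y) <= r} for x in voters]

rng = random.Random(0)                          # fixed seed for reproducibility
ic_profile = impartial_culture(100, 100, 0.1, rng)
profile_1d = euclidean(100, 100, 0.05, 1, rng)  # dim=1 gives the 1D model
profile_2d = euclidean(100, 100, 0.2, 2, rng)   # dim=2 gives the 2D model
avg_approvals = sum(len(a) for a in ic_profile) / len(ic_profile)
```

The quantity `avg_approvals` corresponds to the average number of approvals per voter, which is the value reported on the horizontal axis of the plots.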

The experiments were carried out on a system with 1.4 GHz Quad-Core Intel Core i5 CPU, 8GB RAM, and macOS 11.2.3 operating system. The software was implemented in Python 3.8.8 and the libraries matplotlib 3.3.4, numpy 1.20.1, and pandas 1.2.4 were used. Additionally, gurobi 9.1.2 was used to solve integer programs.

Figure 2 (panels: (a) IC Model, (b) 1D-Euclidean Model, (c) 2D-Euclidean Model): Experimental results evaluating Theorem 3.1 as well as analogous settings for two Euclidean models. For each plot, the $x$-axis shows the average number of approvals of a voter for each parameter $p$ (or $r$), and the $y$-axis shows the fraction of the $1000$ generated elections for which a randomly selected size-$s$ group is $n/k$-justifying. The dashed vertical lines indicate the transition points for large $n$ as shown in Theorem 3.1.

6.2 Empirical Evaluation of Theorem 3.1

For our first experiment, we focus on elections with parameters $n=5000$, $m=100$, and $k=10$. We chose a large number of voters as the statement of Theorem 3.1 concerns large values of $n$. For each $s\in\{1,2,3,4\}$ and $p\in[0,1)$ (in increments of $0.02$), we generated $1000$ elections using the IC model with parameter $p$. We then sampled one group of size $s$ from each resulting election and checked whether it is $n/k$-justifying.
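A scaled-down version of this sampling procedure might look as follows in Python. The sketch is an assumption-laden illustration rather than the code used for the paper: the names `is_justifying` and `fraction_justifying` are hypothetical, and much smaller values of $n$ and of the number of trials are used so that it runs quickly.

```python
import random

def is_justifying(approvals, group, k):
    """True if no candidate is approved by >= n/k voters left unrepresented by `group`."""
    n = len(approvals)
    uncovered = [a for a in approvals if not (a & group)]
    candidates = set().union(*approvals)
    return all(sum(c in a for a in uncovered) * k < n for c in candidates)

def fraction_justifying(n, m, k, s, p, trials, seed=0):
    """Fraction of IC elections in which one random size-s group is n/k-justifying."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        approvals = [{c for c in range(m) if rng.random() < p} for _ in range(n)]
        group = set(rng.sample(range(m), s))   # one random group per election
        hits += is_justifying(approvals, group, k)
    return hits / trials

# With p = 0.5 and s = 1, p(1-p)^s = 0.25 > 1/k, so justifying groups should be rare.
frac_middle = fraction_justifying(n=200, m=10, k=10, s=1, p=0.5, trials=10)
```

For $p=0.5$ and $s=1$, roughly a quarter of the voters approve any fixed other candidate while rejecting the sampled one, far more than the $n/k=20$ needed for a violation, so the estimated fraction is expected to be $0$.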

Figure 2(a) illustrates the fraction of generated elections for which this is the case. To make this plot comparable to analogous plots for the other two models, we label the $x$-axis with the average number of approvals instead of $p$; this number is simply $p\cdot m=100p$. For each $s$, the area between the vertical dashed lines indicates the range of the interval $[0,100]$ for which Theorem 3.1 shows that the probability that no size-$s$ group is $n/k$-justifying converges to $1$ as $n\rightarrow\infty$; this corresponds to the range of $p$ such that $p(1-p)^{s}>1/k$. For $s=4$, Theorem 3.1 implies that all size-$s$ groups are likely to be $n/k$-justifying for any average number of approvals in $[0,100]$ (i.e., for any $p\in[0,1]$) as $n\rightarrow\infty$. Hence, there are no vertical dashed lines for $s=4$.
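The positions of the dashed lines can be recovered numerically by solving $p(1-p)^{s}=1/k$. The following sketch does so by bisection on either side of the maximizer $p=1/(s+1)$ of $p(1-p)^{s}$; the function name `transition_points` is a hypothetical choice, and the sketch is an illustration of the condition stated above rather than part of the original experiments.

```python
def transition_points(s, k, iters=60):
    """Return (p_low, p_high) with p(1-p)^s > 1/k strictly between them, or None."""
    f = lambda p: p * (1 - p) ** s - 1 / k
    peak = 1 / (s + 1)                 # maximizer of p(1-p)^s on [0, 1]
    if f(peak) <= 0:
        return None                    # the condition never holds: no dashed lines

    def bisect(neg, pos):
        # Invariant: f(neg) < 0 <= f(pos); the root lies between the two points.
        for _ in range(iters):
            mid = (neg + pos) / 2
            if f(mid) < 0:
                neg = mid
            else:
                pos = mid
        return (neg + pos) / 2

    return bisect(0.0, peak), bisect(1.0, peak)

k, m = 10, 100
lines = {s: transition_points(s, k) for s in (1, 2, 3, 4)}
# For s = 1 the roots are (1 -/+ sqrt(0.6)) / 2, i.e. about 11.27 and 88.73 approvals.
approval_lines_s1 = tuple(m * p for p in lines[1])
```

For $s=4$ the maximum of $p(1-p)^{4}$ is $0.2\cdot 0.8^{4}=0.08192<1/10$, so the function returns `None`, in line with the absence of dashed lines for $s=4$.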

In Figure 2(a), we see that the empirical results match almost exactly the prediction of Theorem 3.1. Specifically, for $s\in\{1,2\}$, we observe a sharp fall and rise in the fraction of $n/k$-justifying groups precisely at the predicted values of $p$. For $s=3$, the empirical curve falls slightly before and rises slightly after the predicted points marked by the dashed lines. This is likely because the function $p(1-p)^{3}$ is very close to $1/k$ in the transition areas, so $\varepsilon:=1/k-p(1-p)^{3}$ as defined in the proof of Theorem 3.1 is very small; thus a larger value of $n$ is needed in order for the transition to be sharp.

We carried out analogous experiments for the two Euclidean models. In particular, we iterated over $s\in\{1,2,3,4\}$ and $r\in[0,1)$ (for the 1D model) or $r\in[0,1.2)$ (for the 2D model), again in increments of $0.02$. To make the plots for different models comparable, we compute the average number of approvals induced by each value of $r$ and label this number on the $x$-axis. The resulting plots, shown in Figures 2(b) and 2(c), differ significantly from the plot for the IC model. In particular, while we see a sharp fall in the fraction of $n/k$-justifying groups when the average number of approvals is around $10$ (for all $s$), there is no sharp rise as in the IC model. This suggests that a statement specifying a sharp threshold analogous to Theorem 3.1 for the IC model is unlikely to hold for either of the Euclidean models. Nevertheless, it remains an interesting question whether the fraction of $n/k$-justifying groups can be described theoretically for these models.

6.3 Performance of GreedyCC and GreedyCandidate

For our second experiment, we consider elections with parameters $n=m=100$ and $k=10$, and iterate over $p\in[0,1)$ (for the IC model), $r\in[0,1)$ (for the 1D model), and $r\in[0,1.2)$ (for the 2D model), each in increments of $0.02$. For each value of $p$ (or $r$), we generated $200$ elections and computed the minimum size of an $n/k$-justifying group (via an integer program) and the size of the $n/k$-justifying group returned by GreedyCC and GreedyCandidate, respectively. We aggregated these numbers across different elections by computing their average. As in the first experiment, to make the plots for different models comparable, we converted the values of $p$ and $r$ to the average number of approvals induced by these values. The results are shown in Figure 3.
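The comparison between a greedy heuristic and the optimum can be reproduced at toy scale without an integer programming solver. In the sketch below, `greedy_cc_group` is an assumed reading of GreedyCC (repeatedly add the candidate approved by the most unrepresented voters, and stop once no candidate is approved by at least $n/k$ of them), and exhaustive search over candidate subsets stands in for the integer program; both function names and the small parameters are illustrative assumptions.

```python
from itertools import combinations
import random

def is_justifying(approvals, group, k):
    n = len(approvals)
    uncovered = [a for a in approvals if not (a & group)]
    candidates = set().union(*approvals)
    return all(sum(c in a for a in uncovered) * k < n for c in candidates)

def greedy_cc_group(approvals, k):
    """Assumed GreedyCC variant: stop as soon as the selected group is n/k-justifying."""
    n = len(approvals)
    candidates = sorted(set().union(*approvals))
    uncovered, group = list(approvals), set()
    while True:
        counts = {c: sum(c in a for a in uncovered) for c in candidates}
        best = max(candidates, key=counts.get, default=None)
        if best is None or counts[best] * k < n:
            return group
        group.add(best)
        uncovered = [a for a in uncovered if best not in a]

def smallest_group_bruteforce(approvals, k):
    """Exhaustive search by increasing size (a stand-in for the integer program)."""
    candidates = sorted(set().union(*approvals))
    for size in range(len(candidates) + 1):
        for group in combinations(candidates, size):
            if is_justifying(approvals, set(group), k):
                return set(group)

rng = random.Random(1)
n, m, k, p = 30, 8, 5, 0.3
profile = [{c for c in range(m) if rng.random() < p} for _ in range(n)]
greedy, optimum = greedy_cc_group(profile, k), smallest_group_bruteforce(profile, k)
gap = len(greedy) - len(optimum)   # the quantity averaged over elections in the experiment
```

Each greedy step represents at least $n/k$ previously unrepresented voters, so the greedy group never exceeds size $k$; the exhaustive search is exponential in $m$ and is only practical for very small instances.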

Figure 3 (panels: (a) IC Model, (b) 1D-Euclidean Model, (c) 2D-Euclidean Model): Experimental results on the performance of GreedyCC and GreedyCandidate. For each plot, the $x$-axis shows the average number of approvals of a voter for each parameter $p$ (or $r$), and the $y$-axis shows the average size of an $n/k$-justifying group output by GreedyCC and GreedyCandidate as well as the average size of a smallest $n/k$-justifying group.

In general, we observe that both GreedyCC and GreedyCandidate provide decent approximations to the minimum size of an $n/k$-justifying group. More precisely, the average difference between the size of a justifying group returned by GreedyCC and the minimum size is less than $1$ for all three models and parameters, while for GreedyCandidate this difference is at most $1.3$. The standard deviation of the size of justifying groups returned by GreedyCC and GreedyCandidate is similar for the two Euclidean models and below $1$ for all tested parameters. For the IC model, GreedyCandidate induces a smaller variance than GreedyCC. Moreover, on average, both greedy algorithms found $n/k$-justifying groups of size at most $k/2=5$ for almost all models and parameters; the only exception is the 1D model when the expected number of approvals is around $11$. In absolute numbers, for this set of parameters, GreedyCC returned a justifying group of size larger than $k/2$ for $84$ of the $200$ instances and GreedyCandidate for $75$ of the $200$ instances. Interestingly, among all $32000$ generated instances across all parameters, there was exactly one for which a smallest $n/k$-justifying group was of size larger than $5$. It is also worth noting that even though GreedyCandidate has a better worst-case guarantee than GreedyCC, this superiority is not reflected in the experiments. In particular, while GreedyCandidate performs marginally better than GreedyCC under the IC model, GreedyCC yields slightly better approximations under the Euclidean models.

We also repeated these experiments with $n=5000$; the results are shown in Appendix 0.C. Notably, the plot for the IC model shows a clearer step function than Figure 3(a).

7 Conclusion and Future Work

We have investigated the notion of an $n/k$-justifying group introduced by Bredereck et al. [6], which allows us to reason about the justified representation (JR) condition with respect to groups smaller than the target size $k$. We showed that $n/k$-justifying groups of size less than $k/2$ typically exist, which means that the number of committees of size $k$ satisfying JR is usually large. We also presented approximation algorithms for computing a small justifying group as well as an exact algorithm when the instance admits a tree representation. By starting with such a group, one can efficiently find a committee of size $k$ fulfilling both JR and gender balance, even though the problem is NP-hard in the worst case.

Given the typically large number of JR committees, a natural direction is to impose desirable properties on the committee on top of JR. In addition to gender balance, several other properties have been studied by Bredereck et al. [5]. For instance, when organizing an academic workshop, one could require that at least a certain fraction of the invitees be junior researchers, or that the invitees come from a certain number of countries or continents. We expect that algorithms for computing small justifying groups will be useful for handling other diversity constraints as well. It would also be interesting to study analogs of $n/k$-justifying groups for the more demanding representation notions of proportional justified representation (PJR) and extended justified representation (EJR), in particular to see whether these analogs yield qualitatively different results.

Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 101002854), from the Deutsche Forschungsgemeinschaft under grant BR 4744/2-1, from JST PRESTO under grant number JPMJPR20C1, from the Singapore Ministry of Education under grant number MOE-T2EP20221-0001, and from an NUS Start-up Grant. We would like to thank the anonymous SAGT reviewers for their comments.


References

  • [1] Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and hardness of approximation problems. Journal of the ACM 45(3), 501–555 (1998)
  • [2] Arora, S., Safra, S.: Probabilistic checking of proofs; a new characterization of NP. Journal of the ACM 45(1), 70–122 (1998)
  • [3] Aziz, H., Brill, M., Conitzer, V., Elkind, E., Freeman, R., Walsh, T.: Justified representation in approval-based committee voting. Social Choice and Welfare 48(2), 461–485 (2017)
  • [4] Brams, S.J., Fishburn, P.C.: Approval Voting. Springer (2007)
  • [5] Bredereck, R., Faliszewski, P., Igarashi, A., Lackner, M., Skowron, P.: Multiwinner elections with diversity constraints. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI). pp. 933–940 (2018)
  • [6] Bredereck, R., Faliszewski, P., Kaczmarczyk, A., Niedermeier, R.: An experimental view on committees providing justified representation. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). pp. 109–115 (2019)
  • [7] Dinur, I., Steurer, D.: Analytical approach to parallel repetition. In: Proceedings of the 46th ACM Symposium on Theory of Computing (STOC). pp. 624–633 (2014)
  • [8] Elkind, E., Faliszewski, P., Laslier, J.F., Skowron, P., Slinko, A., Talmon, N.: What do multiwinner voting rules do? An experiment over the two-dimensional Euclidean domain. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI). pp. 494–501 (2017)
  • [9] Elkind, E., Lackner, M.: Structure in dichotomous preferences. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI). pp. 2019–2025 (2015)
  • [10] Faliszewski, P., Skowron, P., Slinko, A., Talmon, N.: Multiwinner voting: a new challenge for social choice theory. In: Endriss, U. (ed.) Trends in Computational Social Choice, chap. 2, pp. 27–47. AI Access (2017)
  • [11] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman (1979)
  • [12] Godziszewski, M., Batko, P., Skowron, P., Faliszewski, P.: An analysis of approval-based committee rules for 2D-Euclidean elections. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI). pp. 5448–5455 (2021)
  • [13] Gupta, A., Lee, E., Li, J., Manurangsi, P., Wlodarczyk, M.: Losing treewidth by separating subsets. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). pp. 1731–1749 (2019), extended version available at arXiv:1804.01366
  • [14] Kilgour, D.M.: Approval balloting for multi-winner elections. In: Laslier, J.F., Sanver, M.R. (eds.) Handbook on Approval Voting, chap. 6, pp. 105–124. Springer (2010)
  • [15] Lu, T., Boutilier, C.: Budgeted social choice: From consensus to personalized decision making. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI). pp. 280–286 (2011)
  • [16] Moshkovitz, D.: The Projection Games Conjecture and the NP-hardness of ln n-approximating set-cover. Theory of Computing 11, 221–235 (2015)
  • [17] Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14(1), 265–294 (1978)
  • [18] Vazirani, V.V.: Approximation Algorithms. Springer (2003)
  • [19] Yang, Y.: On the tree representations of dichotomous preferences. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). pp. 644–650 (2019)

Appendix 0.A Hardness of Approximating $n/k$-Justifying Group

As we observed in Section 1.1, when $n=k$, the problem of computing a small $n/k$-justifying group for a given instance is equivalent to the Set Cover problem, which is NP-hard to approximate to within a factor of $o(\ln n)$. Below, we show that this hardness holds even when $n>k$. We first consider the case where the ratio $n/k$ is constant.

Theorem 0.A.1

Let $\ell>1$ be a constant integer. For any constant $\varepsilon>0$, it is NP-hard to find an $n/k$-justifying group that is at most $(1-\varepsilon)\ln n$ times larger than a smallest $n/k$-justifying group, even when $n/k=\ell$.

To establish this hardness, we use a reduction from the Set Cover problem. Recall that in Set Cover, we are given a universe $[u]=\{1,\dots,u\}$ and a collection $\mathcal{S}=\{S_{1},\dots,S_{M}\}$ of subsets of $[u]$. The goal is to select as few subsets as possible that together cover the universe; we use $\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$ to denote the optimum of a Set Cover instance $(u,\mathcal{S})$.

Theorem 0.A.2 ([7, 16])

For any constant $\delta>0$, it is NP-hard to find a set cover of size at most $(1-\delta)\ln u\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$.

Proof (of Theorem 0.A.1)

Given an instance of Set Cover, we construct an instance of the $n/k$-justifying group problem as follows. Let $k=u$. First, we create $n=\ell\cdot u$ voters, with voters $1,2,\dots,u$ corresponding to the Set Cover elements. For every subset $S_{i}$, we create a candidate $c_{S_{i}}$ that is approved by exactly the voters in $S_{i}$. In addition, for every $j\in[u]$, we construct a candidate $c^{*}_{j}$ that is approved by the $\ell$ voters $j,j+u,\dots,j+(\ell-1)u$.

To begin with, note that a smallest $n/k$-justifying group of the constructed instance has size at most $\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$, because simply picking the candidates corresponding to the Set Cover solution yields an $n/k$-justifying group.

Next, suppose that for some constant $\varepsilon>0$, there is a polynomial-time algorithm that can find an $n/k$-justifying group of size at most $(1-\varepsilon)\ln n$ times the size of a smallest $n/k$-justifying group; choose $u$ large enough so that $\ln u\geq(2\ln\ell)/\varepsilon$. Then this algorithm will find an $n/k$-justifying group of size at most $z:=(1-\varepsilon)\ln n\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$. Notice that we may assume that this group does not contain any of the candidates $c^{*}_{j}$, because we may replace a candidate $c^{*}_{j}$ by an arbitrary candidate $c_{S}$ such that $S$ contains $j$. Therefore, we may assume that this group consists of candidates $c_{S_{i_{1}}},\dots,c_{S_{i_{z}}}$. Notice also that $S_{i_{1}}\cup\dots\cup S_{i_{z}}=[u]$; this is because, for every $j\in[u]$, voters $j,j+u,\dots,j+(\ell-1)u$ approve a common candidate $c^{*}_{j}$ but $j+u,\dots,j+(\ell-1)u$ do not approve any of the selected candidates. As a result, we have found a set cover of size $z$ in polynomial time. Note that

\begin{align*}
z &= (1-\varepsilon)\ln n\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S}) \\
&= (1-\varepsilon)(\ln u+\ln\ell)\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S}) \\
&\leq (1-\varepsilon)\left(\ln u+\frac{\varepsilon\ln u}{2}\right)\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S}) \\
&\leq \left(1-\frac{\varepsilon}{2}\right)\ln u\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S}),
\end{align*}

which means that the set cover that we have found in polynomial time has size at most $(1-\delta)\ln u\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$, where $\delta:=\varepsilon/2$. But from Theorem 0.A.2, this is NP-hard. $\square$

What happens if $n$ is larger than $k$ by a superconstant factor? Even in that case, the reduction in Theorem 0.A.1 can still be modified to yield a similar hardness result.
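The construction used in this proof can be sketched in Python. This is only an illustrative sketch: the names `build_instance`, `is_justifying`, and `smallest_group_size` are hypothetical, and the check assumes the definition of an $n/k$-justifying group from earlier in the paper, restated here as "no candidate has $n/k$ or more supporters who all approve nobody in the group". The brute-force search is exponential and is meant only for toy inputs.

```python
from itertools import combinations

def build_instance(u, subsets, ell):
    """Build the reduction instance; returns (n, k, approvals).

    approvals maps a candidate name to the set of voters approving it.
    Voters are numbered 1..ell*u; voters 1..u are the Set Cover elements.
    """
    n, k = ell * u, u
    approvals = {}
    for i, s in enumerate(subsets):
        approvals[("S", i)] = set(s)  # c_{S_i}: approved exactly by S_i
    for j in range(1, u + 1):
        # c*_j: approved by the ell voters j, j+u, ..., j+(ell-1)u
        approvals[("star", j)] = {j + t * u for t in range(ell)}
    return n, k, approvals

def is_justifying(n, k, approvals, group):
    """Assumed definition: every candidate has fewer than n/k supporters
    who approve no member of `group`."""
    covered = set().union(*(approvals[c] for c in group))
    return all(len(voters - covered) * k < n for voters in approvals.values())

def smallest_group_size(n, k, approvals):
    """Brute force over candidate subsets (toy sizes only)."""
    for size in range(len(approvals) + 1):
        if any(is_justifying(n, k, approvals, g)
               for g in combinations(approvals, size)):
            return size

# Toy Set Cover instance: universe {1, 2, 3}, optimum cover size 2.
n, k, approvals = build_instance(3, [{1, 2}, {2, 3}, {3}], ell=2)
print(smallest_group_size(n, k, approvals))  # 2
```

On this toy instance the smallest justifying group has the same size as the optimal set cover, in line with the argument that the starred candidates can always be swapped for subset candidates.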

Theorem 0.A.3

Let $d>1$ be a constant integer. For any constant $\varepsilon>0$, it is NP-hard to find an $n/k$-justifying group that is at most $\frac{1}{d}(1-\varepsilon)\ln n$ times larger than a smallest $n/k$-justifying group, even when $n=k^{d}$.

Proof

We use the same construction as in the proof of Theorem 0.A.1, choosing $k=u$ and $\ell=n/k=k^{d-1}$. If for some constant $\varepsilon>0$ there is a polynomial-time algorithm that can find an $n/k$-justifying group of size at most $\frac{1}{d}(1-\varepsilon)\ln n$ times the size of a smallest $n/k$-justifying group, then an argument similar to the one in the proof of Theorem 0.A.1 shows that we can use this algorithm to find a set cover of size at most $\frac{1}{d}(1-\varepsilon)\ln n\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})=(1-\varepsilon)\ln u\cdot\operatorname{OPT}_{\mathsf{setcov}}(u,\mathcal{S})$ in polynomial time, where the equality holds because $n=u^{d}$ and hence $\ln n=d\ln u$. But from Theorem 0.A.2, this is NP-hard. $\square$

Appendix 0.B Preference Restrictions

We have shown in Section 4.3 that the problem of computing a smallest $n/k$-justifying group can be solved efficiently for instances that admit a tree representation (TR). In order to better understand the class TR and related preference restriction classes, we explore the relationships between them in this section.

First, we show that TR contains a recently introduced class called the 1D voter/candidate range model (1D-VCR) [12]. For convenience, we state the definition of 1D-VCR here. For a 1D-VCR instance $I=(C,\mathcal{A},k)$, each $a\in C\cup N$ has a center of influence $x_{a}$ and a radius of influence $r_{a}$. We denote by $s(a):=x_{a}-r_{a}$ and $t(a):=x_{a}+r_{a}$ the leftmost and the rightmost point of $a$'s range of influence, respectively. We call $J_{a}:=[s(a),t(a)]$ the interval of $a$. A voter $i\in N$ approves a candidate $c\in C$ if and only if $c$'s interval $[s(c),t(c)]$ and $i$'s interval $[s(i),t(i)]$ have a non-empty intersection, i.e.,

$$s(c)\leq t(i)\quad\text{and}\quad s(i)\leq t(c).$$

Before establishing the containment, we prove some basic properties of 1D-VCR preferences. Our first observation is that if a candidate $c^{\prime}$ is more appealing than candidate $c$, in the sense that the interval of $c$ is contained in that of $c^{\prime}$, then every voter who approves $c$ also approves $c^{\prime}$.
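The approval condition above translates directly into code. The following is a minimal sketch, with the hypothetical helper names `interval`, `approves`, and `approval_sets`; intervals are represented as `(s, t)` pairs, which is a choice made here for illustration.

```python
def interval(center, radius):
    """Interval [x - r, x + r] of a voter or candidate, as a pair (s, t)."""
    return (center - radius, center + radius)

def approves(voter, candidate):
    """1D-VCR rule: approve iff the two closed intervals intersect."""
    s_i, t_i = voter
    s_c, t_c = candidate
    return s_c <= t_i and s_i <= t_c

def approval_sets(voters, candidates):
    """Map each voter name to the set of candidate names it approves."""
    return {i: {c for c, j_c in candidates.items() if approves(j_i, j_c)}
            for i, j_i in voters.items()}

voters = {"v": interval(1, 1)}            # [0, 2]
candidates = {"near": interval(3, 1),     # [2, 4], touches [0, 2] at 2
              "far": interval(4, 0.5)}    # [3.5, 4.5], disjoint from [0, 2]
print(approval_sets(voters, candidates))  # {'v': {'near'}}
```

Because the intervals are closed, two intervals that share only an endpoint still count as intersecting.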

Lemma 3

Consider a 1D-VCR instance. If a voter $i$ approves candidate $c$, then $i$ approves any candidate $c^{\prime}$ whose interval contains that of $c$, i.e., $J_{c}\subseteq J_{c^{\prime}}$.

Proof

Since voter $i$ approves candidate $c$, we have $J_{i}\cap J_{c}\neq\emptyset$. Since $J_{c}\subseteq J_{c^{\prime}}$, it holds that $J_{i}\cap J_{c^{\prime}}\neq\emptyset$. We conclude that $i$ approves $c^{\prime}$. $\square$

An interval $[x,y]$ is said to be nested in another interval $[x^{\prime},y^{\prime}]$ if $x^{\prime}\leq x$ and $y\leq y^{\prime}$, with at least one inequality being strict. The next lemma ensures that if a voter $i$ approves two candidates whose intervals are not nested in each other, then $i$ approves any "intermediate" candidate whose interval lies between the intervals of the two approved candidates.

Lemma 4

Consider a 1D-VCR instance. If a voter $i$ approves candidates $a$ and $b$ with $s(a)\leq s(b)$ and $t(a)\leq t(b)$, then $i$ also approves any candidate $c$ such that $s(a)\leq s(c)\leq s(b)$ and $t(a)\leq t(c)\leq t(b)$.

Proof

Since voter $i$ approves candidates $a$ and $b$, we have $s(i)\leq t(a)$ and $s(b)\leq t(i)$. Hence, $s(c)\leq s(b)\leq t(i)$ and $s(i)\leq t(a)\leq t(c)$. This means that $i$ approves $c$, as claimed. $\square$

We are now ready to show the containment relation between 1D-VCR and TR.

Proposition 2

Every 1D-VCR instance admits a TR. Moreover, such a TR can be computed in polynomial time.

Proof

Let $I=(C,\mathcal{A},k)$ be a 1D-VCR instance with voter set $N$. We construct a tree $T$ as follows. Consider a maximal unnested subset of $C$, which we call $C_{0}$, where a candidate is said to be nested if its interval is nested in another candidate's interval. We reindex the candidates in $C_{0}$ so that $s(c_{1})\leq\dots\leq s(c_{\ell})$ and $t(c_{1})\leq\dots\leq t(c_{\ell})$, where $\ell:=|C_{0}|$. Then, we add the path $(c_{1},\dots,c_{\ell})$ to $T$, call these nodes the "level-0 nodes", and define $C^{\prime}:=C\setminus C_{0}$. For the remaining candidates in $C^{\prime}$, we iteratively apply the following procedure: pick a candidate $c\in C^{\prime}$ whose interval is not nested in the interval of any other candidate in $C^{\prime}$, and make $c$ a child of a node in $T$ whose interval strictly contains $J_{c}$ (such a node exists by definition of $C_{0}$), breaking ties in favor of nodes with a higher level. Remove $c$ from $C^{\prime}$, and define the level of $c$ as the level of its parent plus $1$.

We claim that the following two statements hold:

  (i) Let $c\in A_{i}$ for some $i\in N$, and let $c^{\prime}\in C_{0}$ be the level-0 ancestor of $c$ (possibly $c=c^{\prime}$). Then, all candidates on the path from $c$ to $c^{\prime}$ are in $A_{i}$.

  (ii) Let $\{c_{p},c_{q}\}\subseteq A_{i}\cap C_{0}$ for some $i\in N$ and $1\leq p<q\leq\ell$. Then, $\{c_{p},c_{p+1},\dots,c_{q}\}\subseteq A_{i}$.

To prove (i), let $d_{0},d_{1},\dots,d_{r}$ be the path from $c^{\prime}$ to $c$; in particular, $c^{\prime}=d_{0}$ and $c=d_{r}$. By construction of $T$, it holds that $J_{d_{r}}\subseteq J_{d_{r-1}}\subseteq\dots\subseteq J_{d_{0}}$. Since $d_{r}\in A_{i}$, Lemma 3 implies that $\{d_{0},d_{1},\dots,d_{r}\}\subseteq A_{i}$. As for (ii), since $s(c_{p})\leq s(c_{p+1})\leq\dots\leq s(c_{q})$ and $t(c_{p})\leq t(c_{p+1})\leq\dots\leq t(c_{q})$, Lemma 4 together with the assumption that $\{c_{p},c_{q}\}\subseteq A_{i}$ implies that $\{c_{p},c_{p+1},\dots,c_{q}\}\subseteq A_{i}$.

Clearly, $T$ can be constructed in polynomial time; we now show that it is a valid tree representation of the instance $I$. To this end, fix $c,c^{\prime}\in A_{i}$ for some voter $i\in N$. It suffices to show that there is a walk from $c$ to $c^{\prime}$, possibly going through some nodes more than once, such that all candidates on this walk belong to $A_{i}$. Consider a walk composed of the path from $c$ to its level-0 ancestor $d$, the path from $d$ to the level-0 ancestor $d^{\prime}$ of $c^{\prime}$, and the path from $d^{\prime}$ to $c^{\prime}$. By (i), all nodes in the first and third paths are in $A_{i}$; by (ii), all nodes in the second path are in $A_{i}$ too. This means that the entire walk is contained in $A_{i}$, completing the proof. $\square$
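The tree construction from this proof can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `nested`, `build_tree`, and `induces_subtree` are hypothetical, intervals are `(s, t)` tuples, and the remaining candidates are processed in order of decreasing interval length, which is one valid way of always picking a candidate that is not nested in any other remaining candidate.

```python
def nested(x, y):
    """True if interval x is nested in y: contained in y and not equal to it."""
    return y[0] <= x[0] and x[1] <= y[1] and x != y

def build_tree(cand):
    """cand maps a candidate name to its interval (s, t).

    Returns the tree as a child -> parent map. Level-0 nodes form a path,
    stored by letting each one point to its predecessor on the path.
    """
    # Level 0: candidates not nested in any other, sorted by (s, t).
    level0 = sorted((c for c in cand
                     if not any(nested(cand[c], cand[d]) for d in cand)),
                    key=lambda c: cand[c])
    parent = {b: a for a, b in zip(level0, level0[1:])}
    level = {c: 0 for c in level0}
    # A container is strictly longer than anything nested in it, so going
    # by decreasing length places every container before its contents.
    rest = sorted((c for c in cand if c not in level),
                  key=lambda c: cand[c][1] - cand[c][0], reverse=True)
    for c in rest:
        containers = [d for d in level if nested(cand[c], cand[d])]
        p = max(containers, key=lambda d: level[d])  # prefer higher level
        parent[c] = p
        level[c] = level[p] + 1
    return parent

def induces_subtree(parent, subset):
    """In a tree, a vertex set is connected iff it spans |subset| - 1 edges."""
    subset = set(subset)
    inside = sum(1 for c in subset if parent.get(c) in subset)
    return not subset or inside == len(subset) - 1

cand = {"x": (0, 2), "y": (1, 3), "z": (2, 4), "w": (1.5, 2.5)}
tree = build_tree(cand)
print(tree)  # {'y': 'x', 'z': 'y', 'w': 'y'}
print(induces_subtree(tree, {"y", "z", "w"}))  # True
```

In the usage example, a voter with interval $[2.2, 2.3]$ approves exactly `y`, `z`, and `w`, and this set is connected in the constructed tree, as the proposition requires.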

In the chain of tree representation classes depicted by Yang [19, Fig. 4], the largest class contained in TR is the class of PTR. An instance admits a path-tree representation (PTR) if there exists a tree $T$ with vertex set corresponding to the candidate set $C$ such that the approval set of every voter induces a path in $T$. Below, we present examples demonstrating that PTR and 1D-VCR do not contain each other; this further highlights the generality of TR and also shows that the inclusion of 1D-VCR in TR is strict.

Proposition 3

There exists a 1D-VCR instance that admits no PTR.

Proof

Consider the following instance: $C=\{a,b,c,d\}$, $n=4$, and

$$A_{1}=\{a,b,c,d\},\quad A_{2}=\{a,b\},\quad A_{3}=\{a,c\},\quad A_{4}=\{a,d\}.$$

To see that this instance admits a 1D-VCR representation, consider the following intervals:

\begin{align*}
J_{a} &= [0,5],\quad J_{b}=[0,1],\quad J_{c}=[2,3],\quad J_{d}=[4,5]; \\
J_{1} &= [0,5],\quad J_{2}=[0,1],\quad J_{3}=[2,3],\quad J_{4}=[4,5].
\end{align*}

Now, assume for contradiction that the instance admits a path-tree representation. Then, since $A_{1}=\{a,b,c,d\}$, all four candidates must lie on a path in the tree. But then $a$ has at most two neighbors, so one of the approval sets $A_{2}$, $A_{3}$, and $A_{4}$ is not a path, a contradiction. $\square$
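Since the instance has only four candidates, the claim that no PTR exists can also be confirmed by exhaustive search over all labeled trees on the candidate set. The sketch below does this; the helper names `is_tree`, `is_path`, and `has_ptr` are hypothetical, and the search is exponential in the number of candidates, so it is only meant for tiny instances like this one.

```python
from itertools import combinations

def is_tree(edges, nodes):
    """True if `edges` forms a spanning tree on `nodes`."""
    reached, frontier = set(), [nodes[0]]
    while frontier:
        v = frontier.pop()
        if v in reached:
            continue
        reached.add(v)
        frontier += [b if a == v else a for a, b in edges if v in (a, b)]
    return len(edges) == len(nodes) - 1 and len(reached) == len(nodes)

def is_path(edges, subset):
    """For a tree `edges`: does the non-empty `subset` induce a path?

    The induced subgraph of a tree is a forest, so it is a path iff it has
    |subset| - 1 edges (connected) and no vertex of induced degree above 2.
    """
    inner = [e for e in edges if e[0] in subset and e[1] in subset]
    deg = {v: sum(v in e for e in inner) for v in subset}
    return len(inner) == len(subset) - 1 and max(deg.values()) <= 2

def has_ptr(nodes, approval_sets):
    """Try every spanning tree on `nodes` as a candidate PTR."""
    all_edges = list(combinations(nodes, 2))
    return any(is_tree(edges, nodes)
               and all(is_path(edges, a) for a in approval_sets)
               for edges in combinations(all_edges, len(nodes) - 1))

profile = [{"a", "b", "c", "d"}, {"a", "b"}, {"a", "c"}, {"a", "d"}]
print(has_ptr(list("abcd"), profile))  # False
```

The search returns `False` for the profile above, matching the degree argument in the proof.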

Proposition 4

There exists an instance that is not 1D-VCR but admits a PTR.

Proof

Consider the following instance: $C=\{a,b,c,d\}$, $n=3$, and

$$A_{1}=\{a,b,c\},\quad A_{2}=\{a,b,d\},\quad A_{3}=\{a,c,d\}.$$

This instance admits a PTR in which $a$ is the center of a star graph with three leaves. Now, assume for contradiction that the instance admits a 1D-VCR representation, and let $J_{b}=[s(b),t(b)]$, $J_{c}=[s(c),t(c)]$, and $J_{d}=[s(d),t(d)]$ be the intervals of the candidates $b$, $c$, and $d$. For every pair of candidates $z,z^{\prime}\in\{b,c,d\}$, there exists a voter who approves $z$ but not $z^{\prime}$, so none of the intervals $J_{b}$, $J_{c}$, $J_{d}$ can be nested in another one. Hence, we may assume without loss of generality that $s(b)\leq s(c)\leq s(d)$ and $t(b)\leq t(c)\leq t(d)$. Let $J_{2}=[s(2),t(2)]$ be the interval of voter $2$. As $c\not\in A_{2}$, we have $J_{2}\cap J_{c}=\emptyset$, so either $t(2)<s(c)$ or $s(2)>t(c)$. In the former case, $J_{2}\cap J_{d}=\emptyset$, contradicting $d\in A_{2}$, while in the latter case, $J_{2}\cap J_{b}=\emptyset$, contradicting $b\in A_{2}$. $\square$

Appendix 0.C Additional Experiments

Figure 4: Experimental results on the performance of GreedyCC and GreedyCandidate with $n=5000$, shown in three panels: (a) IC Model, (b) 1D-Euclidean Model, (c) 2D-Euclidean Model. For each plot, the $x$-axis shows the average number of approvals of a voter for each parameter $p$ (or $r$), and the $y$-axis shows the average size of an $n/k$-justifying group output by GreedyCC and GreedyCandidate.

We repeated our experiments from Figure 3 with an increased number of voters: we set $n=5000$ while keeping the remaining parameters unchanged. More precisely, we created elections with parameters $m=100$ and $k=10$, and iterated over $p\in[0,1)$ (for the IC model), $r\in[0,1)$ (for the 1D model), and $r\in[0,1.2)$ (for the 2D model), each in increments of $0.02$. For each value of $p$ (or $r$), we generated $200$ elections and computed the size of the $n/k$-justifying group returned by GreedyCC and GreedyCandidate, respectively (unfortunately, due to the high number of voters, computing the size of a smallest $n/k$-justifying group was infeasible). We aggregated these numbers across different elections by computing their average. As in the previous experiments, to make the plots for different models comparable, we converted the values of $p$ and $r$ to the average number of approvals induced by these values. The results are shown in Figure 4.

The plot for the IC model shows a clear step function. Considering the corresponding plot in Figure 3, it appears likely that this function also represents the size of a smallest $n/k$-justifying group. It is worth noting that for large values of $n$, Theorem 3.1 suggests that the size of a smallest $n/k$-justifying group in the IC model can be predicted from the parameters $p$ and $k$: specifically, it is $\tau(p,k):=\lceil-\log_{1-p}(kp)\rceil$. If all groups of size $\tau(p,k)$ are $n/k$-justifying while all smaller groups are not, then both GreedyCC and GreedyCandidate return a group of this size. A closer look at our data indicates that this behavior occurs for most values of $p$, as the standard deviation of the size of the returned group is extremely small. In particular, it is $0$ for almost all values of $p$ with both algorithms, and below $0.5$ for all parameters.
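To get a feel for the predicted step function, the quantity $\tau(p,k)$ can be evaluated numerically. The following is a minimal sketch; the function name `tau` is chosen here for illustration, and the logarithm to base $1-p$ is computed via the change-of-base formula.

```python
import math

def tau(p, k):
    """Evaluate ceil(-log_{1-p}(k * p)) for 0 < p < 1 and k >= 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # log_{1-p}(x) = ln(x) / ln(1 - p)
    return math.ceil(-math.log(k * p) / math.log(1 - p))

print(tau(0.5, 10))  # 3
print(tau(0.2, 10))  # 4
```

For $k=10$, the formula gives $3$ at $p=0.5$ and $4$ at $p=0.2$. When $kp\leq 1$ the expression is non-positive, which corresponds to the regime where no candidate is expected to have $n/k$ supporters.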

By contrast, for the two Euclidean models, the plots are relatively far from step functions. It is also unclear what the corresponding plots for the size of a smallest $n/k$-justifying group would look like. Moreover, the standard deviation of the size of the group returned by GreedyCC and GreedyCandidate is significantly larger than in the IC model. Specifically, the standard deviation is nonzero for a large fraction of values of $r$, and can be as high as $0.9$.