Sum and Difference Sets in Generalized Dihedral Groups

Ruben Ascoli, Justin Cheigh, Guilherme Zeus Dantas e Moura, Ryan Jeong, Andrew Keisling, Astrid Lilly, Steven J. Miller, Prakod Ngamlamai, Matthew Phang

Abstract.

Given a group $G$ , we say that a set $A\subseteq G$ has more sums than differences (MSTD) if $|A+A|>|A-A|$ , has more differences than sums (MDTS) if $|A+A|<|A-A|$ , or is sum-difference balanced if $|A+A|=|A-A|$ . A problem of recent interest has been to understand the frequencies of these types of subsets.

The seventh author and Vissuet studied the problem for arbitrary finite groups $G$ and proved that almost all subsets $A\subseteq G$ are sum-difference balanced as $|G|\to\infty$ . For the dihedral group $D_{2n}$ , they conjectured that of the remaining sets, most are MSTD, i.e., there are more MSTD sets than MDTS sets. Some progress on this conjecture was made by Haviland et al. in 2020, when they introduced the idea of partitioning the subsets by size: if, for each $m$ , there are more MSTD subsets of $D_{2n}$ of size $m$ than MDTS subsets of size $m$ , then the conjecture follows.

We extend the conjecture to generalized dihedral groups $D=\mathbb{Z}_{2}\ltimes G$ , where $G$ is an abelian group of size $n$ and the nonidentity element of $\mathbb{Z}_{2}$ acts by inversion. We make further progress on the conjecture by considering subsets with a fixed number of rotations and reflections. By bounding the expected number of overlapping sums, we show that the collection $\mathcal{S}_{D,m}$ of subsets of the generalized dihedral group $D$ of size $m$ has more MSTD sets than MDTS sets when $6\leq m\leq c_{j}\sqrt{n}$ for $c_{j}=1.3229/\sqrt{111+5j}$ , where $j$ is the number of elements in $G$ with order at most $2$ . We also analyze the expectation for $|A+A|$ and $|A-A|$ for $A\subseteq D_{2n}$ , proving an explicit formula for $|A-A|$ when $n$ is prime.

Key words and phrases:

More Sums Than Differences, Dihedral Group, Generalized Dihedral Group

2020 Mathematics Subject Classification:

11P99, 05B10

This work was supported by NSF grant DMS1947438, Williams College, and Harvey Mudd College.

1. Introduction and Main Results

Given a set $A$ of integers, the sumset and difference set of $A$ are defined as

A+A\ =\ \{a_{1}+a_{2}:a_{1},a_{2}\in A\}\quad\text{ and }\quad A-A\ =\ \{a_{1}-a_{2}:a_{1},a_{2}\in A\}.

(1)

These elementary operations are fundamental in additive number theory. A natural problem of recent interest has been to understand the relative sizes of the sum and difference sets of sets $A$ .

Definition 1.1.

We say that a set $A$ has more sums than differences (MSTD) if $|A+A|>|A-A|$ ; has more differences than sums (MDTS) if $|A+A|<|A-A|$ ; or is sum-difference balanced if $|A+A|=|A-A|$ .

We intuitively expect most sets to be MDTS since addition is commutative and subtraction is not. Nevertheless, MSTD subsets of integers exist. Nathanson detailed in [Nat07] the history of the problem, and attributed to John Conway the first recorded example of an MSTD subset of integers, $\{0,2,3,4,7,11,12,14\}$ . Martin and O’Bryant proved in [MO07] that the proportion of the $2^{n}$ subsets $A$ of $\{0,1,\ldots,n-1\}$ which are MSTD is bounded below by a positive value for all $n\geq 15$ . They proved this by controlling the “fringe” elements of $A$ , those close to $0$ and $n-1$ , which have the most influence over whether elements are missing from the sum and difference sets. In [Zha11], Zhao gave a deterministic algorithm to compute the limit of the ratio of MSTD subsets of $\{0,1,\ldots,n-1\}$ as $n$ goes to infinity and found that this ratio is at least $4.28\times 10^{-4}$ . For more on the problem of MSTD sets in the integers, see also [Heg07] and [Nat07a] for constructive examples of infinite families of MSTD sets, [MOS10] and [Zha10] for non-constructive proofs of existence of infinite families of MSTD sets, and [HM09] and [HM13] for an analysis of sets with each integer from $0$ to $n-1$ included with probability $cn^{-\delta}$ .
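Conway's example can be checked directly. The following is a minimal sketch (the helper names `sumset`, `difference_set`, and `classify` are ours, not from the cited works) that computes both sets for a finite set of integers and classifies it according to Definition 1.1.

```python
def sumset(A):
    """All pairwise sums a1 + a2 with a1, a2 in A."""
    return {a + b for a in A for b in A}

def difference_set(A):
    """All pairwise differences a1 - a2 with a1, a2 in A."""
    return {a - b for a in A for b in A}

def classify(A):
    """Return 'MSTD', 'MDTS', or 'balanced' following Definition 1.1."""
    s, d = len(sumset(A)), len(difference_set(A))
    if s > d:
        return "MSTD"
    if s < d:
        return "MDTS"
    return "balanced"

conway = {0, 2, 3, 4, 7, 11, 12, 14}
print(len(sumset(conway)), len(difference_set(conway)), classify(conway))
# prints: 26 25 MSTD
```

For comparison, the small set $\{0,1,3\}$ has $6$ sums and $7$ differences, so it is MDTS, which matches the intuition that non-commutativity of subtraction usually produces more differences.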

More recently, several authors have examined analogous problems for groups $G$ other than the integers. For example, Do, Kulkarni, Moon, Wellens, Wilcox, and the seventh author studied in [DKMMWW15] the analogous problem for higher-dimensional integer lattices.

For finite groups, although the usual notation for the operation of the group is multiplication, we match the notation from previous work and define, for a subset $A\subseteq G$ , its sumset and difference set as

A+A\ =\ \{a_{1}a_{2}:a_{1},a_{2}\in A\}\quad\text{ and }\quad A-A\ =\ \{a_{1}a_{2}^{-1}:a_{1},a_{2}\in A\}.

(2)

Definition 1.1 of MSTD, MDTS, and sum-difference balanced sets applies in this context as well.

The approaches used to study MSTD subsets of integers do not generalize for MSTD subsets of finite groups due to the lack of fringes. Zhao proved asymptotics for numbers of MSTD subsets of finite abelian groups as the size of the group goes to infinity in [Zha10a]. The seventh author and Vissuet examined the problem for arbitrary finite groups $G$ , also with the size of the group going to infinity, and proved Theorem 1.2.

Theorem 1.2 ([MV14]).

Let $\{G_{n}\}$ be a sequence of finite groups, not necessarily abelian, with $|G_{n}|\to\infty$ . Let $A_{n}$ be a uniformly chosen random subset of $G_{n}$ . Then $\mathbb{P}[A_{n}+A_{n}=A_{n}-A_{n}=G_{n}]\to 1$ as $n\to\infty$ . In other words, as the size of the finite groups increases without bound, almost all subsets are balanced (with sumset and difference set equalling the entire group).

Furthermore, for the case of dihedral groups $D_{2n}$ , they proposed Conjecture 1.3.

Conjecture 1.3 ([MV14]).

Let $n\geq 3$ be an integer. There are more MSTD subsets of $D_{2n}$ than MDTS subsets of $D_{2n}$ .

Given a set $A\subseteq D_{2n}=\langle r,s\mid r^{n},s^{2},rsrs\rangle$ , define $R$ (resp. $F$ ) as the set of elements of $A$ of the form $r^{i}$ (resp. $r^{i}s$ ), called rotation elements (resp. flip elements). Hence, $A=R\cup F$ . Then, we can write

\begin{aligned}A+A\ &=\ (R+R)\cup(F+F)\cup(R+F)\cup(-R+F),\\ A-A\ &=\ (R-R)\cup(F+F)\cup(R+F).\end{aligned}

(3)

Intuition for Conjecture 1.3 comes from noting that $F+F$ and $R+F$ contribute to both $A+A$ and $A-A$ ; $R+R$ and $-R+F$ contribute only to $A+A$ ; and $R-R$ contributes only to $A-A$ .
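The decomposition in Equation (3) can be confirmed by brute force for small $n$. The sketch below is only an illustration: it encodes $r^{i}$ as the pair `(0, i)` and $r^{i}s$ as `(1, i)`, an encoding we choose here for convenience, and all function names are hypothetical.

```python
from itertools import chain, combinations

def mul(x, y, n):
    """Product in D_2n, with r^i encoded as (0, i) and r^i s as (1, i)."""
    (z1, g1), (z2, g2) = x, y
    # A flip on the left inverts the rotation part of the right factor.
    return ((z1 + z2) % 2, (g1 + (g2 if z1 == 0 else -g2)) % n)

def inv(x, n):
    """Inverse in D_2n: flips are involutions, rotations are negated."""
    z, g = x
    return x if z == 1 else (0, (-g) % n)

def plus(X, Y, n):
    """The set {xy : x in X, y in Y}, written X + Y in the text."""
    return {mul(x, y, n) for x in X for y in Y}

def minus(X, Y, n):
    """The set {x y^{-1} : x in X, y in Y}, written X - Y in the text."""
    return {mul(x, inv(y, n), n) for x in X for y in Y}

def decomposition_holds(A, n):
    """Check both lines of Equation (3) for a subset A of D_2n."""
    R = {a for a in A if a[0] == 0}
    F = {a for a in A if a[0] == 1}
    neg_R = {inv(r, n) for r in R}
    sums = plus(R, R, n) | plus(F, F, n) | plus(R, F, n) | plus(neg_R, F, n)
    diffs = minus(R, R, n) | plus(F, F, n) | plus(R, F, n)
    return plus(A, A, n) == sums and minus(A, A, n) == diffs

def all_subsets(n):
    """Iterate over every subset of D_2n as a tuple of elements."""
    D = [(z, g) for z in (0, 1) for g in range(n)]
    return chain.from_iterable(combinations(D, m) for m in range(len(D) + 1))

print(all(decomposition_holds(set(A), 5) for A in all_subsets(5)))
```

The check works because $fr=r^{-1}f$ for every flip $f$ and rotation $r$, so $F+R=-R+F$, while $R-F=F-R=R+F$ and $F-F=F+F$ since flips are involutions.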

In 2020, Haviland, Kim, Lâm, Lentfer, Trejos Suáres, and the seventh author made progress towards Conjecture 1.3 by partitioning subsets of $D_{2n}$ by size. They proposed Conjecture 1.4 as a means of proving Conjecture 1.3.

Conjecture 1.4 ([HKLLMT20]).

Let $n\geq 3$ be an integer, and let $\mathcal{S}_{2n,m}$ denote the collection of subsets of $D_{2n}$ of size $m$ . For any $m\leq 2n$ , $\mathcal{S}_{2n,m}$ has at least as many MSTD sets as MDTS sets.

They showed that Conjecture 1.4 holds for $m=2$ , which we reproduce in this paper, and we also extend their approach to $m=3$ . They also showed that Conjecture 1.4 holds for $m>n$ by showing that all sets in $\mathcal{S}_{2n,m}$ are sum-difference balanced. We prove this result in this paper, using Lemma 1.5. These results are proved in Section 2.

Lemma 1.5.

Let $n\geq 3$ be an integer, and let $A\subseteq D_{2n}$ . Let $R$ (resp. $F$ ) be the subset of rotations (resp. flips) in $A$ . Suppose that $|F|>\frac{n}{2}$ or $|R|>\frac{n}{2}$ . Then, $A$ cannot be MDTS.

Furthermore, we extend Conjecture 1.3 as follows. A generalized dihedral group is given by $D=\mathbb{Z}_{2}\ltimes G$ , where $G$ is any abelian group and where the nonidentity element of $\mathbb{Z}_{2}$ acts on $G$ by inversion.

Conjecture 1.6.

Let $G$ be an abelian group with at least one element of order $3$ or greater, and let $D=\mathbb{Z}_{2}\ltimes G$ be the corresponding generalized dihedral group. Then, there are more MSTD subsets of $D$ than MDTS subsets of $D$ .

Conjecture 1.3 is a special case of Conjecture 1.6, with $G=\mathbb{Z}_{n}$ . We also state Conjecture 1.7, analogous to Conjecture 1.4.

Conjecture 1.7.

Let $D$ be a generalized dihedral group of size $2n$ , and let $\mathcal{S}_{D,m}$ denote the collection of subsets of $D$ of size $m$ . For any $m\leq 2n$ , $\mathcal{S}_{D,m}$ has at least as many MSTD sets as MDTS sets.

The version of Lemma 1.5 that we prove in Section 2 deals with the generalized dihedral group.

In Section 3, we prove our main theorem, verifying Conjecture 1.7 for the case of $m\leq c_{j}\sqrt{n}$ , where $c_{j}$ is a constant (independent of $n$ ) depending only on the quantity $j$ , the number of elements of order at most $2$ in the abelian group $G$ . More explicitly, we show the following.

Theorem 1.8.

Let $D=\mathbb{Z}_{2}\ltimes G$ be a generalized dihedral group of size $2n$ . Let $\mathcal{S}_{D,m}$ denote the collection of subsets of $D$ of size $m$ , and let $j$ denote the number of elements in $G$ with order at most $2$ . If $6\leq m\leq c_{j}\sqrt{n}$ , where $c_{j}=1.3229/\sqrt{111+5j}$ , then there are more MSTD sets than MDTS sets in $\mathcal{S}_{D,m}$ .

See Section 3 for the proof of this result and two related theorems. We also extend these results to the dihedral group on finitely generated abelian groups in Section 3.3.

Next, in Section 4, we discuss the following result about the expected size of $|A-A|$ when $A$ is a randomly chosen set in $\mathcal{S}_{2n,m}$ , the collection of subsets of $D_{2n}$ of size $m$ .

Theorem 1.9.

If $n$ is prime, and $A$ is chosen uniformly at random from $\mathcal{S}_{2n,m}$ , then

\mathbb{E}[|A-A|]\ =\ 2n-\frac{nm2^{m}{\binom{n}{m}}+2n(n-1){\binom{n-m-1}{m-1}}}{m{\binom{2n}{m}}}-\frac{n^{2}(n-1)}{{\binom{2n}{m}}}\sum_{k=1}^{m-1}\frac{{\binom{n+k-m-1}{m-k-1}}{\binom{n-k-1}{k-1}}}{k(m-k)}.

(4)

Finally, in Section 5, we discuss directions for further research.

2. Direct Analysis

2.1. Small Subsets

For the case of the usual dihedral group $D_{2n}$ , we have the following two results.

Lemma 2.1 ([HKLLMT20]).

Let $n\geq 3$ , and let $\mathcal{S}_{2n,2}$ denote the collection of subsets of $D_{2n}$ of size $2$ . Then, $\mathcal{S}_{2n,2}$ has strictly more MSTD sets than MDTS sets.

Lemma 2.2.

Let $n\geq 3$ , and let $\mathcal{S}_{2n,3}$ denote the collection of subsets of $D_{2n}$ of size $3$ . Then, $\mathcal{S}_{2n,3}$ has strictly more MSTD sets than MDTS sets.

The proofs for both these lemmas use basic and somewhat tedious casework; they can be found in Appendix A.

Similar results for the generalized dihedral group likely follow from similar arguments.

2.2. Large Subsets

We consider what happens when $m$ gets close to $n$ . Here we can prove a result for any generalized dihedral group. For the rest of this section, let $G$ be a finite abelian group of size $n$ . Recall that the generalized dihedral group is given by $D=\mathbb{Z}_{2}\ltimes G$ , where the nonidentity element of $\mathbb{Z}_{2}$ acts on $G$ by inversion. Note that $|D|=2n$ . Writing an element of the group $D$ as $(z,g)$ where $z\in\{0,1\}$ and $g\in G$ , we write $R_{D}$ to mean the subset of $D$ consisting of elements with $z=0$ and $F_{D}$ to mean the subset of $D$ consisting of elements with $z=1$ . Note that $|R_{D}|=|F_{D}|=n$ . For the case where $D=D_{2n}=\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}$ is the usual dihedral group, $R_{D}$ and $F_{D}$ are the sets of rotations and flips, respectively; out of convenience, we will use these terms for the general case as well.

It turns out that having $m>n$ ensures that $A$ is balanced.

Lemma 2.3.

Let $D$ be a generalized dihedral group of size $2n$ . Let $A\subseteq D$ , and let $R=A\cap R_{D}$ and $F=A\cap F_{D}$ . If $\max(|R|,|F|)>n/2$ , then $R_{D}\subseteq A+A$ and $R_{D}\subseteq A-A$ .

Proof.

Let $L$ be the larger of $R$ and $F$ , and define $L_{D}=R_{D}$ if $L=R$ and $L_{D}=F_{D}$ if $L=F$ .

For each rotation $r\in R_{D}$ , define $rL^{-1}=\{r\ell^{-1}\ |\ \ell\in L\}$ and $rL=\{r\ell\ |\ \ell\in L\}$ . Note that $|rL^{-1}|=|rL|=|L|>n/2$ , and $rL^{-1}$ , $rL$ , $L$ are subsets of the set $L_{D}$ which has size $n$ . Hence, by the inclusion–exclusion principle, $L\cap rL^{-1}$ and $L\cap rL$ are nonempty. Thus, $r\in L+L\subseteq A+A$ and $r\in L-L\subseteq A-A$ .

Therefore, as desired, $R_{D}\subseteq A+A$ and $R_{D}\subseteq A-A$ . ∎

Remark 1.

Note that Lemma 2.3 implies that if $\max(|R|,|F|)>n/2$ , then $A$ is not MDTS. This follows from the discussion after the statement of Conjecture 1.3 (which we explicitly extend to the generalized dihedral group case in Section 3). For a set to be MDTS, $R-R$ must contribute rotations to $A-A$ that the set $A+A$ does not have. But here we have shown that if $\max(|R|,|F|)>n/2$ , then $A+A$ has all the rotations.
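Lemma 2.3 and Remark 1 can be sanity-checked exhaustively for small dihedral groups. The sketch below is an illustration under our own encoding (rotations as `(0, g)`, flips as `(1, g)` with $g\in\mathbb{Z}_{n}$); it only treats the case $G=\mathbb{Z}_{n}$, and the function names are hypothetical.

```python
from itertools import combinations

def mul(x, y, n):
    """Product in Z_2 ⋉ Z_n; a flip on the left inverts the right factor."""
    (z1, g1), (z2, g2) = x, y
    return ((z1 + z2) % 2, (g1 + (g2 if z1 == 0 else -g2)) % n)

def inv(x, n):
    """Inverse: flips are involutions, rotations are negated."""
    z, g = x
    return x if z == 1 else (0, (-g) % n)

def lemma_2_3_holds(n):
    """Brute-force check of Lemma 2.3 and Remark 1 over all nonempty A in D_2n."""
    D = [(z, g) for z in (0, 1) for g in range(n)]
    rotations = {(0, g) for g in range(n)}
    for m in range(1, 2 * n + 1):
        for A in combinations(D, m):
            flips = sum(z for z, _ in A)
            if 2 * max(flips, m - flips) <= n:
                continue  # hypothesis max(|R|, |F|) > n/2 is not met
            sums = {mul(a, b, n) for a in A for b in A}
            diffs = {mul(a, inv(b, n), n) for a in A for b in A}
            if not (rotations <= sums and rotations <= diffs):
                return False  # would contradict Lemma 2.3
            if len(sums) < len(diffs):
                return False  # A would be MDTS, contradicting Remark 1
    return True

print(lemma_2_3_holds(5))
```

The search is exponential in $n$, so it is only practical for very small groups; it is meant as a check on the statement, not as a substitute for the proof.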

Lemma 2.4.

Let $D$ be a generalized dihedral group of size $2n$ , and let $A\subseteq D$ with $|A|=m$ . If $m>n$ , then $A+A=A-A=D$ .

Proof.

Let $R$ (resp. $F$ ) be the subset of rotations (resp. flips) in $A$ ; hence $A=R\cup F$ . Let $L,S$ be the larger and smaller of $R$ and $F$ , respectively. Define $|L|=n_{1},|S|=n_{2}$ , so that $n_{1}+n_{2}=m>n$ . Thus, $n_{1}>n/2$ .

By Lemma 2.3, we have that $R_{D}\subseteq A+A$ and $R_{D}\subseteq A-A$ .

For each flip $f\in F_{D}$ , define $fL^{-1}=\{f\ell^{-1}\ |\ \ell\in L\}$ and $fL=\{f\ell\ |\ \ell\in L\}$ .

Note that $|fL^{-1}|=|fL|=|L|=n_{1}$ , $|S|=n_{2}$ , and $fL^{-1},fL,S$ are subsets of $S_{D}$ (defined as $R_{D}$ if $S=R$ and $F_{D}$ if $S=F$ ), which has size $n<n_{1}+n_{2}$ . Hence, by the inclusion–exclusion principle, $fL^{-1}\cap S$ and $fL\cap S$ are nonempty. Thus, $f\in S+L\subseteq A+A$ and $f\in S-L\subseteq A-A$ . Therefore, $F_{D}\subseteq A+A$ and $F_{D}\subseteq A-A$ .

Thus, we have $R_{D},F_{D}\subseteq A+A$ and $R_{D},F_{D}\subseteq A-A$ , which imply $A+A=A-A=D$ . ∎
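Lemma 2.4 can likewise be checked by exhaustive search for small generalized dihedral groups. In the sketch below, which is an illustration and not part of the original argument, the abelian group $G$ is assumed to be given as a product of cyclic groups $\mathbb{Z}_{q_{1}}\times\cdots\times\mathbb{Z}_{q_{t}}$ via a hypothetical `moduli` tuple, written additively.

```python
from itertools import combinations, product

def mul(x, y, moduli):
    """Product in Z_2 ⋉ G, where G is the product of Z_q for q in moduli."""
    (z1, g1), (z2, g2) = x, y
    sign = 1 if z1 == 0 else -1  # a flip on the left inverts the right factor
    g = tuple((a + sign * b) % q for a, b, q in zip(g1, g2, moduli))
    return ((z1 + z2) % 2, g)

def inv(x, moduli):
    """Inverse: flips are involutions, rotations are negated coordinatewise."""
    z, g = x
    return x if z == 1 else (0, tuple((-a) % q for a, q in zip(g, moduli)))

def lemma_2_4_holds(moduli):
    """Check A+A = A-A = D for every subset A of D with |A| > n = |G|."""
    G = list(product(*(range(q) for q in moduli)))
    D = [(z, g) for z in (0, 1) for g in G]
    n, full = len(G), set(D)
    for m in range(n + 1, 2 * n + 1):
        for A in combinations(D, m):
            sums = {mul(a, b, moduli) for a in A for b in A}
            diffs = {mul(a, inv(b, moduli), moduli) for a in A for b in A}
            if sums != full or diffs != full:
                return False
    return True

print(lemma_2_4_holds((2, 3)), lemma_2_4_holds((2, 2)))
```

The hypothesis $m>n$ cannot be weakened to $m\geq n$ in general: the set $A=R_{D}$ has size $n$ and satisfies $A+A=A-A=R_{D}\neq D$, because $R_{D}$ is a subgroup.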

3. Collision Analysis

Let $G$ be a finite abelian group of size $n$ , written multiplicatively. Recall that the generalized dihedral group is given by $D=\mathbb{Z}_{2}\ltimes G$ , where the nonidentity element of $\mathbb{Z}_{2}$ acts on $G$ by inversion.

This section is dedicated to proving the following.

See Theorem 1.8.

Note that the theorem is only useful when $c_{j}\sqrt{n}\geq 6$ , or $n\geq(6/c_{j})^{2}$ .

If $n$ is arbitrarily large and $j$ is a constant compared to $n$ , we can make a stronger statement: we can replace $c_{j}$ in the above theorem with a constant arbitrarily close to $\sqrt{2/7}\approx 0.5345$ . Specifically, we have the following.

Theorem 3.1.

For fixed $j$ and $\epsilon>0$ , there exists $n_{j,\epsilon}$ with the following property. Let $G$ be an abelian group of size $n\geq n_{j,\epsilon}$ with at most $j$ elements of order $2$ or $1$ . Then with $D$ and $\mathcal{S}_{D,m}$ defined as in Theorem 1.8, we have that if $6\leq m\leq\left(\sqrt{2/7}-\epsilon\right)\sqrt{n}$ , then there are more MSTD sets than MDTS sets in $\mathcal{S}_{D,m}$ .

We can give a stronger, more general statement on the proportion of MSTD sets in $\mathcal{S}_{D,m}$ if $m$ is large and also bounded above by a (smaller) constant times $\sqrt{n}$ .

Theorem 3.2.

Let $D,j$ , and $\mathcal{S}_{D,m}$ be defined as in Theorem 1.8. For any $\epsilon>0$ , there exist $m_{\epsilon}$ and $c_{\epsilon,j}$ such that if $m_{\epsilon}\leq m\leq c_{\epsilon,j}\sqrt{n}$ , the proportion of MSTD sets in $\mathcal{S}_{D,m}$ is at least $1-\epsilon$ .

Here $m_{\epsilon}$ and $c_{\epsilon,j}$ are independent of $n$ , but similarly to before, this theorem is only useful when $n\geq(m_{\epsilon}/c_{\epsilon,j})^{2}$ .

Remark 2.

In practice, Theorems 1.8 and 3.2 are most useful if $j$ is essentially a constant compared to $n$ . This is indeed the case for the original dihedral group $D_{2n}=\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}$ , where $j=1$ when $n$ is odd and $j=2$ when $n$ is even, yielding $c_{j}\geq 0.12$ . The family of original dihedral groups is also a good example of how to use Theorem 3.1: we can apply that theorem with $j=2$ and $\epsilon$ arbitrarily small to get that for large enough dihedral groups, we can get the coefficient of the $\sqrt{n}$ in the theorem to be very close to $\sqrt{2/7}$ , which is a significant improvement over $0.12$ .

However, these theorems are not useful when, for example, $G=\mathbb{Z}_{2}\times\ldots\times\mathbb{Z}_{2}\times\mathbb{Z}_{3}$ . Here $G$ does have an element of order at least $3$ , so Conjecture 1.6 applies, but we have $j=n/3$ , which is too large for Theorems 1.8 and 3.2 to apply to any values of $m$ . In fact, if $j\geq n/100$ , then $n\leq(6/c_{j})^{2}$ , and Theorem 1.8 does not apply to any values of $m$ .

We first prove Theorem 1.8 here. Then, in Subsection 3.1 we demonstrate Theorems 3.1 and 3.2.

Recall the definitions of $R_{D}$ and $F_{D}$ from Section 2.2. Note that any element in $F_{D}$ has order $2$ in $D$ . Furthermore, any element in $R_{D}$ has order at most $2$ in $D$ if and only if it has order at most $2$ in $G$ .

We begin with a set $A$ with size $m$ and count the number of elements in $A+A$ and $A-A$ . In this count, we will make a naive assumption: there are no overlaps between sums and differences that we do not expect to overlap. Decompose $A$ into the union of the set of rotations $R=A\cap R_{D}$ and the set of flips $F=A\cap F_{D}$ , and define $k=|F|$ . We have:

\begin{aligned}A+A\ &=\ (R+F)\cup(F+R)\cup(R+R)\cup(F+F);\\ A-A\ &=\ (R-F)\cup(F-R)\cup(R-R)\cup(F-F).\end{aligned}

(5)

Consider first the flips in $A+A$ and $A-A$ . In $A+A$ , these are in $R+F$ and $F+R$ . Note that we do not expect a lot of overlap in general; for a rotation $(0,g_{1})\in R$ and a flip $(1,g_{2})\in F$ , $(0,g_{1})\cdot(1,g_{2})=(1,g_{1}g_{2})$ does not equal $(1,g_{2})\cdot(0,g_{1})=(1,g_{2}g_{1}^{-1})$ unless $g_{1}$ has order $1$ or $2$ in $G$ . On the other hand, for the flips in $A-A$ , we have $R-F=F-R$ . This is because $(0,g_{1})\cdot(1,g_{2})^{-1}=(0,g_{1})\cdot(1,g_{2})=(1,g_{1}g_{2})$ , and $(1,g_{2})\cdot(0,g_{1})^{-1}=(1,g_{2})\cdot(0,g_{1}^{-1})=(1,g_{1}g_{2})$ , so these two are the same. There are $m-k$ rotations and $k$ flips in $A$ , so we thus expect the flips to contribute $2(m-k)k$ to $A+A$ but only $(m-k)k$ to $A-A$ .

Next, consider the rotations in $A+A$ and $A-A$ . Begin with $F+F$ and $F-F$ . Since all flips have order $2$ , these are in fact the same set and thus always contribute equally to $A+A$ and to $A-A$ . Also note that if $k\neq 0$ (we will treat the $k=0$ case later), the identity $1$ is contained in $F+F$ and $F-F$ .

Next consider $R+R$ and $R-R$ . Adding rotations is commutative, so we expect $R+R$ to contribute $\binom{m-k}{2}+(m-k)$ to the size of $A+A$ , where the $m-k$ term comes from the sum of each rotation in $R$ with itself. On the other hand, in general $g_{1}g_{2}^{-1}\neq g_{2}g_{1}^{-1}$ , so $R-R$ is expected to contribute $2\binom{m-k}{2}$ . Here there is no additional $m-k$ term since when $g_{1}=g_{2}$ , we have $g_{1}g_{2}^{-1}=1$ , and $1$ was already counted in $A-A$ from $F-F$ .

We now put this all together. For $|A+A|>|A-A|$ , we need

\begin{aligned}&2(m-k)k+\binom{m-k}{2}+(m-k)\ >\ (m-k)k+2\binom{m-k}{2}\\ \iff\ &k\ >\ m/3-1\text{ and }m\ \neq\ k\\ \Longleftarrow\ &m/3\ \leq\ k\ <\ m.\end{aligned}

(6)

Note that when $k=m$ the set is necessarily balanced as $F+F=F-F$ . Further, one can now see why we may assume $k\neq 0$ : when $k$ is smaller than $m/3$ , we expect the set to be MDTS, and indeed we will assume this is the case.
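As a quick check of the algebra behind the naive count, the following sketch (our own illustration; the function names are hypothetical) evaluates both sides of the first inequality for given $m$ and $k$ and compares the outcome with the condition $k>m/3-1$, $k\neq m$.

```python
from math import comb

def naive_sum_count(m, k):
    """Naive contribution to |A+A| with k flips and m - k rotations."""
    r = m - k
    return 2 * r * k + comb(r, 2) + r

def naive_diff_count(m, k):
    """Naive contribution to |A-A| with k flips and m - k rotations."""
    r = m - k
    return r * k + 2 * comb(r, 2)

def naive_mstd(m, k):
    """True when the naive count predicts more sums than differences."""
    return naive_sum_count(m, k) > naive_diff_count(m, k)

# Values of k for which the naive count predicts an MSTD set when m = 12.
print([k for k in range(13) if naive_mstd(12, k)])
# prints: [4, 5, 6, 7, 8, 9, 10, 11]
```

For $m=12$ the predicted range is exactly $m/3\leq k<m$. When $m$ is not a multiple of $3$, the condition $k>m/3-1$ admits one value of $k$ slightly below $m/3$, so $m/3\leq k<m$ is best read as a sufficient condition.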

To use this naive estimate to prove our theorem, we first formalize our assumption that we have minimal overlaps within the sumset.

Definition 3.3.

Let $A\subseteq D$ such that $|A|=m$ , and let $q=(a,b,c,d)\in A^{4}$ . We say that $q$ represents a collision if $ab=cd$ .

Every collision that occurs in $A$ has the potential to make $|A+A|$ smaller relative to our naive estimate, unless the quadruple $q$ is of a form that we already took into account. For example, if $q=(a,b,a,b)$ , then this is not a collision we need to count as $ab=ab$ trivially. Similarly, collisions of the form $q=(a,b,b,a)$ with $a$ and $b$ both rotations do not subtract from $|A+A|$ in Equation (3) as we already accounted for commutativity of addition for rotations. And finally, if $q=(a,b,c,d)$ is a collision where $a$ , $b$ , $c$ , and $d$ are all flips, then $q$ does not impact Equation (3), since $F+F=F-F$ does not affect the relative sizes of the sum and difference sets. We refer to these three kinds of quadruples as redundant.

Every non-redundant collision $(a,b,c,d)$ of $A$ , together with $(c,d,a,b)$ , decreases the size of $A+A$ by at most $1$ from our naive estimate. Let $X_{A}$ be half the total number of non-redundant collisions of $A$ . Then combining the above analysis with Equation (3), we are guaranteed to have that $A$ is MSTD when

\begin{aligned}&2(m-k)k+\binom{m-k}{2}+(m-k)-X_{A}\ >\ (m-k)k+2\binom{m-k}{2}\\ \Longleftarrow\ &m/3+\frac{2X_{A}}{3(m-k)}\ \leq\ k\ <\ m\\ \iff\ &3k^{2}-4mk+m^{2}+2X_{A}\ \leq\ 0\text{ (and $k\ <\ m$)}.\end{aligned}

(7)

We use the quadratic formula to solve for when the above quantity equals $0$ and obtain $k=(1/6)(4m\pm\sqrt{16m^{2}-12(m^{2}+2X_{A})})=(1/3)(2m\pm\sqrt{m^{2}-6X_{A}})$ . Thus Equation (7) is satisfied when:

\displaystyle\frac{2m-\sqrt{m^{2}-6X_{A}}}{3}\ \leq\ k\ \leq\ \frac{2m+\sqrt{m^{2}-6X_{A}}}{3}\ \ \text{ (and $k\ <\ m$)}.

(8)

Remark 3.

We are assuming that no “collisions” of the form $ab^{-1}=cd^{-1}$ happen to lower the size of $A-A$ . Because our objective is to guarantee $|A+A|>|A-A|$ for a large proportion of $A$ , this assumption still gives a sufficient condition on $X_{A}$ and $k$ .

One therefore sees that for most values of $k$ , when the number of collisions is not too large, $A$ is MSTD. More formally, suppose that $A$ is chosen randomly out of the subsets of $D$ with size $m$ . Suppose that the expected value of $X_{A}$ is bounded above by $c_{1}m^{2}$ , where $c_{1}=7/1152\approx 0.006076$ . Then the actual value of $X_{A}$ exceeds $12c_{1}m^{2}$ at most $1/12$ of the time by Markov’s inequality. Provided that the sets $A$ with $5m/12\leq k\leq 11m/12$ make up at least half of $\mathcal{S}_{D,m}$ (a stronger bound is verified below), among such sets the actual value of $X_{A}$ exceeds $12c_{1}m^{2}$ at most $(1/12)/(1/2)=1/6$ of the time. Thus when $5m/12\leq k\leq 11m/12$ , Equation (8) is true at least $5/6$ of the time since when $X_{A}=12c_{1}m^{2}$ , the equation reads

\begin{aligned}&\frac{2m-\sqrt{m^{2}-72c_{1}m^{2}}}{3}\ \leq\ k\ \leq\ \frac{2m+\sqrt{m^{2}-72c_{1}m^{2}}}{3}\\ \iff\ &\frac{2m-\sqrt{9/16\cdot m^{2}}}{3}\ \leq\ k\ \leq\ \frac{2m+\sqrt{9/16\cdot m^{2}}}{3}\\ \iff\ &\frac{5m}{12}\ \leq\ k\ \leq\ \frac{11m}{12}.\end{aligned}

(9)
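The choice $c_{1}=7/1152$ is exactly what makes the interval in Equation (8) equal to $[5m/12,\,11m/12]$ when $X_{A}=12c_{1}m^{2}$. A minimal sketch with exact rational arithmetic (the helper names are ours) confirms this.

```python
from fractions import Fraction
from math import isqrt

C1 = Fraction(7, 1152)

def exact_sqrt(q):
    """Square root of a rational that is a perfect square; raises otherwise."""
    root = Fraction(isqrt(q.numerator), isqrt(q.denominator))
    if root * root != q:
        raise ValueError("not a perfect square")
    return root

def k_interval(m, X):
    """Endpoints (2m -+ sqrt(m^2 - 6X)) / 3 of the interval in Equation (8)."""
    root = exact_sqrt(Fraction(m * m) - 6 * X)
    return (2 * m - root) / 3, (2 * m + root) / 3

def quadratic(k, m, X):
    """The quantity 3k^2 - 4mk + m^2 + 2X from Equation (7)."""
    return 3 * k * k - 4 * m * k + m * m + 2 * X

m = 12
X = 12 * C1 * m * m          # the Markov threshold 12 * c1 * m^2
lo, hi = k_interval(m, X)
print(lo, hi, quadratic(lo, m, X), quadratic(hi, m, X))
# prints: 5 11 0 0
```

Both endpoints are roots of the quadratic, as expected, and the same identity holds for every $m$ because $1-72c_{1}=9/16$ is a perfect square.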

Now, we need to make sure that for our values of $m$ , more than $(1/2)/(5/6)=3/5$ proportion of sets in $\mathcal{S}_{D,m}$ have $5m/12\leq k\leq 11m/12$ . This will ensure that a proportion greater than $1/2$ of sets in $\mathcal{S}_{D,m}$ satisfy Equation (8) and are therefore MSTD.

More formally speaking, we require

\lim_{n\to\infty}\frac{\sum_{k=\left\lceil 5m/12\right\rceil}^{\left\lfloor 11m/12\right\rfloor}\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}\ >\ 3/5

(10)

\iff\quad\frac{m!}{2^{m}}\sum_{k=\left\lceil 5m/12\right\rceil}^{\left\lfloor 11m/12\right\rfloor}\frac{1}{k!(m-k)!}\ >\ 3/5,

(11)

where we may take the limit as $n\to\infty$ because the left hand side of Equation (10) decreases with increasing $n$ . Equation (11) can be verified numerically to be true for $m\geq 6$ . (Footnote 1: In fact, when $m$ is large, we expect almost all of the sets in $\mathcal{S}_{D,m}$ to have $5m/12\leq k\leq 11m/12$ ; see Subsection 3.1 for further discussion on this fact. For smaller values of $m\geq 6$ , one can verify Equations (10) and (11) using the following Desmos link: https://www.desmos.com/calculator/e4zqwbmmcr.)
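The numerical verification of Equation (11) is easy to reproduce. The sketch below is our own illustration, independent of the Desmos worksheet: it uses the identity $\frac{m!}{k!(m-k)!}=\binom{m}{k}$ and exact fractions, and the function name `central_mass` is hypothetical.

```python
from fractions import Fraction
from math import comb

def central_mass(m):
    """Left side of Equation (11): binomial mass of ceil(5m/12) <= k <= floor(11m/12)."""
    lo = -((-5 * m) // 12)   # ceil(5m/12) in integer arithmetic
    hi = (11 * m) // 12      # floor(11m/12)
    return Fraction(sum(comb(m, k) for k in range(lo, hi + 1)), 2 ** m)

for m in range(5, 13):
    value = central_mass(m)
    print(m, value, value > Fraction(3, 5))
```

For $m=6$ the value is $41/64\approx 0.64$, while for $m=5$ it is $15/32<3/5$, which is why the hypothesis $m\geq 6$ appears in Theorem 1.8. The code only checks finitely many $m$; for large $m$ the inequality follows from concentration of the binomial distribution, as discussed in the footnote.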

The problem has therefore been reduced to placing an upper bound on the values of $m$ such that the expected value of $X_{A}$ is at most $c_{1}m^{2}$ when $A$ is chosen uniformly at random from $\mathcal{S}_{D,m}$ . The following lemma gives us the result we need.

Lemma 3.4.

When $A$ is chosen uniformly at random from $\mathcal{S}_{D,m}$ , we have

\mathbb{E}[X_{A}]\ \leq\ \left(\frac{7}{32}+\frac{1}{m}+\frac{5j}{8m^{2}}\right)\frac{m^{4}}{n}.

(12)

Much of the machinery of this proof lies in Lemma 3.4; its proof is rather technical and can be found in Subsection 3.2.

When $m\geq 6$ , Lemma 3.4 implies that under the hypothesis of the lemma,

\mathbb{E}[X_{A}]\ \leq\ c_{2}\frac{m^{4}}{n},\quad\text{ where }c_{2}\ =\ \frac{7}{32}+\frac{1}{6}+\frac{5j}{288}\ =\ \frac{111+5j}{288}.

(13)

We are ready to complete the proof of Theorem 1.8. Recall that we wanted to have $\mathbb{E}[X_{A}]\leq c_{1}m^{2}$ to ensure most subsets of size $m$ would be MSTD. Thus the requisite upper bound on $m$ is determined as

\begin{aligned}&c_{2}\frac{m^{4}}{n}\ \leq\ c_{1}m^{2}\\ \iff\ &m\ \leq\ \sqrt{\frac{c_{1}n}{c_{2}}}\ =\ c_{j}\sqrt{n},\end{aligned}

(14)

where $c_{j}=\sqrt{c_{1}/c_{2}}=\sqrt{7/(4(111+5j))}\approx 1.3229/\sqrt{5j+111}$ .

This concludes the proof of Theorem 1.8.
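To make the constants concrete, the following sketch (an illustration only; the function names are ours) computes $c_{2}$ and $c_{j}$ from Equations (13) and (14), together with the smallest $n$ for which the range $6\leq m\leq c_{j}\sqrt{n}$ is nonempty.

```python
from fractions import Fraction
from math import ceil, sqrt

C1 = Fraction(7, 1152)

def c2(j):
    """The constant c_2 = 7/32 + 1/6 + 5j/288 from Equation (13)."""
    return Fraction(7, 32) + Fraction(1, 6) + Fraction(5 * j, 288)

def c_j(j):
    """The constant c_j = sqrt(c_1 / c_2) from Equation (14)."""
    return sqrt(C1 / c2(j))

def smallest_useful_n(j):
    """Least integer n with c_j * sqrt(n) >= 6, i.e. n >= 36 * c_2 / c_1."""
    return ceil(36 * c2(j) / C1)

for j in (1, 2):
    print(j, c2(j), round(c_j(j), 5), smallest_useful_n(j))
```

For the usual dihedral groups this gives $c_{j}\approx 0.1228$ and $n\geq 2387$ when $n$ is odd ( $j=1$ ), and $c_{j}\approx 0.1203$ and $n\geq 2490$ when $n$ is even ( $j=2$ ).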

3.1. Proof of Theorems 3.1 and 3.2

In this section we prove Theorem 3.1, and in the process we outline the steps needed to prove Theorem 3.2. We now assume that $n>n_{j,\epsilon}$ , where $j$ is a constant upper bound on the number of elements of order at most 2 in the group $G$ and $n_{j,\epsilon}$ is sufficiently large.

Having proved Theorem 1.8 for $6\leq m\leq(1.3229/\sqrt{5j+111})\sqrt{n}$ , we may now assume $m\geq(1.3229/\sqrt{5j+111})\sqrt{n}$ ; that is, $m$ is now large. We will follow similar steps to the previous proof, but using this assumption, we will increase $c_{1}$ to be arbitrarily close to $1/16$ , and we will decrease $c_{2}$ to be arbitrarily close to $7/32$ . Then we will have that the coefficient of the $\sqrt{n}$ , which is $\sqrt{c_{1}/c_{2}}$ (as discussed in the previous proof), is arbitrarily close to $\sqrt{2/7}$ .

Take small $\epsilon_{1}>0$ . Note that inside of $\mathcal{S}_{D,m}$ , the distribution of values of $k$ is a hypergeometric distribution. This is because one can construct a random set in $\mathcal{S}_{D,m}$ by taking $m$ random elements of the group $D$ without replacement, one at a time; to begin with there is a $1/2$ chance each time that we choose a flip. Thus since $n$ is very large and $j$ is fixed, having $m\geq(1.3229/\sqrt{5j+111})\sqrt{n}$ is sufficient for a proportion at least $1-\epsilon_{1}$ of sets in $\mathcal{S}_{D,m}$ to have $k\in[(1/2-\epsilon_{1})m,(1/2+\epsilon_{1})m]$ .

Going back to Equation (8), we thus see that we just need $\sqrt{m^{2}-6X_{A}}$ to be at least $m/2+\epsilon_{1}$ , or $6X_{A}\leq m^{2}-(m/2+\epsilon_{1})^{2}$ , slightly more than half the time when $k$ is in the relevant interval. More specifically, we need

\mathbb{P}\left[6X_{A}\leq\frac{3m^{2}}{4}-m\epsilon_{1}-\epsilon_{1}^{2}\ \Big{|}\ (1/2-\epsilon_{1})m\leq k\leq(5/6+\epsilon_{1})m\right]\ \geq\ \frac{1}{2}\cdot\frac{1}{(1-\epsilon_{1})^{2}}.

(15)

Then, among sets with $(1/2-\epsilon_{1})m\leq k\leq(5/6+\epsilon_{1})m$ (which, recall, form a proportion of at least $1-\epsilon_{1}$ of sets in $\mathcal{S}_{D,m}$ ), at least a proportion of $(1/2)/(1-\epsilon_{1})^{2}$ satisfy Equation (8). This means that a proportion of at least $(1/2)/(1-\epsilon_{1})>1/2$ of sets in $\mathcal{S}_{D,m}$ are MSTD.

For Equation (15) to hold, we claim that it suffices to have the following probability bound, not conditioned on the size of $k$ :

\mathbb{P}\left[X_{A}\leq\frac{m^{2}}{8}-\frac{m\epsilon_{1}}{6}-\frac{\epsilon_{1}^{2}}{6}\right]\ \geq\ \frac{1}{2}\cdot\frac{1}{1-\epsilon_{1}}+\epsilon_{1}.

(16)

To see why Equation (16) implies Equation (15), call $B$ the event that $(1/2-\epsilon_{1})m\leq k\leq(5/6+\epsilon_{1})m$ and $C$ the event that $X_{A}\leq\frac{m^{2}}{8}-\frac{m\epsilon_{1}}{6}-\frac{\epsilon_{1}^{2}}{6}$ . Then, we manipulate conditional probabilities as follows.

\begin{aligned}&\mathbb{P}[C]\ =\ \mathbb{P}[C\mid B]\,\mathbb{P}[B]+\mathbb{P}[C\mid\neg B]\,\mathbb{P}[\neg B]\ \leq\ \mathbb{P}[C\mid B]\,\mathbb{P}[B]+\mathbb{P}[\neg B]\\ \iff\ &\mathbb{P}[C\mid B]\ \geq\ \frac{\mathbb{P}[C]-\mathbb{P}[\neg B]}{\mathbb{P}[B]}\ =\ \frac{\mathbb{P}[C]-(1-\mathbb{P}[B])}{\mathbb{P}[B]}\ =\ 1-\frac{1-\mathbb{P}[C]}{\mathbb{P}[B]}.\end{aligned}

(17)

Since $\mathbb{P}[B]\geq 1-\epsilon_{1}$ and Equation (16) says that $\mathbb{P}[C]\geq(1/2)/(1-\epsilon_{1})+\epsilon_{1}$ , we have that if Equation (16) is true, then

\displaystyle\mathbb{P}[C\ |\ B]\ \geq\ 1-\frac{1-((1/2)/(1-\epsilon_{1})+\epsilon_{1})}{1-\epsilon_{1}}\ =\ 1-\frac{(1-\epsilon_{1})-(1/2)/(1-\epsilon_{1})}{1-\epsilon_{1}}\ =\ \frac{1/2}{(1-\epsilon_{1})^{2}},

(18)

and the claim is shown.
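The simplification in Equation (18) can be double-checked with exact rational arithmetic. The sketch below is our own illustration: it substitutes the extreme values $\mathbb{P}[B]=1-\epsilon_{1}$ and $\mathbb{P}[C]=(1/2)/(1-\epsilon_{1})+\epsilon_{1}$ into the lower bound from Equation (17).

```python
from fractions import Fraction

def conditional_lower_bound(p_c, p_b):
    """Lower bound 1 - (1 - P[C]) / P[B] on P[C | B] from Equation (17)."""
    return 1 - (1 - p_c) / p_b

def matches_equation_18(eps):
    """Check that the bound equals (1/2) / (1 - eps)^2 for the extreme inputs."""
    p_b = 1 - eps
    p_c = Fraction(1, 2) / (1 - eps) + eps
    return conditional_lower_bound(p_c, p_b) == Fraction(1, 2) / (1 - eps) ** 2

print(all(matches_equation_18(Fraction(1, d)) for d in (10, 100, 1000)))
```

Since the bound in Equation (17) is increasing in both $\mathbb{P}[B]$ and $\mathbb{P}[C]$ (as long as $\mathbb{P}[C]\leq 1$ ), checking the extreme values is enough.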

To ensure that Equation (16) is true, we require

\mathbb{E}[X_{A}]\ \leq\ \left(1-\epsilon_{1}-\frac{1/2}{1-\epsilon_{1}}\right)\left(\frac{m^{2}}{8}-\frac{m\epsilon_{1}}{6}-\frac{\epsilon_{1}^{2}}{6}\right).

(19)

Then by Markov’s inequality, the probability that $X_{A}$ exceeds $\frac{m^{2}}{8}-\frac{m\epsilon_{1}}{6}-\frac{\epsilon_{1}^{2}}{6}$ is at most $1-\epsilon_{1}-\frac{1/2}{1-\epsilon_{1}}$ , which is equivalent to Equation (16).

We may now choose a small value $\epsilon_{2}$ such that Equation (19) is true if

\mathbb{E}[X_{A}]\ \leq\ \left(\frac{1}{16}-\epsilon_{2}\right)m^{2}.

(20)

Notice that in the limit $\epsilon_{1}\to 0$ , Equation (19) boils down to the statement that $\mathbb{E}[X_{A}]\leq m^{2}/16$ , so we can make $\epsilon_{2}$ be as small as desired by making $\epsilon_{1}$ be small.

We now revisit Lemma 3.4. We are now assuming that $m$ is large and $j$ is a constant, so the first term dominates: for any fixed $\epsilon_{3}>0$ and all sufficiently large $m$ ,

\mathbb{E}[X_{A}]\ \leq\ \left(\frac{7}{32}+\epsilon_{3}\right)\frac{m^{4}}{n}.

(21)

We wanted to have $\mathbb{E}[X_{A}]\leq(1/16-\epsilon_{2})m^{2}$ . Thus the upper bound on $m$ is now determined as

\begin{aligned}&\left(\frac{7}{32}+\epsilon_{3}\right)\frac{m^{4}}{n}\ \leq\ \left(\frac{1}{16}-\epsilon_{2}\right)m^{2}\\ \iff\ &m\ \leq\ c\sqrt{n},\end{aligned}

(22)

where

\displaystyle c\ =\ \sqrt{\frac{1/16-\epsilon_{2}}{7/32+\epsilon_{3}}}.

(23)

If $n$ is arbitrarily large and $j$ is constant compared to $n$ , we can choose $\epsilon_{2}$ and $\epsilon_{3}$ to be very small, so that $c$ is arbitrarily close to $\sqrt{(1/16)/(7/32)}\approx 0.5345$ .

This completes the proof of Theorem 3.1. To prove Theorem 3.2, we follow a very similar method. We choose $m_{\epsilon}$ large enough that almost all of the sets in $\mathcal{S}_{D,m}$ have $(0.5-\epsilon_{1})m\leq k\leq(0.5+\epsilon_{1})m$ . The only difference is that now, in Equation (15) we replace the right-hand side with $(1-\epsilon_{1})$ , so that the proportion of MSTD sets is at least $(1-\epsilon_{1})^{2}$ . (We may choose $\epsilon_{1}$ so that $(1-\epsilon_{1})^{2}$ equals the desired $(1-\epsilon)$ .) Then in Equation (16) we replace the right-hand side with $1-(1-\epsilon_{1})\epsilon_{1}$ , leading to the analog of Equation (20) being that $\mathbb{E}[X_{A}]\leq\epsilon_{2}m^{2}$ for some small $\epsilon_{2}$ depending on $\epsilon_{1}$ . The rest of the proof continues as before, leading to a coefficient $c_{\epsilon,j}$ proportional to $\sqrt{\epsilon_{2}}$ . This completes the proof.

3.2. Proof of Lemma 3.4

We now prove Lemma 3.4. Recall that $X_{A}$ is defined to be half the number of non-redundant collisions in the set $A$ , and we are interested in bounding above the expectation value of $X_{A}$ when $A$ is chosen uniformly at random from $\mathcal{S}_{D,m}$ .

To more easily count the collisions in $A$ , we make the following definition.

Definition 3.5.

A redundant triple is a triple $(a,b,c)\in D^{3}$ such that the quadruple $(a,b,c,c^{-1}ab)$ is redundant. That is, a triple $(a,b,c)$ is redundant if $a=c$ , or if $a,b,$ and $c$ are all flips, or if $b=c$ and $a$ and $b$ are both rotations. Let $T\subseteq D^{3}$ denote the set of non-redundant triples.

Define the function $\chi:\mathcal{S}_{D,m}\times T\to\{0,1\}$ by $\chi(A,t)=1$ if for the non-redundant triple $t=(a,b,c)$ , the element $c^{-1}ab$ is in $A$ , and $\chi(A,t)=0$ otherwise.

For a fixed set $A$ , the set $A^{3}\cap T$ is the set of non-redundant triples with all three elements contained in $A$ . Notice that we have

X_{A}\ =\ \frac{1}{2}\sum_{t\in A^{3}\cap T}\chi(A,t).

(24)

That is, the number of non-redundant collisions in $A$ is the same as the number of non-redundant triples $(a,b,c)\in A^{3}\cap T$ such that the element $d=c^{-1}ab$ is in $A$ , forming a quadruple $(a,b,c,d)$ representing a collision $ab=cd$ .

By definition of expectation value, we write

\mathbb{E}[X_{A}]\ =\ \sum_{A\in\mathcal{S}_{D,m}}\mathbb{P}[A]X_{A}.

(25)

Since $A$ is chosen uniformly at random from the $\binom{2n}{m}$ sets in $\mathcal{S}_{D,m}$ , we have $\mathbb{P}[A]=1/\binom{2n}{m}$ . Thus,

\mathbb{E}[X_{A}]\ =\ \frac{1}{\binom{2n}{m}}\sum_{A\in\mathcal{S}_{D,m}}\frac{1}{2}\sum_{t\in A^{3}\cap T}\chi(A,t).

(26)

We swap the order of the sums.

\mathbb{E}[X_{A}]\ =\ \frac{1}{2}\frac{1}{\binom{2n}{m}}\sum_{t\in T}\sum_{\begin{subarray}{c}A\in\mathcal{S}_{D,m}\\ A^{3}\ni t\end{subarray}}\chi(A,t).

(27)

To compute the inner sum, we must simply count the number of sets $A\in\mathcal{S}_{D,m}$ with $t=(a,b,c)\in A^{3}$ such that $A$ contains $c^{-1}ab$ . That is,

\mathbb{E}[X_{A}]\ =\ \frac{1}{2}\frac{1}{\binom{2n}{m}}\sum_{(a,b,c)\in T}|\{A\in\mathcal{S}_{D,m}\ |\ a,b,c,c^{-1}ab\in A\}|.

(28)

We now break this sum into seven pieces for different kinds of triples $(a,b,c)\in T$ . These are:

•

$T_{1}\ =\ \{(a,b,c)\in T\ |\ a,b,c,c^{-1}ab\text{ are distinct }\}$
•

$T_{2}\ =\ \{(a,b,c)\in T\ |\ a=b;\ a,c,c^{-1}a^{2}\text{ are distinct}\}$
•

$T_{3}\ =\ \{(a,b,c)\in T\ |\ b=c;\ a,c,c^{-1}ac\text{ are distinct}\}$
•

$T_{4}\ =\ \{(a,b,c)\in T\ |\ c^{-1}ab=a;\ a,b,c\text{ are distinct}\}$
•

$T_{5}\ =\ \{(a,b,c)\in T\ |\ c^{-1}ab=c;\ a,b,c\text{ are distinct}\}$
•

$T_{6}\ =\ \{(a,b,c)\in T\ |\ b=c;a=c^{-1}ac;\ a,c\text{ are distinct}\}$
•

$T_{7}\ =\ \{(a,b,c)\in T\ |\ a=b;c=c^{-1}a^{2};\ a,c\text{ are distinct}\}$

We have $T=\bigcup_{i=1}^{7}T_{i}$ , for these seven cases cover all the cases of possible equalities between the four elements except for those where $a=c$ , or equivalently, $c^{-1}ab=b$ , since those cases are redundant triples. Furthermore, this union is disjoint.

Note that for triples $(a,b,c)\in T_{1}$ , the quantity $|\{A\in\mathcal{S}_{D,m}\ |\ a,b,c,c^{-1}ab\in A\}|$ is given by $\binom{2n-4}{m-4}$ since we are requiring four distinct elements to be in $A$ , and we have $2n-4$ choices for the remaining $m-4$ elements. For triples in $T_{2},T_{3},T_{4},$ and $T_{5}$ , we are requiring three distinct elements to be in $A$ , so we have $|\{A\in\mathcal{S}_{D,m}\ |\ a,b,c,c^{-1}ab\in A\}|=\binom{2n-3}{m-3}$ . Finally, for triples in $T_{6}$ and $T_{7}$ we have $|\{A\in\mathcal{S}_{D,m}\ |\ a,b,c,c^{-1}ab\in A\}|=\binom{2n-2}{m-2}$ .

Therefore, from Equation (28), we may write:

	$\displaystyle\mathbb{E}[X_{A}]$	$\displaystyle=\ \frac{1}{2}\frac{1}{\binom{2n}{m}}\left[\binom{2n-4}{m-4}|T_{1}|+\binom{2n-3}{m-3}\left(|T_{2}|+|T_{3}|+|T_{4}|+|T_{5}|\right)+\binom{2n-2}{m-2}\left(|T_{6}|+|T_{7}|\right)\right]$
		$\displaystyle\leq\ \frac{1}{2}\left[\left(\frac{m}{2n}\right)^{4}|T_{1}|+\left(\frac{m}{2n}\right)^{3}\left(|T_{2}|+|T_{3}|+|T_{4}|+|T_{5}|\right)+\left(\frac{m}{2n}\right)^{2}\left(|T_{6}|+|T_{7}|\right)\right],$		(29)

where in the second line we used the fact that

\frac{\binom{2n-4}{m-4}}{\binom{2n}{m}}\ =\ \frac{m(m-1)(m-2)(m-3)}{2n(2n-1)(2n-2)(2n-3)}\ \leq\ \left(\frac{m}{2n}\right)^{4},

(30)

and similarly for the other two terms.

Now, to use Equation (29) to find an upper bound on $\mathbb{E}[X_{A}]$ , all that remains is to find an upper bound on each $|T_{i}|$ . We do so next.

We bound $|T_{1}|$ using the trivial inequality $|T_{1}|\leq|T|$ . There are $(2n)^{3}$ total triples in $D^{3}$ , but we may subtract the redundant triples, including the $n^{3}$ triples consisting of three flips. Thus we obtain

|T_{1}|\ \leq\ 7n^{3}.

(31)

We bound $|T_{2}|$ and $|T_{3}|$ next. We have $2n$ choices for $b$ . In $T_{2}$ , $a$ must equal $b$ , and in $T_{3}$ , $c$ must equal $b$ , and in both, we have at most $2n$ choices for the remaining value of $a$ or $c$ . The extra condition that $c^{-1}ab$ is distinct from the others only lowers $|T_{2}|$ and $|T_{3}|$ , so we do not have to take it into account to obtain an upper bound. Thus,

|T_{2}|,\ |T_{3}|\ \leq\ 4n^{2}.

(32)

For $|T_{4}|$ and $|T_{5}|$ , we have $2n$ choices for $a$ and $2n$ choices for $c$ , but $b$ must be the element $a^{-1}ca$ for $T_{4}$ or $a^{-1}c^{2}$ for $T_{5}$ . Thus we have

|T_{4}|,\ |T_{5}|\ \leq\ 4n^{2}.

(33)

When considering $T_{6}$ , we note that since $b=c$ , the triple $(a,b,c)$ is redundant if $a$ and $c$ are both rotations, or if both are flips. So, one must be a rotation and the other must be a flip, and we require that $a=c^{-1}ac$ , or $ca=ac$ . Thus to bound $|T_{6}|$ we must count the number of pairs of elements with $(0,g_{1})\cdot(1,g_{2})=(1,g_{2})\cdot(0,g_{1})$ , that is, $(1,g_{1}g_{2})=(1,g_{2}g_{1}^{-1})$ . This happens if and only if $g_{1}^{-1}=g_{1}$ , or $g_{1}^{2}=1$ . Recalling that there are $j$ elements in $G$ with order $2$ or less, there are therefore $j$ choices for the rotation element, and $n$ choices for the flip element. We multiply by $2$ since $a$ can be either the flip or the rotation, and $c$ is the other of the two. Thus,

|T_{6}|\ \leq\ 2nj.

(34)

Finally, for $T_{7}$ , we again split into two cases: firstly where $a$ is a rotation, and secondly where $a$ is a flip, so $c$ is a rotation to avoid redundancy. Here we require $c=c^{-1}a^{2}$ , or $c^{2}=a^{2}$ . For the first case, we first consider the number of pairs with $a^{2}=c^{2}=1$ . Since $a$ must be a rotation, there are only $j$ choices for $a^{2}=1$ ; $c$ can be a flip or a rotation, so there are $n+j$ ways to have $c^{2}=1$ . So, there are at most $(n+j)j$ pairs of this kind. If $a$ is a rotation and $a^{2}\neq 1$ , then $c$ must also be a rotation since otherwise $c^{2}=1\neq a^{2}$ . Thus $a$ and $c$ commute, so $a^{2}=c^{2}\iff(ac^{-1})^{2}=1$ . There are $n-j$ choices of $a$ with $a^{2}\neq 1$ , and for each, there are $j$ choices of $c$ which have $(ac^{-1})^{2}=1$ . Thus there are at most $(n-j)j$ pairs of this kind.

Next we consider the case where $a$ is a flip and $c$ is a rotation. Here $a^{2}=1$ , so there are only $j$ choices for $c$ that have $c^{2}=a^{2}$ . Thus there are $nj$ pairs of this kind. In total,

|T_{7}|\ \leq\ (n+j)j+(n-j)j+nj\ =\ 3nj.

(35)

We now substitute these bounds on each $|T_{i}|$ into Equation (29).

	$\displaystyle\mathbb{E}[X_{A}]$	$\displaystyle\leq\ \frac{1}{2}\left[\left(\frac{m}{2n}\right)^{4}\left(7n^{3}\right)+\left(\frac{m}{2n}\right)^{3}\left(4n^{2}+4n^{2}+4n^{2}+4n^{2}\right)+\left(\frac{m}{2n}\right)^{2}\left(2nj+3nj\right)\right]$
		$\displaystyle=\ \left(\frac{7}{32}+\frac{1}{m}+\frac{5j}{8m^{2}}\right)\frac{m^{4}}{n}.$		(36)

This concludes the proof of Lemma 3.4.
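The bounds (31) through (35) can be sanity-checked by brute force in a small case. The following Python sketch assumes $G=\mathbb{Z}_{n}$ written additively, encodes an element $(z,g)$ as a pair with product $(z_{1},g_{1})(z_{2},g_{2})=(z_{1}+z_{2},\,g_{1}+(-1)^{z_{1}}g_{2})$ , and applies the redundancy conditions exactly as listed in Definition 3.5. The function names are illustrative and are not taken from the authors' code.

```python
from itertools import product


def mul(x, y, n):
    """Product in Dih(Z_n): (z1, g1)(z2, g2) = (z1 + z2, g1 + (-1)^z1 * g2)."""
    (z1, g1), (z2, g2) = x, y
    sign = -1 if z1 else 1
    return ((z1 + z2) % 2, (g1 + sign * g2) % n)


def inv(x, n):
    """Flips are involutions; a rotation (0, g) has inverse (0, -g)."""
    z, g = x
    return (1, g) if z else (0, (-g) % n)


def is_redundant(a, b, c):
    """Redundancy conditions of Definition 3.5."""
    if a == c:
        return True
    if a[0] == b[0] == c[0] == 1:  # three flips
        return True
    return b == c and a[0] == 0 and b[0] == 0  # b = c, both rotations


def classify_triples(n):
    """Return [|T|, |T_1|, ..., |T_7|] for Dih(Z_n)."""
    D = [(z, g) for z in (0, 1) for g in range(n)]
    counts = [0] * 8
    for a, b, c in product(D, repeat=3):
        if is_redundant(a, b, c):
            continue
        counts[0] += 1
        d = mul(inv(c, n), mul(a, b, n), n)  # d = c^{-1} a b
        if a == b and c == d:
            counts[7] += 1
        elif b == c and a == d:
            counts[6] += 1
        elif d == c:
            counts[5] += 1
        elif d == a:
            counts[4] += 1
        elif b == c:
            counts[3] += 1
        elif a == b:
            counts[2] += 1
        else:
            counts[1] += 1
    return counts


def bounds_hold(n):
    """Check Equations (31)-(35) for G = Z_n."""
    j = sum(1 for g in range(n) if (2 * g) % n == 0)  # elements of order <= 2
    t = classify_triples(n)
    return (
        t[1] <= 7 * n**3
        and all(t[i] <= 4 * n**2 for i in (2, 3, 4, 5))
        and t[6] <= 2 * n * j
        and t[7] <= 3 * n * j
    )
```

Since the branches of `classify_triples` are mutually exclusive, the seven counts always sum to $|T|$ , mirroring the disjoint union $T=\bigcup_{i}T_{i}$ .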

3.3. Finitely Generated Abelian Groups

We transition to a discussion of finitely generated abelian groups $G$ . When $|G|<\infty$ , Theorems 1.8, 3.1, and 3.2 hold, so the remaining case is when $G$ is an infinite group. However, we must impose some restriction to ensure that taking subsets uniformly at random is well defined. By the fundamental theorem of finitely generated abelian groups,

G\cong\mathbb{Z}^{r_{0}}\oplus\mathbb{Z}_{q_{1}}^{r_{1}}\oplus\cdots\oplus\mathbb{Z}_{q_{k}}^{r_{k}},

(37)

where $q_{i}$ are powers of (not necessarily distinct) primes. We denote elements of $G$ as a tuple $(g_{0,1},g_{0,2},\dots,g_{0,r_{0}},g_{1,1},\dots,g_{k,r_{k}})\in G$ , where $g_{0,b}\in\mathbb{Z}$ and $g_{a,b}\in\mathbb{Z}_{q_{a}}$ for $a>0$ . Since this section will occasionally require us to deal with multiple groups simultaneously, we update our notation of $D$ as the generalized dihedral group of $G$ to the standard $\mathrm{Dih}(G)$ . We still denote elements of $\mathrm{Dih}(G)$ as $(z,g)$ , where $z\in\{0,1\},g\in G$ .

For some fixed $\alpha\in\mathbb{N}$ , we will consider taking subsets uniformly at random from the finite set

\mathrm{Dih}(G_{\alpha})=\{(z,(g_{0,1},\dots,g_{0,r_{0}},\dots,g_{k,r_{k}}))\in\mathrm{Dih}(G)\ |\ 0\leq g_{0,b}<\alpha\}.

(38)

Our goal is to leverage Theorems 1.8, 3.1, and 3.2, and, to do so, we will refer to $\mathrm{Dih}(G^{\prime})$ , where

G^{\prime}=\mathbb{Z}_{\alpha}^{r_{0}}\oplus\mathbb{Z}_{q_{1}}^{r_{1}}\oplus\cdots\oplus\mathbb{Z}_{q_{k}}^{r_{k}}.

(39)

To adhere to prior notation, let $n=|G^{\prime}|$ , let $j$ be the number of elements of $G^{\prime}$ of order at most $2$ , and let $\mathcal{S}_{m}$ denote the collection of subsets of $\mathrm{Dih}(G_{\alpha})$ that have size $m$ . Then we get the following corollaries of Theorems 1.8, 3.1, and 3.2:

Corollary 3.6.

If $6\leq m\leq c_{j}\sqrt{n}$ , then there are more MSTD than MDTS sets in $\mathcal{S}_{m}$ .

Corollary 3.7.

If $n\geq n_{j,\epsilon}$ and $6\leq m\leq\left(\sqrt{2/7}-\epsilon\right)\sqrt{n}$ , then there are more MSTD than MDTS sets in $\mathcal{S}_{m}$ .

Corollary 3.8.

For any $\epsilon>0$ , there exist $m_{\epsilon}$ and $c_{\epsilon,j}$ such that if $m_{\epsilon}\leq m\leq c_{\epsilon,j}\sqrt{n}$ , the proportion of MSTD sets in $\mathcal{S}_{m}$ is at least $1-\epsilon$ .

Proof.

We will establish a bijection $\phi:\mathrm{Dih}(G_{\alpha})\rightarrow\mathrm{Dih}(G^{\prime})$ such that, if $(a,b,c,d)\in(\mathrm{Dih}(G_{\alpha}))^{4}$ is a collision, then $(\phi(a),\phi(b),\phi(c),\phi(d))\in(\mathrm{Dih}(G^{\prime}))^{4}$ is also a collision. Since we are still working with generalized dihedral groups, this immediately implies Corollaries 3.6, 3.7, and 3.8, as the number of non-redundant collisions in $\mathrm{Dih}(G_{\alpha})$ is bounded above by the number of non-redundant collisions in $\mathrm{Dih}(G^{\prime})$ .

Let $\phi:\mathrm{Dih}(G_{\alpha})\rightarrow\mathrm{Dih}(G^{\prime})$ be defined by $\phi((z,(g_{0,1},\dots,g_{k,r_{k}})))=(z,(g_{0,1},\dots,g_{k,r_{k}})).$ As defined, this is clearly a bijection. Let $(x_{1},x_{2},x_{3},x_{4})\in(\mathrm{Dih}(G_{\alpha}))^{4}$ be a collision. By definition, $x_{1}x_{2}=x_{3}x_{4}$ . Let $x_{j}=(z_{j},(g_{0,1,j},\dots,g_{k,r_{k},j}))$ . Define the binary operation $\star_{t}:\mathbb{Z}_{t}\times\mathbb{Z}_{t}\rightarrow\mathbb{Z}_{t}$ , which depends on the first coordinate of the left factor ( $z_{1}$ for the product $x_{1}x_{2}$ , and $z_{3}$ for $x_{3}x_{4}$ ), by

g_{1}\star_{t}g_{2}=\begin{cases}(g_{1}+g_{2})\bmod t,&z_{1}=0,\\ (g_{1}-g_{2})\bmod t,&z_{1}=1.\end{cases}

(40)

Then we get

x_{1}x_{2}=(z_{1}+z_{2}\text{ mod }2,(g_{0,1,1}\star_{n}g_{0,1,2},\dots,g_{k,r_{k},1}\star_{q_{k}}g_{k,r_{k},2})).

(41)

In other words, we are given the following system of equations:

	$\displaystyle z_{1}+z_{2}\equiv z_{3}+z_{4}\mod 2,$
	$\displaystyle g_{0,1,1}+g_{0,1,2}=g_{0,1,3}+g_{0,1,4},$
	$\displaystyle\vdots$
	$\displaystyle g_{k,r_{k},1}\star_{q_{k}}g_{k,r_{k},2}=g_{k,r_{k},3}\star_{q_{k}}g_{k,r_{k},4}.$

Consider $q=(\phi(x_{1}),\phi(x_{2}),\phi(x_{3}),\phi(x_{4}))$ . For $q$ to be a collision in $\mathrm{Dih}(G^{\prime})$ , we require the following system of equations:

	$\displaystyle z_{1}+z_{2}\equiv z_{3}+z_{4}\mod 2,$
	$\displaystyle g_{0,1,1}\star_{\alpha}g_{0,1,2}=g_{0,1,3}\star_{\alpha}g_{0,1,4},$
	$\displaystyle\vdots$
	$\displaystyle g_{k,r_{k},1}\star_{q_{k}}g_{k,r_{k},2}=g_{k,r_{k},3}\star_{q_{k}}g_{k,r_{k},4},$

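which is implied by our given system, since an equality of integers implies the corresponding congruence modulo $\alpha$ and the remaining equations are unchanged. Therefore the result follows. ∎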

We present an example concerning the number of collisions in another generalized dihedral group, $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ . We show that $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ possesses exactly the same number of possible collisions as the dihedral group $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n^{2}}$ if and only if $n$ is odd, and that there are more collisions in $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ otherwise. This result provides only some intuition on how the expected number of collisions depends on $j$ , the number of elements of order at most $2$ in the underlying abelian group.

Lemma 3.9.

The numbers of possible collisions within the two groups are equal if and only if $n$ is odd, and there are more collisions in $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ otherwise.

To prove this, we make use of the following results:

Lemma 3.10.

The number of pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}-b_{1}\equiv x\mod n$ and $a_{2}-b_{2}\equiv y\mod n$ and the number of pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})-(b_{1}n+b_{2})\equiv xn+y\mod n^{2}$ are both equal to $n^{2}$ .

Proof.

For any given $(b_{1},b_{2})$ , respectively $b_{1}n+b_{2}$ , there exists exactly one $(a_{1},a_{2})$ , respectively $a_{1}n+a_{2}$ , satisfying the conditions. Thus, there are $n^{2}$ such pairs in both cases. ∎

Lemma 3.11.

The number of pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}+b_{1}\equiv x\mod n$ and $a_{2}+b_{2}\equiv y\mod n$ is as follows:

•

when $n$ is odd, there are $\frac{n^{2}+1}{2}$ such pairs
•

when $n$ is even and $y$ is odd or $x$ is odd, there are $\frac{n^{2}}{2}$ such pairs
•

when $n$ , $y$ , and $x$ are all even, there are $\frac{n^{2}+4}{2}$ such pairs

Proof.

When $n$ is odd, each choice of $a_{1}$ is paired with a single possible $b_{1}$ , with exactly one such pair having $a_{1}=b_{1}$ . The same holds for $a_{2}$ and $b_{2}$ . This gives $n^{2}$ as the number of ordered pairs. However, we have over-counted, since swapping $a_{1}$ with $b_{1}$ and $a_{2}$ with $b_{2}$ does not yield a distinct pair. Thus, we divide by $2$ , except for the single pair with $a_{1}=b_{1}$ and $a_{2}=b_{2}$ , which was counted only once, to get $\frac{n^{2}-1}{2}+1=\frac{n^{2}+1}{2}$ pairs of $(a_{1},a_{2}),(b_{1},b_{2})$ .

When $n$ is even but $y$ is odd, we choose the pair $(a_{2},b_{2})$ first; there are $n/2$ distinct unordered pairs, and $a_{2}$ is never equal to $b_{2}$ . For each of these pairs, any choice of $a_{1}$ forces $b_{1}$ , and no over-counting occurs. Thus, we get $\frac{n^{2}}{2}$ pairs. A similar argument holds when $x$ is odd.

When all of $n,y,$ and $x$ are even, there are two pairs of identical numbers and $\frac{n-2}{2}$ pairs of different numbers in $\mathbb{Z}_{n}$ that sum to $x$ , and similarly for $y$ . We choose $a_{2},b_{2}$ first. If $a_{2}=b_{2}$ , we have $\frac{n-2}{2}+2=\frac{n+2}{2}$ choices for $a_{1},b_{1}$ . If $a_{2}\neq b_{2}$ , any $a_{1}$ uniquely determines $b_{1}$ without over-counting, and thus there are $n$ choices. In total, we have $2\cdot\frac{n+2}{2}+n\cdot\frac{n-2}{2}=\frac{n^{2}+4}{2}$ pairs of $(a_{1},a_{2}),(b_{1},b_{2})$ . ∎

Lemma 3.12.

The number of pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})+(b_{1}n+b_{2})\equiv xn+y\mod n^{2}$ is as follows:

•

when $n$ is odd, there are $\frac{n^{2}+1}{2}$ such pairs
•

when $n$ is even but $y$ is odd, there are $\frac{n^{2}}{2}$ such pairs
•

when both $n$ and $y$ are even, there are $\frac{n^{2}+2}{2}$ such pairs

Proof.

When $n$ is odd, we have that each of $a_{1}n+a_{2}$ is paired with another $b_{1}n+b_{2}$ , with exactly one being paired with itself. Thus, there are $\frac{n^{2}-1}{2}+1=\frac{n^{2}+1}{2}$ pairs of $a_{1}n+a_{2}$ and $b_{1}n+b_{2}$ .

When $n$ is even but $y$ is odd, we have that each of $a_{1}n+a_{2}$ is paired with another $b_{1}n+b_{2}$ , necessarily distinct. Thus, there are $\frac{n^{2}}{2}$ pairs of $a_{1}n+a_{2}$ and $b_{1}n+b_{2}$ .

When $n$ is even and $y$ is even, we have that each of $a_{1}n+a_{2}$ is paired with another $b_{1}n+b_{2}$ , with exactly two being paired with themselves. Thus, there are $\frac{n^{2}-2}{2}+2=\frac{n^{2}+2}{2}$ pairs of $a_{1}n+a_{2}$ and $b_{1}n+b_{2}$ . ∎
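The counts in Lemmas 3.11 and 3.12 are easy to confirm numerically for small $n$ . The sketch below is a brute-force check under the assumption that "pairs" means unordered pairs with repetition allowed, as in the proofs above; all function names are hypothetical.

```python
from itertools import combinations_with_replacement, product


def unordered_sum_pairs_product(n, x, y):
    """Unordered pairs {a, b} in Z_n^2 with a + b = (x, y)."""
    elems = list(product(range(n), repeat=2))
    return sum(
        1
        for a, b in combinations_with_replacement(elems, 2)
        if (a[0] + b[0]) % n == x and (a[1] + b[1]) % n == y
    )


def unordered_sum_pairs_cyclic(n, x, y):
    """Unordered pairs {a, b} in Z_{n^2} with a + b = x*n + y."""
    N = n * n
    target = (x * n + y) % N
    return sum(
        1
        for a, b in combinations_with_replacement(range(N), 2)
        if (a + b) % N == target
    )


def predicted_lemma_3_11(n, x, y):
    """Count claimed by Lemma 3.11."""
    if n % 2 == 1:
        return (n * n + 1) // 2
    if x % 2 == 1 or y % 2 == 1:
        return n * n // 2
    return (n * n + 4) // 2


def predicted_lemma_3_12(n, x, y):
    """Count claimed by Lemma 3.12."""
    if n % 2 == 1:
        return (n * n + 1) // 2
    if y % 2 == 1:
        return n * n // 2
    return (n * n + 2) // 2
```

The two predictions differ only when $n$ , $x$ , and $y$ are all even, where $\mathbb{Z}_{n}^{2}$ has four solutions of $2a=(x,y)$ while $\mathbb{Z}_{n^{2}}$ has two solutions of $2a=xn+y$ .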

Combining these results over the casework where elements of our pairs may be a rotation or a flip yields the desired result, with the details of the proof in Appendix B.

4. Expected Size of Sum and Difference Sets

In this section, we only consider the classical dihedral groups

D_{2n}=\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}=\langle r,s\mid r^{n},s^{2},rsrs\rangle

(42)

Throughout this section, we use $\mathcal{S}_{2n,m}$ to denote the set of subsets of size $m$ in $D_{2n}$ .

The method of collision analysis will likely not be sufficient to prove that $\mathcal{S}_{2n,m}$ has more MSTD sets than MDTS sets for values of $m$ greater in order of magnitude than $\sqrt{n}$ . The intuition for this comes from the fact that the sum and difference sets for $A\subseteq D_{2n}$ should very roughly have size on the same order of magnitude as $|A|^{2}=m^{2}$ . Hence, one would expect to usually have $A+A=A-A=D_{2n}$ when $m$ is much greater than $\sqrt{n}$ . The analysis for relative numbers of MSTD and MDTS sets in $\mathcal{S}_{2n,m}$ for these larger values of $m$ should therefore be based on counting the number of missed sums and differences in $D_{2n}$ , in direct analogy with the case of slow decay for the integers in [HM09].

We take the first steps toward such an analysis by proving the following special case.

See Theorem 1.9.

This follows from the following straightforward yet useful lemma, which reduces the problem of computing the probability of missing a sum or difference to an analogous problem in $\mathbb{Z}_{n}$ .

Lemma 4.1.

Let $A$ be a subset in $\mathcal{S}_{2n,m}$ chosen uniformly at random. Then if $r^{i}$ is a rotation in $D_{2n}$ , we have

\mathbb{P}[r^{i}\notin A+A]\ =\ \sum_{k=0}^{m}\ \mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S+S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=m-k$}]\cdot\mathbb{P}[i\notin S-S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=k$}],

(43)

and

\mathbb{P}[r^{i}\notin A-A]\ =\ \sum_{k=0}^{m}\ \mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S-S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=m-k$}]\cdot\mathbb{P}[i\notin S-S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=k$}].

(44)

If $r^{i}s$ is a flip in $D_{2n}$ , we have

\mathbb{P}[r^{i}s\notin A+A]\ =\ \sum_{k=0}^{m}\ \mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S_{1}+S_{2}\land i\notin S_{2}-S_{1}\mid\text{$S_{1},S_{2}\subseteq\mathbb{Z}_{n}$, $|S_{1}|=m-k$, $|S_{2}|=k$}],

(45)

and

\mathbb{P}[r^{i}s\notin A-A]\ =\ \sum_{k=0}^{m}\ \mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S_{1}+S_{2}\mid\text{$S_{1},S_{2}\subseteq\mathbb{Z}_{n}$, $|S_{1}|=m-k$, $|S_{2}|=k$}].

(46)

Proof.

Partition $A$ into its set of rotations $R$ and flips $F$ , and suppose that $|F|=k$ and $|R|=m-k$ . The rotation element $r^{i}$ can appear in $A+A$ precisely as $r^{j}r^{\ell}=r^{j+\ell}$ for $r^{j},r^{\ell}\in R$ or as $(r^{j}s)(r^{\ell}s)=r^{j-\ell}$ for $r^{j}s,r^{\ell}s\in F$ . Taking the probabilities of the negations of these events gives, respectively, the second and third probabilities appearing in the sum of Equation (43). Equation (44) follows similarly.

The flip element $r^{i}s$ can appear in $A+A$ precisely as $r^{j}(r^{\ell}s)=r^{j+\ell}s$ or as $(r^{\ell}s)r^{j}=r^{\ell-j}s$ for $r^{j}\in R$ , $r^{\ell}s\in F$ , but unlike in the previous cases, these events are no longer independent, so Equation (45) cannot be broken up into a product of simpler probabilities. Finally, $r^{i}s$ can appear in $A-A$ precisely as $r^{j}(r^{\ell}s)^{-1}=r^{j+\ell}s=(r^{\ell}s)(r^{j})^{-1}$ , from which Equation (46) follows. ∎

A number of these probabilities can be expressed explicitly in terms of $n$ , $m$ , and $k$ . To prove Theorem 1.9, we need to compute all probabilities appearing in Equation (44) and Equation (46), but we will also compute the probabilities appearing in Equation (43) for completeness.

Lemma 4.2.

Let $S$ be a subset of $\mathbb{Z}_{n}$ of size $m-k$ chosen uniformly at random, and let $i$ be any element of $\mathbb{Z}_{n}$ . Then

\mathbb{P}[i\notin S+S]\ =\ \begin{cases}\frac{2^{m-k}{\binom{\frac{n}{2}-1}{m-k}}}{{\binom{n}{m-k}}},\ \text{$n$ and $i$ both even,}\\ \frac{2^{m-k}{\binom{\frac{n}{2}}{m-k}}}{{\binom{n}{m-k}}},\ \text{$n$ even and $i$ odd,}\\ \frac{2^{m-k}{\binom{\frac{n-1}{2}}{m-k}}}{{\binom{n}{m-k}}},\ \text{$n$ odd.}\end{cases}

(47)

Proof.

If $n$ and $i$ are both even, then there exist exactly $2$ elements of $\mathbb{Z}_{n}$ that give $i$ when added to themselves. The remaining elements of $\mathbb{Z}_{n}$ partition into pairs of distinct elements adding to $i$ , and any $S$ such that $i\notin S+S$ is obtained by selecting $m-k$ of these $\frac{n}{2}-1$ pairs and one element from each pair, hence the result. The case where $n$ is even and $i$ is odd is identical except that all $n$ of the elements of $\mathbb{Z}_{n}$ are now partitioned into pairs as there are no elements that give $i$ when added to themselves. Finally, if $n$ is odd, then there is always a unique element that gives $i$ when added to itself, and the remaining elements partition into $\frac{n-1}{2}$ pairs, after which the same analysis can be applied. ∎
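As a check on the counting in this proof, the following Python sketch compares the pair-selection count with a direct enumeration. It writes `s` for the size $|S|$ (which is $m-k$ in the lemma) and uses the number of pairs described in the proof for each parity case; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_missing_sum_brute(n, s, i):
    """P[i not in S+S] for a uniform s-subset S of Z_n, by enumeration."""
    total = miss = 0
    for S in combinations(range(n), s):
        total += 1
        if all((a + b) % n != i for a in S for b in S):
            miss += 1
    return Fraction(miss, total)


def prob_missing_sum_formula(n, s, i):
    """Pick s of the available pairs, then one element from each pair."""
    if n % 2 == 0 and i % 2 == 0:
        pairs = n // 2 - 1  # two elements satisfy a + a = i and are excluded
    elif n % 2 == 0:
        pairs = n // 2  # no element satisfies a + a = i
    else:
        pairs = (n - 1) // 2  # exactly one element satisfies a + a = i
    return Fraction(2**s * comb(pairs, s), comb(n, s))
```

For example, with $n=5$ , $|S|=2$ , and $i=0$ , both functions return $4/10$ : the element $0$ is excluded and one element is chosen from each of the pairs $\{1,4\}$ and $\{2,3\}$ .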

Lemma 4.3.

Let $S$ be a subset of $\mathbb{Z}_{n}$ of size $k$ chosen uniformly at random, and let $i$ be any nonzero element of $\mathbb{Z}_{n}$ of order $n/d$ . Then

\mathbb{P}[i\notin S-S]\ =\ \frac{1}{{\binom{n}{k}}}\sum_{\begin{subarray}{c}(k_{1},\ldots,k_{d})\in\mathbb{Z}_{\geq 0}^{d}\\ k_{1}+\cdots+k_{d}=k\end{subarray}}\prod_{t=1}^{d}g\left(\frac{n}{d},k_{t}\right),

(48)

where

g\left(\frac{n}{d},k_{t}\right)\ =\ \begin{cases}1,\ k_{t}=0,\\ \frac{\frac{n}{d}{\binom{\frac{n}{d}-1-k_{t}}{k_{t}-1}}}{k_{t}},\ k_{t}>0.\end{cases}

(49)

Remark 4.

When $n$ is prime, we must have $d=1$ always and Equation (48) simplifies significantly to

\mathbb{P}[i\notin S-S]\ =\ \frac{g(n,k)}{{\binom{n}{k}}}\ =\ \frac{1}{{\binom{n}{k}}}\cdot\begin{cases}1,\ k=0,\\ \frac{n{\binom{n-1-k}{k-1}}}{k},\ k>0.\end{cases}

(50)

This is the reason why we restrict to such $n$ in Theorem 1.9.

Proof.

Partition $\mathbb{Z}_{n}$ into the $d$ additive cosets of the subgroup $i\mathbb{Z}_{n}$ . Each of these cosets has size $n/d$ and has elements that can be cyclically ordered such that the difference between any element and its predecessor is $i$ . Choosing a set $S$ of size $k$ such that $i\notin S-S$ is then equivalent to partitioning $k$ into $k_{1}+\cdots+k_{d}$ with each $k_{t}\geq 0$ , and choosing $k_{t}$ non-adjacent elements from the $t^{\text{th}}$ coset to include in $S$ . This is precisely what is counted by Equation (48) if $g(n/d,k_{t})$ is equal to the number of ways to choose $k_{t}$ non-adjacent elements from a cyclically ordered set of size $n/d$ .

Indeed, this is clear for $k_{t}=0$ , so assume $k_{t}>0$ . There are $n/d$ choices for the first element to be included, and each selection of $k_{t}$ elements may be made in $k_{t}$ different ways by designating different elements to be this first choice. Once the first element has been selected, the number of ways to choose the remaining elements is equal to the number of ways to choose $k_{t}-1$ non-adjacent elements from a linearly ordered set of size $n/d-1$ without choosing the extremal elements. This is the classic stars and bars problem, which yields precisely the binomial coefficient in the definition of $g(n/d,k_{t})$ . ∎
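Equation (48) can be evaluated without listing compositions by multiplying out the polynomial $\left(\sum_{t}g(n/d,t)x^{t}\right)^{d}$ and reading off the coefficient of $x^{k}$ . The sketch below does this and compares against enumeration; it assumes $d=\gcd(i,n)$ (so that $i$ has order $n/d$ ) and treats the binomial coefficient in Equation (49) as zero when its upper index is too small. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, gcd


def g(N, k):
    """Ways to choose k pairwise non-adjacent points on a cycle of length N."""
    if k == 0:
        return 1
    if N - 1 - k < k - 1:  # binomial coefficient vanishes (or is undefined)
        return 0
    return N * comb(N - 1 - k, k - 1) // k


def prob_missing_difference_formula(n, k, i):
    """Right-hand side of Equation (48) for a nonzero i in Z_n."""
    d = gcd(i, n)
    N = n // d
    poly = [1] + [0] * k  # coefficients of the running product, degree <= k
    for _ in range(d):
        new = [0] * (k + 1)
        for s in range(k + 1):
            if poly[s]:
                for t in range(k + 1 - s):
                    new[s + t] += poly[s] * g(N, t)
        poly = new
    return Fraction(poly[k], comb(n, k))


def prob_missing_difference_brute(n, k, i):
    """P[i not in S-S] for a uniform k-subset S of Z_n, by enumeration."""
    total = miss = 0
    for S in combinations(range(n), k):
        total += 1
        if all((a - b) % n != i for a in S for b in S):
            miss += 1
    return Fraction(miss, total)
```

For instance, with $n=6$ , $i=2$ , and $k=2$ , the two cosets are triangles in which any two points are adjacent, so the only admissible sets take one element from each coset, giving $9$ of the $15$ subsets.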

Lemma 4.4.

Let $S_{1}$ and $S_{2}$ be subsets of $\mathbb{Z}_{n}$ of sizes $m-k$ and $k$ , respectively, chosen uniformly at random, and let $i$ be any element of $\mathbb{Z}_{n}$ . Then

\mathbb{P}[i\notin S_{1}+S_{2}]\ =\ \frac{{\binom{n-k}{m-k}}}{{\binom{n}{m-k}}}.

(51)

Proof.

For any fixed choice of $S_{2}$ , we have $\binom{n}{m-k}$ total choices for $S_{1}$ . For each of these $S_{1}$ , we have $i\notin S_{1}+S_{2}$ if and only if $i-j\notin S_{1}$ for all $j\in S_{2}$ . Because $|S_{2}|=k$ , this leaves $\binom{n-k}{m-k}$ choices for $S_{1}$ such that $i\notin S_{1}+S_{2}$ . ∎
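The same kind of enumeration confirms Equation (51) in small cases. In this sketch, `a` stands for $|S_{1}|=m-k$ and `k` for $|S_{2}|$ , with both sets drawn from $\mathbb{Z}_{n}$ ; the names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_missing_cross_sum_brute(n, a, k, i):
    """P[i not in S1+S2] over all pairs of subsets with |S1| = a, |S2| = k."""
    total = miss = 0
    for S1 in combinations(range(n), a):
        for S2 in combinations(range(n), k):
            total += 1
            if all((u + v) % n != i for u in S1 for v in S2):
                miss += 1
    return Fraction(miss, total)


def prob_missing_cross_sum_formula(n, a, k):
    """Equation (51): S1 must avoid the k elements i - j with j in S2."""
    return Fraction(comb(n - k, a), comb(n, a))
```

Note that the formula does not depend on $i$ , since translating $S_{2}$ permutes the forbidden elements without changing how many there are.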

We now have all the necessary parts to prove Theorem 1.9. Simple combinatorics yields

\mathbb{P}[\text{$A$ has $k$ flips}]\ =\ \frac{{\binom{n}{k}}{\binom{n}{m-k}}}{{\binom{2n}{m}}}.

(52)

Combining this with Lemmas 4.1, 4.3, and 4.4 yields the result after some simplification. See Appendix C for complete details.
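To illustrate how these pieces fit together, the following Python sketch assembles $\mathbb{E}[|A-A|]$ for prime $n$ directly from Equations (44), (46), (50), (51), and (52), using linearity of expectation over the $2n$ elements of $D_{2n}$ . It is not necessarily the simplified form stated in Theorem 1.9, and the function names are hypothetical. The identity always lies in $A-A$ when $m\geq 1$ , each of the $n-1$ nonidentity rotations is missed with the same probability, and each of the $n$ flips is missed with the same probability.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def g(N, k):
    """Ways to choose k pairwise non-adjacent points on a cycle of length N."""
    if k == 0:
        return 1
    if N - 1 - k < k - 1:
        return 0
    return N * comb(N - 1 - k, k - 1) // k


def expected_difference_size(n, m):
    """E[|A-A|] for a uniform m-subset A of D_{2n}; assumes n prime, m >= 1."""
    total = comb(2 * n, m)
    ks = range(max(0, m - n), min(m, n) + 1)  # feasible numbers of flips
    # P[k flips] times the product in (44) simplifies to g(n,m-k) g(n,k) / C(2n,m).
    miss_rotation = Fraction(sum(g(n, m - k) * g(n, k) for k in ks), total)
    # P[k flips] times (51) simplifies to C(n,k) C(n-k,m-k) / C(2n,m).
    miss_flip = Fraction(sum(comb(n, k) * comb(n - k, m - k) for k in ks), total)
    return 1 + (n - 1) * (1 - miss_rotation) + n * (1 - miss_flip)


def expected_difference_size_brute(n, m):
    """Average of |A-A| over all m-subsets of D_{2n}, by enumeration."""
    D = [(z, x) for z in (0, 1) for x in range(n)]

    def mul(a, b):
        sign = -1 if a[0] else 1
        return ((a[0] + b[0]) % 2, (a[1] + sign * b[1]) % n)

    def inv(a):
        return a if a[0] else (0, (-a[1]) % n)

    sizes = [len({mul(a, inv(b)) for a in A for b in A}) for A in combinations(D, m)]
    return Fraction(sum(sizes), len(sizes))
```

For $n=3$ and $m=2$ , both functions give $12/5$ : the six sets consisting of two rotations or two flips have $|A-A|=3$ , and the nine mixed sets have $|A-A|=2$ .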

By plotting $\mathbb{E}[|A-A|]$ against $m$ for certain large values of $n$ , we can see evidence for our intuition from the beginning of this section that $\mathbb{E}[|A-A|]$ quickly becomes close to $2n$ for $m$ on the order of magnitude of $\sqrt{n}$ (Figure 1). Numerical evidence from many large primes $n$ suggests that the value of $m$ such that $\mathbb{E}[|A-A|]=n$ is about $1.3875\sqrt{n}$ . (Some code written for this project can be found at https://github.com/ZeusDM/MSTD-experiments.)

Figure 1. A plot of $\mathbb{E}[|A-A|]$ versus $m$ for $A\in\mathcal{S}_{2n,m}$ with $n=10007$ . Note that $\mathbb{E}[|A-A|]$ rapidly approaches $2n=20014$ as $m$ approaches a small multiple of $\sqrt{n}\approx 100$ .

5. Future Work

An immediate future direction of research is to extend the bounds on $m$ to show Conjecture 1.7 for all $m$ and $n$ . However, it is unlikely that the methods we have used in this paper will be useful for larger values of $m$ . This is because our approach showed that for values of $m$ that we considered, the majority of subsets of the generalized dihedral group with size $m$ were MSTD. But this is simply not the case for larger $m$ : numerical evidence shows that for any $m\gg\sqrt{n}$ , the vast majority of the sets are balanced. A new approach is required to show that out of the sets that are not balanced, more are MSTD.

To this end, it would also be productive to more carefully analyze how many elements of the group $D$ are not in $A+A$ (or not in $A-A$ ) for $|A|$ closer to $n$ . This follows the approach of [HM09] for the “slow decay” case. Such an analysis could be used to find explicit formulas for the expectation values of $|A+A|$ and $|A-A|$ . In this paper we found a formula for $\mathbb{E}[|A-A|]$ , but only when $n$ is prime. Further, to use these results, we would also need to bound the variance of $|A+A|$ and $|A-A|$ . Depending on the results, this could be enough to prove Conjecture 1.6 if we can deduce an upper bound $M_{\ell}$ on the number of MDTS sets and a lower bound $M_{s}$ on the number of MSTD sets such that $M_{\ell}<M_{s}$ .

Yet another possible approach to prove Conjecture 1.6 is to construct an injective map from MDTS sets to MSTD sets in the group. Such an approach has proven to be difficult, but has the potential advantage of working for both large and small values of $m$ .

Appendix A Proof of Lemmas 2.1 and 2.2

See Lemma 2.1.

Proof.

There are only 3 possible cases to consider for $A$ .

•

$A$ contains two flip elements: in this case, adding and subtracting flips is an identical operation as each flip element has order 2. Thus, $A$ will necessarily be sum-difference balanced.
•

$A$ contains one flip and one rotation element: let $r^{i},r^{j}s\in A$ . Here, $A+A=\{1,r^{2i},r^{i+j}s,r^{j-i}s\}$ . However, $A-A=\{1,r^{i+j}s\}$ . When $r^{2i}\neq 1$ or $r^{j-i}s\neq r^{i+j}s$ , $A$ will be MSTD. In the case where both of these are actually equalities ( $A+A=\{1,r^{i+j}s\}$ ), $A$ will simply be balanced.
•

$A$ contains two rotation elements: let $r^{i},r^{j}\in A$ . Here, $A+A=\{r^{2i},r^{i+j},r^{2j}\}$ . However, $A-A=\{1,r^{i-j},r^{j-i}\}$ . Note that $i\neq j$ . Both the sum set and the difference set have 3 elements except in one special case. Suppose that $r^{2i}=r^{2j}$ . Then, we have that $r^{i}r^{-j}=r^{j}r^{-i}\implies r^{i-j}=r^{j-i}$ , so both sets have 2 elements. Thus, when $A$ contains two rotation elements only, $A$ is always balanced.

Therefore, there are strictly more MSTD sets than MDTS sets in $\mathcal{S}_{2n,2}$ . ∎

See Lemma 2.2.

Proof.

There are 4 possible cases to consider for $A$ .

•

$A$ contains three flip elements: just as in the $m=2$ case, since addition and subtraction of two flips are equivalent, $A$ is balanced.

•

$A$ contains two flip elements and one rotation element: let $A=\{r^{i},r^{j}s,r^{k}s\}$ for $j\neq k$ . Then the sumset is

A+A=\{r^{2i},r^{i+j}s,r^{i+k}s,r^{j-k},r^{k-j},1,r^{j-i}s,r^{k-i}s\},

(53)

while the difference set is

A-A=\{r^{i+j}s,r^{i+k}s,r^{j-k},r^{k-j},1\},

(54)

so $A-A\subseteq A+A$ . Moreover, $A$ is MSTD precisely when at least one of $r^{j-i}s$ and $r^{k-i}s$ is distinct from each of the flips in $A+A$ that lie in $A-A$ . We enumerate all the ways that this could fail to occur:

We have $r^{i+j}s=r^{j-i}s$ and $r^{i+k}s=r^{k-i}s$ if and only if $i=-i\pmod{n}$ . This happens if and only if $i=0$ or if $n$ is even and $i=n/2$ , and in each of these cases we have $\binom{n}{2}=\frac{1}{2}(n^{2}-n)$ choices for $j$ and $k$ . On the other hand, we have $r^{i+j}s=r^{k-i}s$ and $r^{i+k}s=r^{j-i}s$ if and only if $2i=k-j=j-k\pmod{n}$ , which implies that $2j=2k\pmod{n}$ . Because $j\neq k$ , this forces $k-j=n/2\pmod{n}$ and hence $2i=n/2\pmod{n}$ ; these equations can therefore hold if and only if $n$ is divisible by $4$ , $i=n/4$ or $3n/4$ , and $k-j=n/2\pmod{n}$ . We therefore have $2$ choices for $i$ , and for each $i$ there are $\frac{1}{2}n$ choices for the unordered pair $\{j,k\}$ . Finally, note that we cannot have $r^{j-i}s=r^{k-i}s$ because $j\neq k$ , so this completes the enumeration of all such sets $A$ that fail to be MSTD.

There are $n\binom{n}{2}=\frac{1}{2}(n^{3}-n^{2})$ total sets $A$ containing exactly $2$ flips. Subtracting the balanced sets that we just enumerated, we obtain the following numbers of MSTD sets:

$\displaystyle\frac{1}{2}(n^{3}-n^{2})-\frac{1}{2}n(n-1)$	if $n=1\pmod{2}$ ,
$\displaystyle\frac{1}{2}(n^{3}-n^{2})-n(n-1)$	if $n=2\pmod{4}$ ,
$\displaystyle\frac{1}{2}(n^{3}-n^{2})-n(n-1)-n$	if $n=0\pmod{4}$ .	(55)

•

$A$ contains one flip element and two rotation elements: let $A=\{r^{i},r^{j},r^{k}s\}$ for $i\neq j$ . Then the sumset is

$A+A=\{r^{i+j},r^{2i},r^{2j},r^{i+k}s,r^{j+k}s,r^{k-i}s,r^{k-j}s,1\},$ (56)

while the difference set is

$A-A=\{r^{i-j},r^{j-i},r^{i+k}s,r^{j+k}s,1\},$ (57)

We show that $A$ cannot be MDTS by separately comparing the flips and the rotations in $A+A$ and $A-A$ . The flips in $A-A$ are contained in $A+A$ , so this comparison is trivial. The rotations in $A-A$ are $r^{i-j},r^{j-i},$ and $1$ ; we will show that $A+A$ has at least as many rotations. Consider the rotations in $A+A$ : note that $r^{i+j}$ is different from $r^{2i}$ and from $r^{2j}$ since $i\neq j$ , so there are at least $2$ distinct rotations in $A+A$ . If $r^{2i}\neq r^{2j}$ , then in fact $A+A$ has at least three distinct rotations, and we have $|A+A|\geq|A-A|$ . On the other hand, if $r^{2i}=r^{2j}$ , then $r^{i-j}=r^{j-i}$ , so there are only at most $2$ distinct rotations in $A-A$ , so again $|A+A|\geq|A-A|$ .
•

$A$ contains three rotation elements: for this case, there are $\binom{n}{3}$ such subsets. We compare directly to Equation (55), using the case $n=0\pmod{4}$ , which yields the least possible number of MSTD sets in $\mathcal{S}_{2n,3}$ . Here we have

$\frac{1}{2}(n^{3}-n^{2})-n(n-1)-n>\frac{1}{6}n(n-1)(n-2)$ (58)

or equivalently,

$n^{3}-3n^{2}-n>0,$ (59)

which is true if $n\geq 4$ . When $n=3$ , since $n=1\mod 2$ , we instead compare $\binom{n}{3}=1$ with the count for odd $n$ in Equation (55), and the result is verified once again.

Therefore, even if all of the subsets with three rotation elements are MDTS, which is never true to begin with, we still have strictly more MSTD subsets of size 3 than MDTS subsets of size 3. ∎
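The conclusions of Lemmas 2.1 and 2.2 can also be confirmed by exhaustive search for small $n$ . The sketch below assumes the encoding $r^{g}s^{z}\mapsto(z,g)$ and the conventions $A+A=\{ab\}$ and $A-A=\{ab^{-1}\}$ used in the computations above; the function name is hypothetical.

```python
from itertools import combinations


def count_mstd_mdts(n, m):
    """Return (#MSTD, #MDTS) among the m-subsets of D_{2n}."""
    D = [(z, g) for z in (0, 1) for g in range(n)]

    def mul(a, b):
        # r^{g1} s^{z1} * r^{g2} s^{z2} = r^{g1 + (-1)^{z1} g2} s^{z1 + z2}
        sign = -1 if a[0] else 1
        return ((a[0] + b[0]) % 2, (a[1] + sign * b[1]) % n)

    def inv(a):
        # Flips are involutions; rotations invert by negating the exponent.
        return a if a[0] else (0, (-a[1]) % n)

    mstd = mdts = 0
    for A in combinations(D, m):
        sums = {mul(a, b) for a in A for b in A}
        diffs = {mul(a, inv(b)) for a in A for b in A}
        if len(sums) > len(diffs):
            mstd += 1
        elif len(sums) < len(diffs):
            mdts += 1
    return mstd, mdts
```

For example, `count_mstd_mdts(3, 2)` returns `(6, 0)`: the six sets $\{r^{i},r^{j}s\}$ with $i\neq 0$ are MSTD, and every other set of size $2$ in $D_{6}$ is balanced.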

Appendix B Proof of Lemma 3.9

For any given $(x,y,F)\in\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ , we calculate the number of pairs $a,b\in\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ where $a-b=(x,y,F)$ . Similarly, we also calculate the number of pairs $a,b\in\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ where $a+b=(x,y,F)$ . Note that these can be counted through counting the number of pairs in $\mathbb{Z}_{n}^{2}$ which results in $(x,y)$ as follows:

Case 1, $F=0$.

•
Case 1.1, $a-b=(x,y,0)$.
  –
  Case 1.1.1: $a$ and $b$ are not flips. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}-b_{1}\equiv x\pmod{n}$ and $a_{2}-b_{2}\equiv y\pmod{n}$.
  –
  Case 1.1.2: $a$ and $b$ are both flips. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}-b_{1}\equiv x\pmod{n}$ and $a_{2}-b_{2}\equiv y\pmod{n}$.
•
Case 1.2, $a+b=(x,y,0)$.
  –
  Case 1.2.1: $a$ and $b$ are not flips. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}+b_{1}\equiv x\pmod{n}$ and $a_{2}+b_{2}\equiv y\pmod{n}$.
  –
  Case 1.2.2: $a$ and $b$ are both flips. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}-b_{1}\equiv x\pmod{n}$ and $a_{2}-b_{2}\equiv y\pmod{n}$.

Case 2, $F=1$ (exactly one of $a$ and $b$ is a flip).

•
Case 2.1, $a-b=(x,y,1)$.
  –
  Case 2.1.1: $a$ is a flip. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}+b_{1}\equiv x\pmod{n}$ and $a_{2}+b_{2}\equiv y\pmod{n}$.
  –
  Case 2.1.2: $b$ is a flip. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}+b_{1}\equiv x\pmod{n}$ and $a_{2}+b_{2}\equiv y\pmod{n}$.
•
Case 2.2, $a+b=(x,y,1)$.
  –
  Case 2.2.1: $a$ is a flip. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}-b_{1}\equiv x\pmod{n}$ and $a_{2}-b_{2}\equiv y\pmod{n}$.
  –
  Case 2.2.2: $b$ is a flip. This is equivalent to counting pairs $(a_{1},a_{2}),(b_{1},b_{2})\in\mathbb{Z}_{n}^{2}$ where $a_{1}+b_{1}\equiv x\pmod{n}$ and $a_{2}+b_{2}\equiv y\pmod{n}$.

We get a similar result with $a,b\in\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n^{2}}$ where $a-b=(xn+y,F)$ or $a+b=(xn+y,F)$ as follows:

Case 1, $F=0$.

•
Case 1.1, $a-b=(xn+y,0)$.
  –
  Case 1.1.1: $a$ and $b$ are not flips. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})-(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
  –
  Case 1.1.2: $a$ and $b$ are both flips. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})-(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
•
Case 1.2, $a+b=(xn+y,0)$.
  –
  Case 1.2.1: $a$ and $b$ are not flips. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})+(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
  –
  Case 1.2.2: $a$ and $b$ are both flips. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})-(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.

Case 2, $F=1$ (exactly one of $a$ and $b$ is a flip).

•
Case 2.1, $a-b=(xn+y,1)$.
  –
  Case 2.1.1: $a$ is a flip. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})+(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
  –
  Case 2.1.2: $b$ is a flip. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})+(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
•
Case 2.2, $a+b=(xn+y,1)$.
  –
  Case 2.2.1: $a$ is a flip. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})-(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.
  –
  Case 2.2.2: $b$ is a flip. This is equivalent to counting pairs $a_{1}n+a_{2},b_{1}n+b_{2}\in\mathbb{Z}_{n^{2}}$ where $(a_{1}n+a_{2})+(b_{1}n+b_{2})\equiv xn+y\pmod{n^{2}}$.

Note that the signs are the same between corresponding cases in $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n^{2}}$ and $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$. Thus there are only four distinct cases we need to count, and it suffices to compare the number of 4-tuple collisions over all $(x,y)\in\mathbb{Z}_{n}^{2}$ with the number of 4-tuple collisions over all $xn+y\in\mathbb{Z}_{n^{2}}$.
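The sign pattern in the two case lists can be checked mechanically. The sketch below is an illustration under stated assumptions: an element of $\mathbb{Z}_{2}\ltimes G$ is encoded as `(u, f)` with `u` a tuple of residues and `f` the flip bit, and `mods` lists the moduli of the cyclic factors of $G$, so `(n, n)` stands for $\mathbb{Z}_{n}^{2}$ and `(n * n,)` for $\mathbb{Z}_{n^{2}}$. The names `MINUS_SIGN`, `PLUS_SIGN`, and `signs_match` are hypothetical.

```python
from itertools import product


def add(u, v, mods, sign=1):
    """Coordinatewise u + sign * v in the abelian group with the given moduli."""
    return tuple((x + sign * y) % m for x, y, m in zip(u, v, mods))


def mul(a, b, mods):
    """(u, f) * (v, g) in the generalized dihedral group; a flip inverts what follows."""
    (u, f), (v, g) = a, b
    return (add(u, v, mods, -1 if f else 1), f ^ g)


def inv(a, mods):
    """Flips are their own inverses; non-flips negate the abelian part."""
    u, f = a
    return a if f else (tuple((-x) % m for x, m in zip(u, mods)), 0)


# Sign applied to b's coordinates, keyed by (a is a flip, b is a flip).
MINUS_SIGN = {(0, 0): -1, (1, 1): -1, (1, 0): +1, (0, 1): +1}  # for a - b
PLUS_SIGN = {(0, 0): +1, (1, 1): -1, (1, 0): -1, (0, 1): +1}   # for a + b


def signs_match(mods):
    """Check every pair (a, b) against the sign tables above."""
    elems = [(u, f) for u in product(*(range(m) for m in mods)) for f in (0, 1)]
    for a, b in product(elems, repeat=2):
        (u, f), (v, g) = a, b
        if mul(a, inv(b, mods), mods) != (add(u, v, mods, MINUS_SIGN[f, g]), f ^ g):
            return False
        if mul(a, b, mods) != (add(u, v, mods, PLUS_SIGN[f, g]), f ^ g):
            return False
    return True
```

For example, `signs_match((3, 3))` and `signs_match((9,))` run the same tables against $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{3}^{2}$ and $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{9}$, which is the sense in which the two case lists share their signs.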

We prove the main result by counting the collisions of each of the three types. When $n$ is odd, Lemma 3.10, Lemma 3.11, and Lemma 3.12 show that the number of collisions for each $(x,y)\in\mathbb{Z}_{n}^{2}$ equals the number for the corresponding $xn+y\in\mathbb{Z}_{n^{2}}$, so the two groups have equal total numbers of collisions. We now assume that $n$ is even.

•

$a+b=c+d$
Suppose $y$ is odd. By Lemma 3.11, we have $(\frac{n^{2}}{2})^{2}$ pairs for each $(x,y)$, for a total of $\frac{n^{5}}{4}$ collisions over the $n$ values of $x$. By Lemma 3.12, we also get $(\frac{n^{2}}{2})^{2}$ pairs for each $xn+y$, for an equal total of $\frac{n^{5}}{4}$ collisions.

If $y$ is even, half of the values $(x,y)$ yield $(\frac{n^{2}}{2})^{2}$ pairs and the other half yield $(\frac{n^{2}+4}{2})^{2}$ pairs, for a total of $\frac{n}{2}\cdot\frac{n^{4}+n^{4}+8n^{2}+16}{4}=\frac{n^{5}+4n^{3}+8n}{4}$. Meanwhile, in $\mathbb{Z}_{n^{2}}$ each $xn+y$ yields $(\frac{n^{2}+2}{2})^{2}$ pairs, for a total of $\frac{n^{5}+4n^{3}+4n}{4}$. Thus, there are strictly more collisions in $\mathbb{Z}_{n}^{2}$ in this case.
•

$a+b=c-d$
Suppose $y$ is odd. We have $\frac{n^{2}}{2}$ choices for $a+b$ and $n^{2}$ choices for $c-d$, for a total of $\frac{n^{5}}{2}$. We get the same result for $\mathbb{Z}_{n^{2}}$.

If $y$ is even, half of the values $(x,y)$ yield $\frac{n^{2}}{2}\cdot n^{2}$ pairs and the other half yield $\frac{n^{2}+4}{2}\cdot n^{2}$ pairs, for a total of $\frac{n}{2}\cdot\frac{n^{4}+n^{4}+4n^{2}}{2}=\frac{n^{5}+2n^{3}}{2}$. For $\mathbb{Z}_{n^{2}}$, each $xn+y$ yields $\frac{n^{2}+2}{2}\cdot n^{2}$ collisions, for the same total of $\frac{n^{5}+2n^{3}}{2}$.
•

$a-b=c-d$
Both groups have the same number of collisions, namely $n^{4}$, for each $(x,y)$ and the corresponding $xn+y$. Thus, the total number of collisions in this case is $n^{5}$ in both groups.

Thus, over all the cases where $n$ is even, either there are strictly more collisions in $\mathbb{Z}_{n}^{2}$ or the numbers of collisions are equal. We conclude that there are strictly more collisions in $\mathbb{Z}_{2}\ltimes\mathbb{Z}_{n}^{2}$ when $n$ is even.
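The counts for the type $a+b=c+d$ can be reproduced by direct enumeration for small even $n$. The sketch below assumes that the quantities taken from Lemma 3.11 and Lemma 3.12 count unordered pairs $\{a,b\}$ with $a=b$ allowed; this reading is an assumption, chosen because it reproduces the values $\frac{n^{2}}{2}$, $\frac{n^{2}+4}{2}$, and $\frac{n^{2}+2}{2}$ quoted above. The function names are illustrative.

```python
from itertools import product


def unordered_sum_reps(t, mods):
    """Number of unordered pairs {a, b} (a = b allowed) with a + b = t."""
    elems = product(*(range(m) for m in mods))
    count = 0
    for a in elems:
        # b is determined by a; keep the pair only when a <= b so that
        # each unordered pair is counted exactly once.
        b = tuple((x - y) % m for x, y, m in zip(t, a, mods))
        if a <= b:
            count += 1
    return count


def totals_for_fixed_y(n, y):
    """Sum over x of (number of representations)^2 in Z_n^2 and in Z_{n^2}."""
    in_zn2 = sum(unordered_sum_reps((x, y), (n, n)) ** 2 for x in range(n))
    in_zn_sq = sum(
        unordered_sum_reps(((x * n + y) % (n * n),), (n * n,)) ** 2
        for x in range(n)
    )
    return in_zn2, in_zn_sq
```

Under this reading, `totals_for_fixed_y(4, 0)` gives the pair of totals for $\mathbb{Z}_{4}^{2}$ and $\mathbb{Z}_{16}$ with $y=0$, and the first entry exceeds the second, matching the strict inequality for even $y$.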

Appendix C Proof of Theorem 1.9

We have

$\begin{aligned}
\mathbb{E}[|A-A|]\ &=\ |D_{2n}|-\sum_{g\in D_{2n}}\mathbb{P}[g\notin A-A]\\
&=\ 2n-\mathbb{P}[1\notin A-A]-\left[\sum_{i=1}^{n-1}\mathbb{P}[r^{i}\notin A-A]\right]-\left[\sum_{i=0}^{n-1}\mathbb{P}[r^{i}s\notin A-A]\right]\\
&=\ 2n-0-\left[\sum_{i=1}^{n-1}\sum_{k=0}^{m}\mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S-S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=m-k$}]\cdot\mathbb{P}[i\notin S-S\mid\text{$S\subseteq\mathbb{Z}_{n}$, $|S|=k$}]\right]\\
&\qquad\quad-\left[\sum_{i=0}^{n-1}\sum_{k=0}^{m}\mathbb{P}[\text{$A$ has $k$ flips}]\cdot\mathbb{P}[i\notin S_{1}+S_{2}\mid\text{$S_{1},S_{2}\subseteq\mathbb{Z}_{n}$, $|S_{1}|=m-k$, $|S_{2}|=k$}]\right]\\
&=\ 2n-\left[\sum_{i=1}^{n-1}\sum_{k=0}^{m}\frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}\cdot\frac{g(n,k)}{\binom{n}{k}}\cdot\frac{g(n,m-k)}{\binom{n}{m-k}}\right]\\
&\qquad\quad-\left[\sum_{i=0}^{n-1}\sum_{k=0}^{m}\frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}\cdot\frac{\binom{n-k}{m-k}}{\binom{n}{m-k}}\right]\\
&=\ 2n-\left[(n-1)\sum_{k=0}^{m}\frac{g(n,k)\cdot g(n,m-k)}{\binom{2n}{m}}\right]-\left[n\sum_{k=0}^{m}\frac{\binom{n}{k}\binom{n-k}{m-k}}{\binom{2n}{m}}\right]\\
&=\ 2n-\left[2(n-1)\frac{g(n,0)\cdot g(n,m)}{\binom{2n}{m}}+(n-1)\sum_{k=1}^{m-1}\frac{\frac{n\binom{n-1-k}{k-1}}{k}\cdot\frac{n\binom{n-1-m+k}{m-k-1}}{m-k}}{\binom{2n}{m}}\right]\\
&\qquad\quad-\left[\frac{n}{\binom{2n}{m}}\sum_{k=0}^{m}\binom{n}{k}\binom{n-k}{m-k}\right]\\
&=\ 2n-\frac{nm2^{m}\binom{n}{m}+2n(n-1)\binom{n-m-1}{m-1}}{m\binom{2n}{m}}-\frac{n^{2}(n-1)}{\binom{2n}{m}}\sum_{k=1}^{m-1}\frac{\binom{n+k-m-1}{m-k-1}\binom{n-k-1}{k-1}}{k(m-k)},
\end{aligned}$ (60)

by the combinatorial identity

$\sum_{k=0}^{m}\binom{n}{k}\binom{n-k}{m-k}\ =\ 2^{m}\binom{n}{m}.$ (61)
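Both the identity (61) and the closed form (60) can be checked numerically for small parameters. The following sketch is an illustration, not part of the proof: it evaluates the right side of (60) with exact fractions and compares it with the average of $|A-A|$ over all $m$-subsets of $D_{2n}$, computed by enumeration. It assumes $n$ prime and $1\leq m<n$, and encodes $r^{e}s^{f}$ as `(e, f)`; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def identity_61_holds(n, m):
    """Check sum_k C(n,k) C(n-k,m-k) = 2^m C(n,m) for 0 <= m <= n."""
    lhs = sum(comb(n, k) * comb(n - k, m - k) for k in range(m + 1))
    return lhs == 2**m * comb(n, m)


def closed_form_60(n, m):
    """Right side of (60) as an exact fraction (n prime, 1 <= m < n assumed)."""
    total = comb(2 * n, m)
    first = Fraction(
        n * m * 2**m * comb(n, m) + 2 * n * (n - 1) * comb(n - m - 1, m - 1),
        m * total,
    )
    tail = sum(
        Fraction(comb(n + k - m - 1, m - k - 1) * comb(n - k - 1, k - 1), k * (m - k))
        for k in range(1, m)
    )
    return 2 * n - first - Fraction(n * n * (n - 1), total) * tail


def brute_force_mean(n, m):
    """Average |A - A| over all m-subsets A of D_{2n}."""
    def diff(a, b):  # a * b^{-1}, using s r = r^{-1} s
        (e1, f1), (e2, f2) = a, b
        if f1 == f2:  # both rotations or both flips: a rotation
            return ((e1 - e2) % n, 0)
        return ((e1 + e2) % n, 1)  # exactly one flip: a flip

    elems = [(e, f) for e in range(n) for f in (0, 1)]
    sizes = [len({diff(a, b) for a in A for b in A}) for A in combinations(elems, m)]
    return Fraction(sum(sizes), len(sizes))
```

For instance, with $n=5$ and $m=2$ both `closed_form_60(5, 2)` and `brute_force_mean(5, 2)` evaluate to $\frac{22}{9}$, which can also be confirmed by hand: the $20$ subsets with two rotations or two flips have $|A-A|=3$, and the $25$ mixed subsets have $|A-A|=2$.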

References

[DKMMWW15] Thao Do et al. “Sets characterized by missing sums and differences in dilating polytopes” In J. Number Theory 157, 2015, pp. 123–153 DOI: 10.1016/j.jnt.2015.04.027
[Heg07] Peter V. Hegarty “Some explicit constructions of sets with more sums than differences” In Acta Arith. 130.1 Instytut Matematyczny Polskiej Akademii Nauk, 2007, pp. 61–77 DOI: 10.4064/aa130-1-4
[HKLLMT20] John Haviland et al. “More Sums Than Differences Sets in Finite Non-Abelian Groups”, Presentation slides, Young Mathematicians Conference, The Ohio State University, 2020 URL: http://www-personal.umich.edu/~havijw/resources/talks/mstd_finite_nonabelian.pdf
[HM09] Peter V. Hegarty and Steven J. Miller “When almost all sets are difference dominated” In Random Structures Algorithms 35.1 Wiley Online Library, 2009, pp. 118–136 DOI: 10.1002/rsa.20268
[HM13] Virginia Hogan and Steven J. Miller “When Generalized Sumsets are Difference Dominated”, 2013 arXiv:1301.5703 [math.NT]
[MO07] Greg Martin and Kevin O’Bryant “Many sets have more sums than differences” In Additive combinatorics 43, CRM Proc. Lecture Notes Amer. Math. Soc., Providence, RI, 2007, pp. 287–305 DOI: 10.1090/crmp/043/16
[MOS10] Steven J. Miller, Brooke Orosz and Daniel Scheinerman “Explicit constructions of infinite families of MSTD sets” In J. Number Theory 130.5, 2010, pp. 1221–1233 DOI: 10.1016/j.jnt.2009.09.003
[MV14] Steven J. Miller and Kevin Vissuet “Most subsets are balanced in finite groups” In Combinatorial and additive number theory 101, Springer Proc. Math. Stat. Springer, 2014, pp. 147–157 DOI: 10.1007/978-1-4939-1601-6_11
[Nat07] Melvyn B. Nathanson “Problems in additive number theory. I” In Additive combinatorics 43, CRM Proc. Lecture Notes Amer. Math. Soc., Providence, RI, 2007, pp. 263–270 DOI: 10.1090/crmp/043/13
[Nat07a] Melvyn B. Nathanson “Sets with more sums than differences” In Integers 7, 2007, pp. A5, 24 URL: https://www.emis.de/journals/INTEGERS/papers/h5/h5.pdf
[Zha10] Yufei Zhao “Constructing MSTD sets using bidirectional ballot sequences” In J. Number Theory 130.5, 2010, pp. 1212–1220 DOI: 10.1016/j.jnt.2009.11.005
[Zha10a] Yufei Zhao “Counting MSTD sets in finite abelian groups” In J. Number Theory 130.10 Elsevier, 2010, pp. 2308–2322 DOI: 10.1016/j.jnt.2010.06.001
[Zha11] Yufei Zhao “Sets characterized by missing sums and differences” In J. Number Theory 131.11, 2011, pp. 2107–2134 DOI: 10.1016/j.jnt.2011.05.003