Hardness of Detecting Abelian and Additive Square Factors in Strings
Jakub Radoszewski, University of Warsaw, Poland and Samsung R&D Poland, jrad@mimuw.edu.pl, https://orcid.org/0000-0002-0067-6401
Wojciech Rytter, University of Warsaw, Poland, rytter@mimuw.edu.pl, https://orcid.org/0000-0002-9162-6724
Juliusz Straszyński, University of Warsaw, Poland, jks@mimuw.edu.pl, https://orcid.org/0000-0003-2207-0053
Tomasz Waleń, University of Warsaw, Poland, walen@mimuw.edu.pl, https://orcid.org/0000-0002-7369-3309
Wiktor Zuba, University of Warsaw, Poland, w.zuba@mimuw.edu.pl, https://orcid.org/0000-0002-1988-3507
Acknowledgements.
The authors warmly thank Paweł Gawrychowski and Tomasz Kociumaka for helpful discussions.
Copyright: Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba
CCS concepts: Theory of computation → Pattern matching
Abstract
We prove 3SUM-hardness (no strongly subquadratic-time algorithm, assuming the 3SUM conjecture) of several problems related to finding Abelian square and additive square factors in a string. In particular, we conclude conditional optimality of the state-of-the-art algorithms for finding such factors.
Overall, we show 3SUM-hardness of (a) detecting an Abelian square factor of an odd half-length, (b) computing centers of all Abelian square factors, (c) detecting an additive square factor in a length-$n$ string of integers of magnitude $n^{O(1)}$, and (d) a problem of computing a double 3-term arithmetic progression (i.e., finding indices $i \neq j$ such that $(x_i + x_j)/2 = x_{(i+j)/2}$) in a sequence of integers $x_1, \ldots, x_n$ of magnitude $n^{O(1)}$.
Problem (d) is essentially a convolution version of the AVERAGE problem that was proposed in a manuscript of Erickson. We obtain a conditional lower bound for it with the aid of techniques recently developed by Dudek et al. [STOC 2020]. Problem (d) immediately reduces to problem (c) and is a step in reductions to problems (a) and (b). In conditional lower bounds for problems (a) and (b) we apply an encoding of Amir et al. [ICALP 2014] and extend it using several string gadgets that include arbitrarily long Abelian-square-free strings.
Our reductions also imply conditional lower bounds for detecting Abelian squares in strings over a constant-sized alphabet. We also show a subquadratic upper bound in this case, applying a result of Chan and Lewenstein [STOC 2015].
keywords: Abelian square, additive square, 3SUM problem
1 Introduction
Abelian squares.
An Abelian square, Ab-square in short (also known as a jumbled square), is a string of the form $XY$, where $Y$ is a permutation of $X$; we say that $X$ and $Y$ are Ab-equivalent. We are interested in factors (i.e., substrings composed of consecutive letters) of a given text string that are Ab-squares.
Example 1.1.
The string
has exactly two Ab-square factors of length 12, shown above (but it has also Ab-squares of other lengths, e.g. 5665, 11, 1111, 011110).
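To make the definition concrete, the following minimal Python sketch tests whether a string is an Ab-square by comparing the letter counts (Parikh vectors) of its two halves. The function name `is_ab_square` is ours and serves only as an illustration.

```python
from collections import Counter


def is_ab_square(s):
    """Return True if s = XY with |X| = |Y| > 0 and Y a permutation of X."""
    if len(s) == 0 or len(s) % 2 != 0:
        return False
    half = len(s) // 2
    # Two strings are Ab-equivalent iff every letter occurs equally often in both.
    return Counter(s[:half]) == Counter(s[half:])


# The short Ab-squares mentioned in Example 1.1.
examples = ["5665", "11", "1111", "011110"]
print([is_ab_square(w) for w in examples])
```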
Ab-squares were first studied by Erdős [16], who posed a question on the smallest alphabet size for which there exists an infinite Ab-square-free string, i.e., an infinite string without Ab-square factors. The first example of such a string over a finite alphabet was given by Evdokimov [18]. Later the alphabet size was improved to five by Pleasants [34] and finally an optimal example over a four-letter alphabet was shown by Keränen [27]. Further results on combinatorics of Ab-square-free strings and several examples of their applications in group theory, algorithmic music and cryptography can be found in [26] and references therein. Avoidability of long Ab-squares was also considered [36].
Strings containing Ab-squares were also studied. Motivated by another problem posed by Erdős [16], Entringer et al. [15] showed that every infinite binary string has arbitrarily long Ab-square factors. Fici et al. [19] considered infinite strings containing many distinct Ab-squares. A string of length may contain Ab-square factors that are distinct as strings, but contains only Ab-squares which are pairwise Abelian nonequivalent (correspond to different Parikh vectors), see [28]. It is also conjectured that a binary string of length must have at least distinct [20] and nonequivalent [21] Ab-square factors. For more conjectures related to combinatorics of Ab-square factors of strings and circular strings, see [39].
Several algorithms computing Ab-square factors of a string are known. All Ab-squares in a string of length can be computed in time [13]. For a string over a constant-sized alphabet, all Ab-square factors of a string can be computed in time and the longest Ab-square can be computed in time [29, 30]. Moreover, for a string of length that is given by its run-length encoding consisting of runs, the longest Ab-square that ends at each position can be computed in time [2] or in time [40]; both approaches require time in the worst case.
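For orientation, a simple baseline that reports all Ab-square factors using a quadratic number of window shifts can be sketched as follows. This is only an illustration under our own naming (`ab_square_factors`) and is not claimed to be the algorithm of [13]: for each half-length it slides a window over the string and maintains, for every letter, the difference between its numbers of occurrences in the two halves.

```python
from collections import defaultdict


def ab_square_factors(s):
    """Return (start, half_length) pairs, 0-indexed, of all Ab-square factors of s."""
    n = len(s)
    found = []
    for h in range(1, n // 2 + 1):
        # diff[c] = (#c in the left half) - (#c in the right half) of the window
        diff = defaultdict(int)
        nonzero = 0  # number of letters c with diff[c] != 0

        def bump(c, delta):
            nonlocal nonzero
            nonzero -= int(diff[c] != 0)
            diff[c] += delta
            nonzero += int(diff[c] != 0)

        for k in range(h):
            bump(s[k], +1)
            bump(s[h + k], -1)
        if nonzero == 0:
            found.append((0, h))
        for start in range(1, n - 2 * h + 1):
            bump(s[start - 1], -1)          # leaves the left half
            bump(s[start + h - 1], +2)      # moves from the right half to the left half
            bump(s[start + 2 * h - 1], -1)  # enters the right half
            if nonzero == 0:
                found.append((start, h))
    return found
```

Each shift of the window costs a constant number of counter updates, so the whole scan performs a number of updates quadratic in the length of the string.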
In [37] a different problem of enumerating strings being Ab-squares was considered.
Additive squares.
An additive square is an even-length string over an integer alphabet such that the sums of characters of the halves of this string are the same.
Example 1.2.
The following string has exactly 4 additive squares of length 10, as shown.
All of them except for the rightmost one are also Ab-squares. This string does not contain any longer additive square. Altogether this string has 8 additive square factors.
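The definition of an additive square can be checked directly with prefix sums. The sketch below is a naive quadratic-time enumeration given only as an illustration; the function name `additive_squares` and the 0-indexed output format are our own choices.

```python
from itertools import accumulate


def additive_squares(seq):
    """Return (start, half_length) pairs, 0-indexed, of all additive square factors."""
    prefix = [0] + list(accumulate(seq))  # prefix[k] = seq[0] + ... + seq[k-1]
    n = len(seq)
    found = []
    for h in range(1, n // 2 + 1):
        for i in range(n - 2 * h + 1):
            left = prefix[i + h] - prefix[i]
            right = prefix[i + 2 * h] - prefix[i + h]
            if left == right:
                found.append((i, h))
    return found


# (0, 2) below is an additive square (1 + 3 = 2 + 2) that is not an Ab-square.
print(additive_squares([1, 3, 2, 2]))
```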
An Ab-square (over an integer alphabet) is an additive square, but not necessarily the other way around. Combinatorially, problems related to additive squares are hard, in particular avoiding additive squares seems more difficult than avoiding Ab-squares. There are infinitely many strings over avoiding Ab-squares, but there are only finitely many strings over the same alphabet avoiding additive squares; see [22].
In fact, it is unknown whether there are infinitely many strings over any finite integer alphabet avoiding additive squares [7, 25, 33]. For additive cubes, however, the property was proved in [9] (see also [32]).
Nowadays, combinatorial study of Ab-square and additive square factors often involves computer experiments; see e.g. [9, 19, 36]. In addition to other applications, efficient algorithms detecting such types of squares could provide a significant aid in this research. In the case of classic square factors (i.e., factors of the form $XX$), a linear-time algorithm for computing them is known for a string over a constant-sized alphabet [24] and over an integer alphabet [4, 12]. We show that, unfortunately, in many cases the existence of near-linear-time algorithms for detecting Ab-square and additive square factors is unlikely, based on conjectured hardness of the 3SUM problem.
3SUM problem.
The 3SUM problem asks if there are distinct elements $a, b, c \in S$ such that $a + b = c$ for a given set $S$ of $n$ integers; see [35]. It is a general belief that the following conjecture is true for the word-RAM model.
3SUM conjecture:
There is no $O(n^{2-\varepsilon})$ time algorithm for the 3SUM problem, for any constant $\varepsilon > 0$.
A problem with input of size $m$ is called 3SUM-hard if an $O(m^{2-\varepsilon})$-time solution to the problem implies an $O(n^{2-\varepsilon'})$-time solution for 3SUM, for some constants $\varepsilon, \varepsilon' > 0$.
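For reference, the textbook quadratic-time approach to 3SUM can be sketched as below. We assume the formulation with distinct elements $a, b, c$ of the input set satisfying $a + b = c$; the function name `three_sum` is ours. The conjecture states that no algorithm improves on this running time by a polynomial factor.

```python
def three_sum(values):
    """Return distinct a, b, c from the input set with a + b == c, or None."""
    s = set(values)
    items = sorted(s)
    for i, a in enumerate(items):
        for b in items[i + 1:]:          # b > a, hence a and b are distinct
            c = a + b
            if c in s and c != a and c != b:
                return (a, b, c)
    return None


print(three_sum([1, 2, 3]))      # a witness exists
print(three_sum([1, 2, 4, 8]))   # no witness
```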
Our results.
• We show that the problems of computing all centers of Ab-square factors and detecting an odd half-length Ab-square factor, called an odd Ab-square (consequently also computing all lengths of Ab-square factors), for a length-$n$ string over an alphabet of size , cannot be solved in $O(n^{2-\varepsilon})$ time, for constant $\varepsilon > 0$, unless the 3SUM conjecture fails. Weaker conditional lower bounds are also stated in the case of a constant-sized alphabet.
• For constant-sized alphabets, we show strongly subquadratic algorithms for these problems based on an involved result of [11] related to jumbled indexing.
• En route we prove that detection of a double 3-term arithmetic progression (see [8]) and additive squares in a length-$n$ sequence of integers of magnitude $n^{O(1)}$ is 3SUM-hard.
We obtain deterministic conditional lower bounds from a convolution version of 3SUM that is well-known to be 3SUM-hard.
Related work.
In the jumbled indexing problem, we are given a text and are to answer queries for a pattern specified by a Parikh vector which gives, for each letter of the alphabet, the number of occurrences of this letter in the pattern. For each query, we are to check if there is a factor of the text that is Ab-equivalent to the pattern (existence query) or report all such factors (reporting query). Chan and Lewenstein [11] showed a data structure that can be constructed in truly subquadratic expected time and answers existence queries in truly sublinear time for a constant-sized alphabet (deterministic constructions for very small alphabets were also shown). Amir et al. [3] showed under a 3SUM-hardness assumption that jumbled indexing with existence queries requires preprocessing time or queries for any for an alphabet of size . They also provided particular constants for an alphabet of a constant size such that, under a stronger 3SUM-hardness assumption, jumbled indexing requires preprocessing time or queries. We use the techniques from both results in our algorithm and conditional lower bound for Ab-squares, respectively. The lower bound of Amir et al. was later improved and extended to both existence and reporting variants and any constant by Goldstein et al. [23, Section 7] with the aid of randomization. Moreover, recently an unconditional lower bound for the reporting variant was given in [1].
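As a point of comparison for the data structures discussed above, a single existence query can be answered without any preprocessing by sliding a window over the text. The sketch below is this naive linear-time-per-query baseline only; the names `jumbled_exists` and `pattern_vector` (a mapping from letters to required counts) are our own.

```python
from collections import Counter


def jumbled_exists(text, pattern_vector):
    """Return True if some factor of text has exactly the given Parikh vector."""
    target = Counter({c: k for c, k in pattern_vector.items() if k > 0})
    m = sum(target.values())
    if m == 0:
        return True          # the empty factor matches the zero vector
    if m > len(text):
        return False
    window = Counter(text[:m])
    if window == target:
        return True
    for i in range(m, len(text)):
        window[text[i]] += 1
        old = text[i - m]
        window[old] -= 1
        if window[old] == 0:
            del window[old]  # keep only positive counts so that == is exact
        if window == target:
            return True
    return False
```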
Our techniques.
A subsequence of three distinct positions is a 3-term double arithmetic progression (3dap in short) if it is an arithmetic progression and the elements on these positions also form an arithmetic progression. The problem of finding a 3dap in a sequence is denoted by . It is an odd 3dap if the first and the third positions are odd and the middle position is even. The corresponding problem is denoted by . First we reduce the convolution problem 3SUM (known to be 3SUM-hard) to the problem via as an intermediate problem. This uses a divide-and-conquer approach and a partition of sets into sets avoiding bad arithmetic progression of length 3.
The problem reduces in a simple way to detection of an additive square, showing that the latter problem is 3SUM-hard.
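The notions of a 3dap and an odd 3dap can be illustrated by a brute-force search. The sketch below uses 1-indexed positions as in the text; the function name `find_3dap` and the flag `odd_only` are ours, and the quadratic running time is that of the naive method, not of any algorithm from this paper.

```python
def find_3dap(x, odd_only=False):
    """Return 1-indexed positions (i, m, j) forming a 3dap in x, or None.

    Positions i < m < j form an arithmetic progression (m - i == j - m) and
    the values satisfy x_i + x_j == 2 * x_m. With odd_only=True, i and j must
    be odd and m must be even.
    """
    n = len(x)
    for m in range(2, n):
        for d in range(1, min(m - 1, n - m) + 1):
            i, j = m - d, m + d
            # If i is odd and m is even, then j = 2m - i is odd as well.
            if odd_only and not (i % 2 == 1 and m % 2 == 0):
                continue
            if x[i - 1] + x[j - 1] == 2 * x[m - 1]:
                return (i, m, j)
    return None
```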
Next, the problem is encoded as a string. We follow the high-level idea from Amir et al. Instead of checking equality of numbers, we can check equality of their remainders modulo sufficiently many prime numbers. Then, each prime number corresponds to a distinct character. If the numbers are then only prime numbers are needed. However, there is a certain technical complication, already present in the paper of Amir et al., which requires introducing additional gadgets working as equalizers. The details, compared with the construction of Amir et al., are different, mostly because in the end we want to ask about detection, not indexing.
Then we consider the problem of computing all centers of Ab-squares; this requires new gadgets. We show that computing all centers of Ab-squares is 3SUM-hard, as well as detection of any Ab-square which is well-centred.
Later we extend this to detection of any odd Ab-square. We use a construction of a string over an alphabet of size 4 with no Ab-square. The input string is “shuffled” with such a string, with some separators added. This forces odd Ab-squares to be well-centred. In this way we reduce the previously considered problem of detection of any well-centred Ab-square to the detection of any odd Ab-square. Ultimately, this shows that the latter problem is 3SUM-hard.
2 From to finding double 3-term arithmetic progressions
For integers , by we denote the set . We use the following convolution variant of the 3SUM problem that is 3SUM-hard; see [10, 31, 35] for both randomized and deterministic reductions. As already noted in [3], the range of elements can be made using a randomized hashing reduction from [5, 35].
Input: A sequence Output: Yes if there are such that ; no otherwise.
Let us denote and define the condition:
We omit the subscript if it is clear from the context. The last part of the condition is equivalent to .
Our first goal is to reduce the problem to the following one with .
Double 3-Term Arithmetic Progression, ) Input: , each of is in . Output: .
In Section 2.1 we obtain a reduction of to an intermediate version of with additional constraints on , and in Section 2.2 we show how these constraints can be avoided.
2.1 From to
Let us fix an integer sequence . For an arithmetic progression (arithmetic sequence) , where , i.e. , we define the following extended functions.
Note that it can happen that . For a fixed the input size is .
Lemma 2.1.
An instance of can be reduced to an alternative of instances of of total size in time.
Proof 2.2.
If , by and we denote the subsequences and , respectively. We proceed recursively as shown in the following function , with the first call to .
Correctness. Let and assume there are two indices such that . If is odd, then returns true. Otherwise both are of the same parity, so or . Consequently, the problem is split recursively into subproblems that correspond to and .
Complexity. Let us observe that one call to creates an instance of of size in time ( does not change). Let and denote the total number and size of all instances of generated by , when initially . We then have
which yields and . The reduction takes time.
We say that a 3-element arithmetic progression is a good progression if the middle element is even and two others are odd and introduce the following problem.
() Input: , each of is in . Output: and is a good progression ].
Lemma 2.3.
is reducible in time and space to , where .
Proof 2.4.
Let . Define and let be a sequence of length that is created as follows:
1. put at subsequent odd positions in ;
2. at each even position , put or, if , put .
3. multiply elements on even positions by 2.
After the first two steps is equivalent to for odd and even ; see Figure 1. Then, after the third step, is equivalent to .
2.2 From to
Our main tool in this subsection is partitioning a set of integers into progression-free sets. A set of integers is called progression-free if it does not contain a non-constant three-element arithmetic progression. We use the following recent result that extends a classical paper of Behrend [6].
Theorem 2.5 ([14]).
Any set can be partitioned into progression-free sets in time.
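The notion of a progression-free set is easy to experiment with. The sketch below checks the property and builds some partition into progression-free parts greedily. The greedy procedure is our own illustrative stand-in: it is not the construction behind Theorem 2.5 and carries no guarantee on the number of parts or on the running time.

```python
def is_progression_free(values):
    """True if the set contains no non-constant 3-element arithmetic progression."""
    s = set(values)
    items = sorted(s)
    for i, a in enumerate(items):
        for c in items[i + 1:]:
            # a < c, so a midpoint, if present, differs from both endpoints.
            if (a + c) % 2 == 0 and (a + c) // 2 in s:
                return False
    return True


def greedy_partition(values):
    """Split the set into progression-free parts (first-fit heuristic)."""
    parts = []
    for v in sorted(set(values)):
        for part in parts:
            if is_progression_free(part + [v]):
                part.append(v)
                break
        else:
            parts.append([v])
    return parts


print(greedy_partition(range(1, 10)))
```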
Lemma 2.6.
We can construct in time a family of subsets of satisfying:
(a) Each good 3-element progression is contained in some .
(b) If , then all 3-element arithmetic progressions in are good.
Proof 2.7.
Let us divide the elements from into three classes:
Each element has the colour blue, red or green of its corresponding class. Each class forms an arithmetic progression.
A progression is called multi-chromatic if its elements are of three distinct colours. Let us observe that a 3-element progression is good if and only if it is multi-chromatic. Indeed, this is because if (or ), then is odd.
Now instead of good progressions we will deal with multi-chromatic progressions. We treat sets of integers as increasing sequences and for a set we denote by and the subsets and .
For example .
Our construction works as follows:
1. Partition the set into classes .
2. For each class partition it in time into a family of progression-free sets with the use of Theorem 2.5.
3. Refine each partition , splitting each set into two sets , , so that for each set in the new refined partition we have or . Each family is still of size .
4. Return .
Proof of point (a). Each multi-chromatic progression is contained in some since each element of is contained in a set from .
Proof of point (b). The proof is by contradiction. Assume that contains a progression which is not multi-chromatic. There are two cases.
Case 1: the progression is monochromatic, hence it appears in a single set . However every is progression-free (step 2), hence such a progression cannot appear in any ; a contradiction.
Case 2: the progression contains exactly two different colours. Observe that if , then (if the middle element of progression belongs to the same class as one of the other elements, then the triple is monochromatic), hence the two-coloured arithmetic progression has to consist of and .
Since both belong to or (step 3), must belong to (if , then ). Consequently, the progression cannot contain exactly two colours; a contradiction.
Our next tool is a deactivation of a set of elements whose indices are not in a given set , that is, omitting them in the computation of a solution. For the operation replaces each element on position by , where .
Lemma 2.8.
.
Proof 2.9.
The part is obvious, so it suffices to show . If at least one, but not all, of is not in , then it can be checked that cannot hold for because and differ by at least . Indeed, there are seven possible cases:
1. , then
2. , , then
3. , works as the previous case
4. , , then
5. , , then
6. , works as the previous case
7. , , then .
Hence, apart from the first case, where none of the indices belongs to , the absolute value of difference is at least .
Otherwise, if all the positions are not in , then does not hold because
An instance is called an odd-half instance if is false for such that is even (equivalently, for such that and have the same parity). Efficient equivalence
follows now from Lemmas 2.6 and 2.8.
This produces only odd-half instances because only good progressions are left in the construction of Lemma 2.6. The instances have elements in . We can increase all the elements by so that they become non-negative. This implies:
Lemma 2.10.
An instance of can be reduced in time to odd-half instances of of total size and with elements up to .
Finally, we show that the resulting instances can be glued together to a single equivalent one.
Theorem 2.11.
An instance of can be reduced in time to an odd-half instance of of size with elements up to .
Proof 2.12.
With Lemmas 2.1, 2.3 and 2.10 we obtain a reduction from to odd-half instances of of total size . The instances have elements in . We will show that these instances can be reduced to a single odd-half instance of of size with elements in the range in time . The resulting instance will return true if and only if at least one of the input instances does.
Let be the number of the instances of , numbered through . We use Theorem 2.5 (actually, a deterministic version of Behrend’s construction from [14] or an earlier construction of Salem and Spencer [38] would suffice here) and pick the largest constructed progression-free set , for some . By the pigeonhole principle, . We select that is large enough so that , so , and trim the set to the size . Let . For instance we multiply all its elements by and add to each element the value . Finally we concatenate all the instances.
If any of the input instances returns true, then so does the output instance, since multiplication by and addition of the same number to all elements cannot affect the outcome of a single instance. If none of the input instances returns true, then the only possibility for the output instance to return true is to contain a 3-element arithmetic progression with elements from multiple parts corresponding to the input instances. However, this is impossible since, taken modulo , the progression would form an arithmetic progression in the set .
Corollary 2.13.
The general problem is also 3SUM-hard.
Remark 2.14.
Remark 2.15.
The AVERAGE problem (introduced by J. Erickson [17]) asks if there are distinct elements $a, b, c \in S$ such that $a + b = 2c$ for a given set $S$ of $n$ integers. It was recently shown to be 3SUM-hard [14]. The problem can be viewed as a convolution version of the AVERAGE problem (see https://cs.stackexchange.com/questions/10681/is-detecting-doubly-arithmetic-progressions-3sum-hard/10725#10725). The ideas based on almost linear hashing used in the reductions from to [35, 10] can be extended with some effort to reduce AVERAGE to . We presented a different reduction that additionally directly leads to an instance of with an odd-half property, which is essential in our proof of 3SUM-hardness of computing Ab-squares (see the proof of Lemma 4.2).
2.3 Hardness of detecting additive squares
If the alphabet is a set of integers, then a string is called an additive square if , where and .
Theorem 2.16.
Finding an additive square in a length-$n$ sequence composed of integers of magnitude $n^{O(1)}$ is 3SUM-hard.
Proof 2.17.
We use Theorem 2.11 to reduce to an instance of of size with elements in the requested range. returns true on an instance if and only if the sequence contains an additive square. As the reduction works in total time, the conclusion follows.
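One natural way to realize such an equivalence, which we assume here purely for illustration, is to pass to the sequence of consecutive differences $y_k = x_{k+1} - x_k$: the two halves of a factor of $y$ have equal sums exactly when the three corresponding elements of $x$ form a double 3-term arithmetic progression. The sketch below verifies this exhaustively on small inputs. All function names are ours, and the differences may be negative, so a concrete reduction would still need to account for the admissible range of values.

```python
from itertools import accumulate, product


def has_3dap(x):
    """Naive test for positions i < m < j, m - i == j - m, x[i] + x[j] == 2 * x[m]."""
    n = len(x)
    return any(x[m - d] + x[m + d] == 2 * x[m]
               for m in range(n) for d in range(1, min(m, n - 1 - m) + 1))


def has_additive_square(y):
    """Naive test for an even-length factor whose two halves have equal sums."""
    prefix = [0] + list(accumulate(y))
    n = len(y)
    return any(prefix[i + h] - prefix[i] == prefix[i + 2 * h] - prefix[i + h]
               for h in range(1, n // 2 + 1) for i in range(n - 2 * h + 1))


def differences(x):
    """Consecutive differences x[k+1] - x[k]."""
    return [x[k + 1] - x[k] for k in range(len(x) - 1)]


def check_equivalence(length, max_value):
    """Compare both tests on all sequences over {0, ..., max_value} of this length."""
    return all(has_3dap(x) == has_additive_square(differences(x))
               for x in product(range(max_value + 1), repeat=length))


print(check_equivalence(5, 3))
```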
3 From arithmetics to Abelian stringology
We use capital letters to denote strings and lower case Greek letters to denote sets of integers. We assume that the positions in a string are numbered 1 through , where denotes the length of . By and we denote the th letter of and the string called a factor of . The reverse of string , i.e. the string , is denoted as . By we denote the empty string. By we denote the set of distinct letters in .
We denote Ab-equivalence of and by . For a string , by we denote the Parikh vector of . Then if and only if and .
We use an encoding of Amir et al. [3] based on the Chinese remainder theorem to connect -type problems with Abelian stringology.
Let be prime numbers. The Chinese remainder theorem states that if one knows the remainders of an integer , such that , when dividing by ’s, then one can uniquely determine . Assuming that the remainders of an integer are , we could encode as a possibly short string over an alphabet (the symbols correspond to consecutive prime numbers).
For example for primes 2,3,5 the encoding of 11 would be since its remainders modulo 2,3,5 are 1,2,1, respectively. However, we are interested in encodings of subtractions of one number from another one, and it is more complicated.
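The remainder-based encoding from the example can be reproduced with a few lines of Python. We assume, as an illustration, that the letter for the $i$-th prime is repeated as many times as the $i$-th remainder; the letter names `a`, `b`, `c` and all function names are hypothetical. The helper `crt` recovers the encoded integer from its remainders (it relies on `pow(q, -1, p)`, available from Python 3.8).

```python
from math import prod


def remainders(v, primes):
    """Remainders of v modulo each prime."""
    return [v % p for p in primes]


def encode(v, primes, letters):
    """Repeat the i-th letter (v mod p_i) times and concatenate."""
    return "".join(a * (v % p) for a, p in zip(letters, primes))


def crt(rems, primes):
    """Recover the unique v in [0, p_1 * ... * p_k) with the given remainders."""
    total_modulus = prod(primes)
    total = 0
    for r, p in zip(rems, primes):
        q = total_modulus // p
        total += r * q * pow(q, -1, p)  # q * q^{-1} is 1 mod p and 0 mod the other primes
    return total % total_modulus


print(remainders(11, [2, 3, 5]), encode(11, [2, 3, 5], "abc"), crt([1, 2, 1], [2, 3, 5]))
```

The difficulty with subtraction mentioned above is visible already for small numbers: the remainders of 7 and 5 modulo 3 are 1 and 2, and their difference $-1$ has to be corrected by adding 3 to match the remainder of $7 - 5 = 2$.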
Let be an instance of and be remainders of modulo . Like Amir et al. [3], we define for and ,
We choose a sequence of distinct primes such that . In this way we encode the difference , for , by a string . An obstacle is the potentially possible inequality . For example
However, a small correction is sufficient, due to the following observation.
Observation.
, where .
If we apply the encoding to an instance of , we obtain a lemma that is analogous to [3, Lemma 1].
Lemma 3.1.
holds for , even, iff for each , there are , such that
Let be a morphism such that for each . We treat a set as a string , where . If we interpret the vector as , then Lemma 3.1 directly implies the following fact.
Lemma 1.
Assume and is even. Then
for some disjoint subsets of .
4 Hardness of computing all centers of Ab-squares
We construct a text over the alphabet such that has a solution if and only if contains an Ab-square with one of specified centers, so-called well-placed Ab-square.
First we extend each to have the same length , to be defined later. Intuitively, it is needed to control the number of ’s in the strings from 1. We append occurrences of a letter to each . Let denote this modified string.
Lemma 1 immediately implies the following fact.
Lemma 4.1.
Assume and is even. Then
for some disjoint subsets of , where with .
The parts , in the above lemma can be treated as equalizers. Let us note that in the above lemma we can assume that .
A pair of disjoint sets that satisfies will be called a 2-partition of . For a 2-partition of , we use the string
called a -string. If , , an example of a -string is .
Let be the sequence of all pairs of -strings. Define
We have , so .
For disjoint subsets and integers , there are decompositions and , where . Let us recall the morphism such that for each . We define additionally for and set
Let us observe that indeed holds since and the length of for any -string is .
We add two new letters and define the following string (the symbols “” are not parts of the string, but only show supposed centers of Ab-squares).
(1) |
An Ab-square is called well-placed if its center is between the letters in any order. Recall that, due to Theorem 2.11, we can assume that the input to guarantees that only odd-half instances could have solutions.
Lemma 4.2.
Assume is an odd-half instance. Then has a solution if and only if contains a well-placed Ab-square.
Proof 4.3.
Let be an odd-half instance of . We show two implications.
Assume that holds for . Lemma 4.1 implies that for strings such that we have
(2) |
for some disjoint subsets of , where with . Indeed, we use the fact that and the counts of letters and on both hand sides are equal (because is odd). By Section 4, we obtain a well-placed Ab-square in (or we obtain it after exchanging all letters with ).
Assume that has a well-placed Ab-square factor with center immediately after (the case that it is immediately after is symmetric). Let us investigate what can be the position of the first letter of this Ab-square.
Recall that for each , so can be seen as composed of blocks of length . We will check which of these blocks can contain , by checking the counts of each of the letters in both halves of the Ab-square. The positions of letters in repeat with period , so it is sufficient to inspect the first 6 blocks on each side, as the remaining ones will behave periodically; see Figures 3 and 4.
By counting letters in both halves of the Ab-square, it can be readily verified that cannot be in any block or ; if in any block or , it can only be the first position of the block; it cannot be the first position in a block ; and it can be in any position in a block .
Moreover, cannot be the first position in a block , since this would imply, by Lemma 4.1, that holds for such that the block immediately follows the block and . However, in this case is even, which is impossible.
If is the first position of a block , then this implies, again by Lemma 4.1, that holds for . In this case is odd, so this is a valid solution to the corresponding instance.
We are left with the case that belongs to a block or (and in case of does not coincide with the position of the letter ). Henceforth it suffices to count letters different from in the halves. Each of the gadgets is a concatenation of Ab-equivalent strings of the form , where are composed of letters only. By counting the letters # and $ in both halves of the Ab-square, we see that can only be a position which holds the letter or #.
Theorem 4.4.
Computing all positions that are centers of Ab-square factors in a length-$n$ string over an alphabet of size is 3SUM-hard.
Proof 4.5.
Due to Theorem 2.11 we can reduce in time to an odd-half instance of of size with elements in the range .
We construct the string as shown in Eq. 1 for the sequence . Then Lemma 4.2 implies that is a YES-instance if and only if has a well-placed Ab-square. The string has length . Each of the strings has length and is composed of strings of length , i.e., -images of -strings.
Hence, . We select such that and simultaneously . Then we have and the primes are of magnitude (we can choose consecutive primes computed using the sieve of Eratosthenes).
Overall and . (One can obtain any alphabet up to by appending distinct letters to .)
With the same argument for a constant-sized alphabet we obtain the following result.
Theorem 4.6.
All positions that are centers of Ab-square factors in a length-$n$ string over an alphabet of size , for a constant , cannot be computed in time, for a constant , unless the 3SUM conjecture fails.
5 Computing centers of Ab-squares for constant-sized alphabets
A set of vectors in is called monotone if its elements can be ordered so that they form a monotone non-decreasing sequence on each coordinate.
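Monotonicity in this sense can be tested by sorting: if an ordering that is non-decreasing on every coordinate exists, then the lexicographic order is one such ordering. The following sketch uses this observation; the function name `is_monotone` and the representation of vectors as tuples are our choices.

```python
def is_monotone(vectors):
    """True if the vectors can be ordered to be non-decreasing on every coordinate."""
    ordered = sorted(vectors)  # lexicographic order of tuples
    return all(u[k] <= v[k]
               for u, v in zip(ordered, ordered[1:])
               for k in range(len(u)))


# Parikh vectors of the prefixes of a string always form a monotone set.
print(is_monotone([(0, 0), (1, 0), (1, 1), (2, 1)]))
print(is_monotone([(0, 1), (1, 0)]))
```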
Definition 5.1.
For sets and of vectors we define
and for a string we define: . Let us also denote by the length of a string corresponding to a Parikh vector .
In the algorithm we use the following fact shown in [11]. The exact complexities can be found in [11, Theorem 3.1].
Fact 2 ([11]).
Given three monotone sequences in for a constant , we can compute in expected time for a constant , or in worst case time for a constant if .
Theorem 5.2.
For a string of length over an alphabet of size , we can compute centers of all Ab-squares and centers of all odd Ab-squares in expected time or in worst case time if , for .
Proof 5.3.
We use the above algorithm. Correctness of the algorithm is straightforward; see Figure 5. If
then .
Consequently, after cancelling the same parts on both sides, , equivalently if and only if the factor corresponding to is an Ab-square centred in . The figure shows the case when is in the right half of the strings; the other case is symmetric.
In the case of odd Ab-squares let
In the algorithm the statement is executed for both , with
Other parts of the algorithm, as well as its analysis, are essentially the same.
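For comparison with the subquadratic algorithm, the following naive baseline computes the centers directly from Parikh vectors of prefixes: a factor of half-length $h$ centred after position $c$ is an Ab-square exactly when twice the Parikh vector of the length-$c$ prefix equals the sum of the Parikh vectors of the prefixes of lengths $c-h$ and $c+h$. The code is our own sketch and does not use the result of [11]; the name `ab_square_centers` and the flag `odd_only` are hypothetical.

```python
def ab_square_centers(s, odd_only=False):
    """Return all c (1 <= c < len(s)) such that some Ab-square factor of s
    has its center between positions c and c + 1 (1-indexed).
    With odd_only=True only odd half-lengths are considered."""
    alphabet = sorted(set(s))
    index = {a: k for k, a in enumerate(alphabet)}
    sigma = len(alphabet)
    # P[k] is the Parikh vector of the prefix of length k.
    P = [(0,) * sigma]
    for ch in s:
        v = list(P[-1])
        v[index[ch]] += 1
        P.append(tuple(v))
    n = len(s)
    step = 2 if odd_only else 1
    centers = []
    for c in range(1, n):
        for h in range(1, min(c, n - c) + 1, step):
            if all(2 * P[c][k] == P[c - h][k] + P[c + h][k] for k in range(sigma)):
                centers.append(c)
                break
    return centers
```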
6 Detecting odd Ab-squares
Unfortunately the string from Lemma 4.2 has many Ab-squares which are not well-placed. Our approach is to embed the (slightly) modified string into a string which is a special composition of and a combination of long quaternary Ab-square-free strings. The resulting string will fix the potential centers in specified locations. We use additional letters: and .
6.1 Fixing centers
We show first a fact useful in fixing Ab-squares in specified places (Lemma 6.5). Keränen’s construction [27] of a quaternary Ab-square-free string consists in iterating a certain morphism , such that for each of the four letters , on an initially single-letter string. This implies the following lemma.
Lemma 6.1 (Keränen [27]).
A length- quaternary Ab-square-free string can be generated in time.
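For small lengths, Ab-square-free strings can also be found by plain backtracking, which is convenient for experiments. The sketch below is not Keränen's morphism-based construction and gives no running-time guarantee; the names `has_ab_square_suffix` and `ab_square_free_string` are ours.

```python
from collections import Counter


def has_ab_square_suffix(s):
    """True if some non-empty suffix of s is an Ab-square."""
    n = len(s)
    return any(Counter(s[n - 2 * h:n - h]) == Counter(s[n - h:])
               for h in range(1, n // 2 + 1))


def ab_square_free_string(length, alphabet="abcd"):
    """Return an Ab-square-free string of the given length, or None if none exists."""
    s = []

    def extend():
        if len(s) == length:
            return True
        for a in alphabet:
            s.append(a)
            # Only suffixes need checking: the shorter prefix is already Ab-square-free.
            if not has_ab_square_suffix(s) and extend():
                return True
            s.pop()
        return False

    return "".join(s) if extend() else None


print(ab_square_free_string(20))
print(ab_square_free_string(8, "abc"))  # exhaustive search over a ternary alphabet
```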
Let be any Ab-square-free string of length over alphabet . Let us define
Lemma 6.2.
The string contains exactly the following Ab-squares:
(1) of length divisible by ; and
(2) with the center between two ’s and of all admissible even lengths other than , for an integer .
Proof 6.3.
Let be a factor of . If has length greater than or equal to , then its middle length- factor forms a classic square, and after removing it we obtain a different factor of length smaller by , which is centred exactly like . Hence, we can focus only on non-empty factors of length smaller than .
If is centred between two ’s, then after removing letters and we obtain an even palindrome (hence also an Ab-square). If is shorter than , then no letters or occur. If it is longer than , then both parts contain one letter and each. If its length is exactly , then the letters and remain unmatched, hence it is the only case where the factor is not an Ab-square.
Let us assume that is centred in a different place. If the factor does not contain any of the letters , or , then it is a factor of or its reverse, hence it cannot be an Ab-square. Otherwise, the factor needs to contain each of , twice and four times. Then fully contains a factor
Remark 6.4.
Lemma 6.2 works for any Ab-square-free string such that .
For equal-length strings we define the string
For example, .
The parity condition for half lengths of Ab-squares in the following observation justifies the usage of the additional letter in . Let be the string resulting from by removing all letters outside .
Observation.
Assume are equal-length strings composed of disjoint sets of letters distinct from and is an Ab-square in . Then are Ab-squares in , respectively (we say that these Ab-squares are implied by ). Moreover, are of the same parity.
We say that an even-length factor of a string is centred at if it has its center between positions and in . By and we denote that divides and does not divide . For an illustration of the following lemma, see Figure 7.
Lemma 6.5.
Let , be a string of length such that its alphabet is disjoint with , , and let an integer satisfy . Then a length- factor of is an Ab-square if and only if it is centred in at , contains an Ab-square factor of length centred in at , and .
Proof 6.6.
By the disjointness of sets of letters in and , each Ab-square in has length that is divisible by 3. The following claim is then readily verified (cf. Section 6.1).
Claim 3.
For positive integer such that , a length- factor of centred at is an Ab-square if and only if the length- factors in and centred at and , respectively, are Ab-squares.
Let integer satisfy and . We show two implications.
If contains an Ab-square factor of length centred at some , then the implied Ab-square factor of has length , where , so by Lemma 6.2 it has its center between two ’s, i.e., . Hence, .
Moreover, also by Lemma 6.2. Finally, the implied Ab-square factor of indeed has length and is centred at .
Let , , and assume that contains an Ab-square factor of length centred at . We have so by Lemma 6.2 the string contains an Ab-square factor of length centred at . Finally, the unary string , certainly contains an Ab-square factor of length centred at . By the claim, contains an Ab-square of length centred at that implies the three Ab-squares.
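The notions used in this subsection (Ab-square, a factor centred at a position, and the restriction of a string to a subset of letters) can be made concrete with a minimal Python sketch. The function names `is_ab_square`, `factor_centred_at` and `restrict` are hypothetical helpers introduced only for illustration, positions are assumed to be 1-based, and the interleaved example string is our own, not the construction from the paper.

```python
from collections import Counter


def is_ab_square(w: str) -> bool:
    """Return True if w = xy with |x| = |y| > 0 and y a permutation of x."""
    if len(w) == 0 or len(w) % 2 == 1:
        return False
    h = len(w) // 2
    # Two halves are Abelian-equivalent iff their letter counts agree.
    return Counter(w[:h]) == Counter(w[h:])


def factor_centred_at(s: str, i: int, h: int) -> str:
    """Length-2h factor of s with its center between positions i and i+1.

    Positions are 1-based (an assumption), so the left half is s[i-h+1..i]
    and the right half is s[i+1..i+h].
    """
    if h <= 0 or i - h < 0 or i + h > len(s):
        raise ValueError("factor does not fit in the string")
    return s[i - h:i + h]


def restrict(w: str, letters: str) -> str:
    """Remove from w all letters outside the given set."""
    keep = set(letters)
    return "".join(c for c in w if c in keep)


# Illustrative example: letters of "abba" and "cddc" interleaved one by one.
s = "acbdbdac"
w = factor_centred_at(s, 4, 4)           # the whole string
print(is_ab_square(w))                   # True
print(is_ab_square(restrict(w, "ab")))   # True: "abba"
print(is_ab_square(restrict(w, "cd")))   # True: "cddc"
```

Restricting an Ab-square to a subset of letters keeps the two halves Abelian-equivalent, which is the property the observation relies on; if no letter of the subset occurs, the restriction is empty and `is_ab_square` reports `False` by convention.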
6.2 Main result
We use the technique of fixing Ab-squares from Lemma 6.5. Moreover, we make the following two minor modifications to the construction of string in Section 4:
(1) Each fragment is extended by one letter to , and
(2) the letters are replaced each by two letters , respectively.
Intuitively, (1) allows us to extend the Ab-squares considered in the proof of Lemma 4.2 by one letter to either side, and (2) makes even, which facilitates the usage of Lemma 6.5 with . It can be verified by inspecting the proof that Lemma 4.2 still holds after these two changes. From now on, all notions from Section 4 refer to the construction after these modifications.
Theorem 6.7.
Checking if a length- string over an alphabet of size contains an odd Ab-square is -hard. Moreover, for a string over an alphabet of size , for a constant , the same problem cannot be solved in time, for a constant , unless the conjecture fails.
Proof 6.8.
It is enough now to show the following equivalence for , where . We assume that .
Claim 4.
An odd-half instance of is a YES-instance if and only if has an odd Ab-square factor.
Proof 6.9.
Assume that is an odd-half instance and has a solution.
By Lemma 4.2, contains a well-placed Ab-square, that is, an Ab-square centred at a position such that . (Recall that .) Moreover, in the proof of that lemma it is shown that in this case there exists a well-placed Ab-square in that satisfies the following additional requirements: (1) it starts within the gadget ; (2) it starts and ends within a block of ’s; (3) its maximal prefix and suffix consisting of letters are and , where .
Let denote the half length of this Ab-square. By (2) and (3), if is even, the Ab-square can be extended by one letter to either side (because we have extended each block ) so that becomes odd. Moreover, by (1), we have , in particular, . Then Lemma 6.5 concludes that the factor of centred at and of length such that is an Ab-square. Its half length, , is odd, as desired.
Assume that has an Ab-square factor of length such that is odd. In particular, we have , so by Lemma 6.5 the Ab-square is centred in at and contains an Ab-square factor of length centred in at . If , then and is well-placed.
Otherwise, cannot be an Ab-square due to the following fact: does not contain an Ab-square factor of length not divisible by and centred at . Indeed, as in the proof of Lemma 4.2, we show that each even-length factor centred at such contains different counts of one of the letters in its two halves. The positions of letters in repeat with period , so it suffices to inspect the first 6 blocks on each side, as the remaining ones behave periodically; see Figures 8 and 9.
[Tables with rows "letter" and "distance"; entries not recoverable.]
An exhaustive verification can be performed as follows. First, in Table 1, we count the distances of letters from in both directions to the center of the factor. In Table 2 we perform a merge of these two sequences of distances assuming that .
For each distance, we write the letter that is located at this distance, with a “” sign if it is in the left half and with a “” sign otherwise. The remaining columns show the partial sums of the number of occurrences of the letter in the left and in the right half, for each .
[Tables with columns "position" and "letter" followed by partial-sum columns; entries not recoverable.]
Consequently, as in Lemma 4.2, the corresponding instance of is a YES-instance.
The complexities in the theorem are obtained as in Theorems 4.4 and 4.6.
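For reference, the problem addressed by Theorem 6.7 admits a straightforward quadratic-time baseline. The following Python sketch is not the algorithm from the literature referred to in the paper; it is a plain centre-expansion check, and the function name `find_odd_ab_square` and the 0-based return convention are our own assumptions.

```python
from collections import defaultdict


def find_odd_ab_square(s):
    """Return (start, half_length), 0-based, of some Ab-square factor of s
    whose half-length is odd, or None if there is none.

    For every center we grow the half-length h and maintain, per letter,
    the difference between its counts in the left and the right half.
    """
    n = len(s)
    for c in range(1, n):              # center between s[c-1] and s[c]
        diff = defaultdict(int)        # count in left half minus right half
        nonzero = 0                    # number of letters with diff != 0
        h = 1
        while c - h >= 0 and c + h <= n:
            # Growing h by one adds s[c-h] to the left half
            # and s[c+h-1] to the right half.
            for ch, d in ((s[c - h], 1), (s[c + h - 1], -1)):
                if diff[ch] == 0:
                    nonzero += 1
                diff[ch] += d
                if diff[ch] == 0:
                    nonzero -= 1
            if nonzero == 0 and h % 2 == 1:
                return (c - h, h)
            h += 1
    return None


print(find_odd_ab_square("abcacb"))  # (0, 3): halves "abc" and "acb"
print(find_odd_ab_square("abab"))    # None: the only Ab-square has half-length 2
```

Each of the fewer than n centers is expanded at most n/2 times with constant expected work per step, so the sketch runs in quadratic expected time; the theorem states that, conditionally, no strongly subquadratic algorithm exists.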
7 Open problems
The most interesting questions that remain open are as follows:
1. Is checking Ab-square-freeness 3SUM-hard? Our reductions allowed us to show 3SUM-hardness of detecting an odd Ab-square.
2. Can one detect an additive square in a length-$n$ string over a constant-sized alphabet in $\mathcal{O}(n^{2-\varepsilon})$ time, for some $\varepsilon > 0$? We have shown 3SUM-hardness of this problem for an alphabet of size polynomial in $n$.
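To make the second question concrete, recall that an additive square is a factor consisting of two consecutive blocks of the same length and the same sum. A minimal quadratic-time check based on prefix sums can be sketched in Python as follows; the function name `find_additive_square` and the 0-based return convention are assumptions made for illustration only.

```python
from itertools import accumulate


def find_additive_square(a):
    """Return (start, half_length), 0-based, of some additive square in the
    integer sequence a, or None if a is additive-square-free."""
    n = len(a)
    prefix = [0] + list(accumulate(a))   # prefix[k] = a[0] + ... + a[k-1]
    for h in range(1, n // 2 + 1):
        for i in range(n - 2 * h + 1):
            left = prefix[i + h] - prefix[i]
            right = prefix[i + 2 * h] - prefix[i + h]
            if left == right:
                return (i, h)
    return None


print(find_additive_square([1, 4, 2, 3]))  # (0, 2): 1 + 4 == 2 + 3
print(find_additive_square([1, 2, 3]))     # None
```

The sketch inspects every pair (start, half-length) once, so it uses a quadratic number of constant-time comparisons regardless of the alphabet; the open question asks whether a constant-sized alphabet allows a strongly subquadratic algorithm.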
References
- [1] Peyman Afshani, Ingo van Duijn, Rasmus Killmann, and Jesper Sindahl Nielsen. A lower bound for jumbled indexing. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 592–606. SIAM, 2020. doi:10.1137/1.9781611975994.36.
- [2] Amihood Amir, Alberto Apostolico, Tirza Hirst, Gad M. Landau, Noa Lewenstein, and Liat Rozenberg. Algorithms for jumbled indexing, jumbled border and jumbled square on run-length encoded strings. Theoretical Computer Science, 656:146–159, 2016. doi:10.1016/j.tcs.2016.04.030.
- [3] Amihood Amir, Timothy M. Chan, Moshe Lewenstein, and Noa Lewenstein. On hardness of jumbled indexing. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, volume 8572 of Lecture Notes in Computer Science, pages 114–125. Springer, 2014. doi:10.1007/978-3-662-43948-7_10.
- [4] Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, volume 78 of LIPIcs, pages 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.CPM.2017.22.
- [5] Ilya Baran, Erik D. Demaine, and Mihai Patrascu. Subquadratic algorithms for 3SUM. Algorithmica, 50(4):584–596, 2008. doi:10.1007/s00453-007-9036-3.
- [6] Felix Adalbert Behrend. On sets of integers which contain no three terms in arithmetical progression. Proceedings of the National Academy of Sciences of the United States of America, 32(12):331–332, 1946. doi:10.1073/pnas.32.12.331.
- [7] Tom C. Brown and Allen R. Freedman. Arithmetic progressions in lacunary sets. Rocky Mountain Journal of Mathematics, 17(3):587–596, 1987.
- [8] Tom C. Brown, Veselin Jungić, and Andrew Poelstra. On double 3-term arithmetic progressions. Integers, 14:A43, 2014. URL: https://www.emis.de/journals/INTEGERS/papers/o43/o43.Abstract.html.
- [9] Julien Cassaigne, James D. Currie, Luke Schaeffer, and Jeffrey O. Shallit. Avoiding three consecutive blocks of the same size and same sum. Journal of ACM, 61(2):10:1–10:17, 2014. doi:10.1145/2590775.
- [10] Timothy M. Chan and Qizheng He. Reducing 3SUM to Convolution-3SUM. In Martin Farach-Colton and Inge Li Gørtz, editors, 3rd Symposium on Simplicity in Algorithms, SOSA@SODA 2020, Salt Lake City, UT, USA, January 6-7, 2020, pages 1–7. SIAM, 2020. doi:10.1137/1.9781611976014.1.
- [11] Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combinatorics. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 31–40. ACM, 2015. doi:10.1145/2746539.2746568.
- [12] Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Extracting powers and periods in a word from its runs structure. Theoretical Computer Science, 521:29–41, 2014. doi:10.1016/j.tcs.2013.11.018.
- [13] Larry J. Cummings and William F. Smyth. Weak repetitions in strings. Journal of Combinatorial Mathematics and Combinatorial Computing, 24:33–48, 1997.
- [14] Bartłomiej Dudek, Paweł Gawrychowski, and Tatiana Starikovskaya. All non-trivial variants of 3-LDT are equivalent. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 974–981. ACM, 2020. doi:10.1145/3357713.3384275.
- [15] Roger C. Entringer, Douglas E. Jackson, and J.A. Schatz. On nonrepetitive sequences. Journal of Combinatorial Theory, Series A, 16(2):159–164, 1974. doi:10.1016/0097-3165(74)90041-7.
- [16] Paul Erdős. Some unsolved problems. Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei, 6:221–254, 1961.
- [17] Jeff Erickson. Finding longest arithmetic progressions, 1999. URL: https://jeffe.cs.illinois.edu/pubs/arith.html.
- [18] Aleksandr Andreevich Evdokimov. Strongly asymmetric sequences generated by a finite number of symbols. Doklady Akademii Nauk SSSR, 179(6):1268–1271, 1968.
- [19] Gabriele Fici, Filippo Mignosi, and Jeffrey O. Shallit. Abelian-square-rich words. Theoretical Computer Science, 684:29–42, 2017. doi:10.1016/j.tcs.2017.02.012.
- [20] Gabriele Fici and Aleksi Saarela. On the minimum number of abelian squares in a word. In Maxime Crochemore, James Currie, Gregory Kucherov, and Dirk Nowotka, editors, Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111), volume 4 (3), pages 34–35, Dagstuhl, Germany, 2014. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. doi:10.4230/DagRep.4.3.28.
- [21] Aviezri S. Fraenkel, Jamie Simpson, and Mike Paterson. On weak circular squares in binary words. In Alberto Apostolico and Jotun Hein, editors, Combinatorial Pattern Matching, 8th Annual Symposium, CPM 97, Aarhus, Denmark, June 30 - July 2, 1997, Proceedings, volume 1264 of Lecture Notes in Computer Science, pages 76–82. Springer, 1997. doi:10.1007/3-540-63220-4_51.
- [22] Allen R. Freedman and Tom C. Brown. Sequences on sets of four numbers. Integers, 16:A33, 2016. URL: http://math.colgate.edu/~integers/q33/q33.Abstract.html.
- [23] Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. How hard is it to find (honest) witnesses? In Piotr Sankowski and Christos D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, volume 57 of LIPIcs, pages 45:1–45:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ESA.2016.45.
- [24] Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences, 69(4):525–546, 2004. doi:10.1016/j.jcss.2004.03.004.
- [25] Lorenz Halbeisen and Norbert Hungerbühlre. An application of van der Waerden’s theorem in additive number theory. Integers, 0:A7, 2000. URL: http://math.colgate.edu/~integers/a7/a7.pdf.
- [26] Veikko Keränen. A powerful abelian square-free substitution over 4 letters. Theoretical Computer Science, 410(38-40):3893–3900, 2009. doi:10.1016/j.tcs.2009.05.027.
- [27] Veikko Keränen. Abelian squares are avoidable on 4 letters. In Werner Kuich, editor, Automata, Languages and Programming, ICALP 1992, volume 623 of Lecture Notes in Computer Science, pages 41–52. Springer, 1992. doi:10.1007/3-540-55719-9_62.
- [28] Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Maximum number of distinct and nonequivalent nonstandard squares in a word. Theoretical Computer Science, 648:84–95, 2016. doi:10.1016/j.tcs.2016.08.010.
- [29] Tomasz Kociumaka, Jakub Radoszewski, and Bartłomiej Wiśniewski. Subquadratic-time algorithms for abelian stringology problems. In Ilias S. Kotsireas, Siegfried M. Rump, and Chee K. Yap, editors, Mathematical Aspects of Computer and Information Sciences - 6th International Conference, MACIS 2015, Berlin, Germany, November 11-13, 2015, Revised Selected Papers, volume 9582 of Lecture Notes in Computer Science, pages 320–334. Springer, 2015. doi:10.1007/978-3-319-32859-1_27.
- [30] Tomasz Kociumaka, Jakub Radoszewski, and Bartłomiej Wiśniewski. Subquadratic-time algorithms for abelian stringology problems. AIMS Medical Science, 4(3):332–351, 2017. doi:10.3934/ms.2017.3.332.
- [31] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjecture. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1272–1287. SIAM, 2016. doi:10.1137/1.9781611974331.ch89.
- [32] Florian Lietard and Matthieu Rosenfeld. Avoidability of additive cubes over alphabets of four numbers. In Natasa Jonoska and Dmytro Savchuk, editors, Developments in Language Theory - 24th International Conference, DLT 2020, Tampa, FL, USA, May 11-15, 2020, Proceedings, volume 12086 of Lecture Notes in Computer Science, pages 192–206. Springer, 2020. doi:10.1007/978-3-030-48516-0_15.
- [33] Giuseppe Pirillo and Stefano Varricchio. On uniformly repetitive semigroups. Semigroup Forum, 49:125–129, 1994. doi:10.1007/BF02573477.
- [34] Peter A. B. Pleasants. Non-repetitive sequences. Mathematical Proceedings of the Cambridge Philosophical Society, 68:267–274, 1970.
- [35] Mihai Pătra\textcommabelowscu. Towards polynomial lower bounds for dynamic problems. In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 603–610. ACM, 2010. doi:10.1145/1806689.1806772.
- [36] Michaël Rao and Matthieu Rosenfeld. Avoiding two consecutive blocks of same size and same sum over . SIAM Journal on Discrete Mathematics, 32(4):2381–2397, 2018. doi:10.1137/17M1149377.
- [37] Lawrence Bruce Richmond and Jeffrey O. Shallit. Counting abelian squares. Electronic Journal of Combinatorics, 16(1), 2009. URL: http://www.combinatorics.org/Volume_16/Abstracts/v16i1r72.html.
- [38] Raphaël Salem and Donald C. Spencer. On sets of integers which contain no three terms in arithmetical progression. Proceedings of the National Academy of Sciences of the United States of America, 28(12):561–563, 1942. doi:10.1073/pnas.28.12.561.
- [39] Jamie Simpson. Solved and unsolved problems about abelian squares, 2018. arXiv:1802.04481.
- [40] Shiho Sugimoto, Naoki Noda, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing abelian string regularities based on RLE. In Ljiljana Brankovic, Joe Ryan, and William F. Smyth, editors, Combinatorial Algorithms - 28th International Workshop, IWOCA 2017, Newcastle, NSW, Australia, July 17-21, 2017, Revised Selected Papers, volume 10765 of Lecture Notes in Computer Science, pages 420–431. Springer, 2017. doi:10.1007/978-3-319-78825-8_34.