Constructions for Nonadaptive Tropical Group Testing
Abstract
PCR testing is an invaluable diagnostic tool that has most recently seen widespread use during the COVID-19 pandemic. A recent work by Wang, Gabrys and Vardy proposed tropical codes as a model for group PCR testing. For a known but arbitrary number of infected persons, that work gives a sufficient condition, called double disjunction, on the underlying block design of a zero-error tropical code. However, constructions of doubly disjunct block designs are known only for a very limited range of parameters. In this paper, we define probabilistic tropical codes and consider random block designs that are doubly disjunct with high probability. We also provide a deterministic construction of a doubly disjunct block design from a given disjunct block design. We show that for certain choices of parameters, our probabilistic construction has vanishing error. Our constructions, combined with existing methods, give us three different ways to construct tropical codes. We compare the number of tests required by each, as well as bounds on the error.
I Introduction
Polymerase Chain Reaction (PCR) is a method by which genetic material may be rapidly amplified. PCR has a wide range of applications, perhaps most recently in reliable testing for the virus causing COVID-19. PCR testing is typically done through thermal cycling, where the viral load of a test containing some specimens is repeatedly doubled in concentration. Each doubling event is known as a cycle, and the load is doubled until it reaches a detectable threshold or fails to reach this threshold within a preset number of cycles. The cycle number at which this threshold is reached is called the Ct value. Since the load doubles with every cycle, the Ct value of a test is thus approximately
$$\mathrm{Ct} \approx -\log_2(\text{initial viral load}) + \text{constant},$$
and is a measure of the degree of infection of a test, with smaller values indicating higher degrees of infection. If a test fails to reach the threshold after a preset number of cycles, then it is considered not infected and its Ct value is $\infty$.
Consider some finite set $\mathcal{N}$ of people with an infected subset $\mathcal{I} \subseteq \mathcal{N}$. Group testing deals with the problem of identifying $\mathcal{I}$ by performing tests on subsets of $\mathcal{N}$ whose size may be greater than one. In nonadaptive group testing, all group tests are designed in advance and performed at once. Group testing can be divided into combinatorial and probabilistic group testing. Combinatorial group testing identifies $\mathcal{I}$ with zero error, while probabilistic group testing identifies $\mathcal{I}$ with high probability. Traditionally, group testing takes place in the binary setting, so that a test on some subset $\mathcal{S} \subseteq \mathcal{N}$ is positive when $\mathcal{S} \cap \mathcal{I}$ is nonempty and negative otherwise. On the other hand, a PCR test on a subset $\mathcal{S}$ returns some aggregate Ct value, which tells us more than whether or not $\mathcal{S} \cap \mathcal{I}$ is nonempty. For instance, when two specimens with Ct values $x$ and $y$ are placed into a PCR test at the same time, the resulting Ct value is $z$ satisfying
$$2^{-z} = 2^{-x} + 2^{-y}.$$
Due to exponential decay, we have $z \approx \min(x, y)$. In performing a group PCR test, we are also able to add a specimen to a test after a certain number of cycles have elapsed. The operation of adding a specimen to a test after $\delta$ cycles is known as delaying, and we say that the specimen has been delayed by $\delta$ cycles. When a specimen with Ct value $x$ is delayed by $\delta$ cycles, the number of cycles it takes to reach the same detectable threshold becomes $x + \delta$.
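The pooling and delaying rules above can be checked numerically. The following is a minimal sketch, assuming the pooled Ct value $z$ of two specimens satisfies $2^{-z} = 2^{-x} + 2^{-y}$ and that a delay simply adds to the Ct value; the function names `combined_ct` and `delayed_ct` are illustrative and not taken from [1].

```python
import math


def combined_ct(x: float, y: float) -> float:
    """Ct value z of a pooled test, assuming 2**-z == 2**-x + 2**-y."""
    return -math.log2(2.0 ** -x + 2.0 ** -y)


def delayed_ct(x: float, delay: int) -> float:
    """Ct value observed when a specimen with Ct value x is added `delay` cycles late."""
    return x + delay


# The pooled value never exceeds min(x, y), and the gap shrinks quickly
# as the two viral loads move apart.
for x, y in [(20, 20), (20, 22), (20, 30)]:
    z = combined_ct(x, y)
    print(f"x={x}, y={y}: z={z:.4f}, min(x, y) - z = {min(x, y) - z:.4f}")
```

The largest gap occurs for equal Ct values, where the pooled value is exactly one cycle below the minimum; this is the regime in which the $\min$ approximation is least accurate.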
Based on the above properties of PCR testing, Wang, Gabrys, and Vardy proposed the tropical semiring as a model for group PCR testing [1]. The tropical semiring consists of the real numbers and infinity equipped with the following operations:
$$x \oplus y = \min(x, y), \qquad x \odot y = x + y.$$
The additive identity is $\infty$ and the multiplicative identity is $0$. In this model, for two specimens $x$ and $y$ (we have identified the specimens with their individual Ct values), the Ct value $(\delta_x \odot x) \oplus (\delta_y \odot y)$ is given by a test containing both $x$ and $y$, delayed by $\delta_x$ and $\delta_y$ respectively. In this way, $\odot$ indicates delay and $\oplus$ indicates presence. This model works because viral loads vary widely, and the Ct value of a test with two specimens with differing viral loads is dominated by the specimen with the higher viral load due to the exponential nature of the test. Since Ct values and delays only take nonnegative integer values, all the analysis done in this paper is over the subsemiring $\mathbb{Z}_{\geq 0} \cup \{\infty\}$.
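In general, a test involving specimens $x_1, \dots, x_n$ with delays $\delta_1, \dots, \delta_n$ will return the Ct value $\bigoplus_{j=1}^{n} (\delta_j \odot x_j) = \min_j (\delta_j + x_j)$. In this way, we can define tests using tropical matrix multiplication. Consider a set of $N$ individuals, and let $x \in (\mathbb{Z}_{\geq 0} \cup \{\infty\})^N$ be a list of their Ct values. A set of $T$ tests on these individuals can be formally written as a $T \times N$ matrix of delay values $S$, known as the schedule matrix. If $S_{ij} = \infty$, then the $i$th test did not involve the $j$th individual. The results of the tests are then given by $y = S \odot x$, where here $\odot$ denotes tropical matrix multiplication; that is,
$$y_i = \bigoplus_{j=1}^{N} (S_{ij} \odot x_j) = \min_{1 \leq j \leq N} (S_{ij} + x_j).$$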
If, for any distinct $x, x'$ with at most $D$ finite values, $S \odot x \neq S \odot x'$, then our schedule matrix $S$ is a valid $(T, N, D)$-tropical code in the combinatorial setting. The code is said to have maximum delay $\ell$, where $\ell$ is the largest finite element in $S$. In a real-world setting, $\ell$ is typically held at 40 cycles. For general $D$, zero-error tropical codes are hard to construct.
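As an illustration of how a schedule matrix acts on a vector of Ct values, here is a minimal sketch of the tropical operations, with tropical addition taken to be $\min$ and tropical multiplication taken to be ordinary addition. The helper names (`t_add`, `t_mul`, `tropical_matvec`) and the small example schedule are hypothetical and chosen only for this example.

```python
import math

INF = math.inf  # tropical additive identity: "not in this test" / "not infected"


def t_add(a, b):
    """Tropical addition: pooling two specimens keeps the smaller Ct value."""
    return min(a, b)


def t_mul(a, b):
    """Tropical multiplication: delaying a specimen adds cycles to its Ct value."""
    return a + b


def tropical_matvec(S, x):
    """Test results y_i = min_j (S[i][j] + x[j]) for schedule matrix S and Ct vector x."""
    results = []
    for row in S:
        y_i = INF
        for delay, ct in zip(row, x):
            y_i = t_add(y_i, t_mul(delay, ct))
        results.append(y_i)
    return results


# Two tests on three individuals; only individuals 2 and 3 are infected.
S = [[0, 1, INF],
     [INF, 0, 2]]
x = [INF, 25, 30]
print(tropical_matvec(S, x))
```

In this example the first test reports the Ct value of the second individual shifted by its delay of one cycle, while the second test is dominated by the same individual with no delay.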
Existing work in nonadaptive group testing considers the construction of binary matrices satisfying certain properties, such as disjunction and separability, that correspond to binary group testing schemes. These properties are necessary, but not sufficient, for binary matrices that correspond to the schedule matrix of a tropical code. Furthermore, the sufficient doubly disjunct property given in Definition 1, due to [1], is new, and to our knowledge no constructions of doubly disjunct matrices are known for general parameters. In this paper, we propose the construction of random schedule matrices that are doubly disjunct with high probability, as well as the construction of doubly disjunct matrices from disjunct matrices.
Notation. Throughout we use calligraphic letters such as to refer to sets and block designs. We use boldfaced letters for random matrices and random vectors when they are capitalized and in lowercase, respectively. refers to the probability of the event , and we define . Plain capital letters are used for matrices and blocks, and is used for vectors.
II Summary of Existing Results
This paper relies heavily on the results given by [1], which was the first paper to consider using the tropical semiring as a model for group PCR testing, and produced results on tropical codes in the combinatorial setting.
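For some $T$, a block design on $T$ vertices is a subset $\mathcal{B}$ of the power set $2^{[T]}$, where $[T] = \{1, \dots, T\}$, satisfying certain properties with respect to unions, intersections, and differences. Each element $B \in \mathcal{B}$ is called a block. The set operation properties satisfied by the blocks vary between block designs. An example of a block design on $7$ vertices is the Fano plane, which under one standard labeling is
$$\mathcal{F} = \{\{1,2,3\}, \{1,4,5\}, \{1,6,7\}, \{2,4,6\}, \{2,5,7\}, \{3,4,7\}, \{3,5,6\}\}.$$
Note that the intersection of any two blocks has exactly one element, and that each block has size three. Furthermore, each element of $[7]$ is contained in exactly three blocks.
We may associate a block design $\mathcal{B} = \{B_1, \dots, B_N\}$ on $T$ vertices with $N$ blocks with an incidence matrix $M \in \{0,1\}^{T \times N}$, a binary matrix where each column corresponds to a block. Then
$$M_{ij} = \begin{cases} 1 & \text{if } i \in B_j, \\ 0 & \text{otherwise.} \end{cases}$$
Note that this representation is unique up to permutation of the columns of $M$, whereas the permutation of the rows of $M$ results in an equivalent block design. For instance, the incidence matrix of the previously described Fano plane is
$$M = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \end{pmatrix}.$$
We may use the incidence matrices of suitable block designs to construct tropical codes by replacing each $0$ with $\infty$ and each $1$ with a suitable finite delay value. Conversely, every tropical code has an associated block design.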
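The correspondence between block designs, incidence matrices, and schedule matrices can be sketched as follows. The labeling of the Fano plane used here is one standard choice and is an assumption of this example, as is the constant `delay` argument, which is only a placeholder and not the delay assignment required by the decoding results of [1].

```python
import math
from itertools import combinations

# One standard labeling of the Fano plane on vertices 1..7 (assumed for illustration).
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]


def incidence_matrix(blocks, num_vertices):
    """Rows index vertices (tests), columns index blocks (individuals)."""
    return [[1 if v in block else 0 for block in blocks]
            for v in range(1, num_vertices + 1)]


def schedule_from_incidence(M, delay=0):
    """Replace each 0 by infinity and each 1 by a placeholder finite delay."""
    return [[delay if entry else math.inf for entry in row] for row in M]


M = incidence_matrix(FANO, 7)
S = schedule_from_incidence(M)

# Sanity checks of the properties stated in the text.
assert all(len(a & b) == 1 for a, b in combinations(FANO, 2))  # pairwise intersections
assert all(sum(row) == 3 for row in M)                          # each vertex in 3 blocks
assert all(sum(col) == 3 for col in zip(*M))                    # each block has size 3
```

Permuting the list `FANO` permutes the columns of `M`, which matches the remark that the incidence matrix is unique only up to column permutation.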
Definition 1 (Doubly Disjunct Block Design).
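A block design $\mathcal{B}$ with $N$ blocks on $T$ vertices is said to be $D$-doubly disjunct if, for any distinct blocks $B_0, B_1, \dots, B_D \in \mathcal{B}$,
$$\left| B_0 \setminus \bigcup_{j=1}^{D} B_j \right| \geq 2.$$
If instead $\left| B_0 \setminus \bigcup_{j=1}^{D} B_j \right| \geq 1$, then we call $\mathcal{B}$ $D$-disjunct.
Remark.
Clearly if $\mathcal{B}$ is a $D$-doubly disjunct block design, it is also $D$-disjunct.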
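For small designs, Definition 1 can be checked by brute force. The sketch below reads the definition as requiring that every block keeps at least two elements (doubly disjunct) or at least one element (disjunct) outside the union of any $D$ other blocks; the function names and the Fano plane labeling are assumptions made for this example, and the search is exponential in $D$, so it is only meant for toy instances.

```python
import math
from itertools import combinations

FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]


def min_private_size(blocks, D):
    """Smallest |B_0 minus (B_1 u ... u B_D)| over all choices of distinct blocks."""
    best = math.inf
    for i, b0 in enumerate(blocks):
        others = blocks[:i] + blocks[i + 1:]
        for group in combinations(others, D):
            private = set(b0).difference(*group)
            best = min(best, len(private))
    return best


def is_disjunct(blocks, D):
    return min_private_size(blocks, D) >= 1


def is_doubly_disjunct(blocks, D):
    return min_private_size(blocks, D) >= 2


print(is_doubly_disjunct(FANO, 1), is_disjunct(FANO, 2), is_doubly_disjunct(FANO, 2))
```

A single pass computing `min_private_size` answers both questions at once, which also makes the remark immediate: a minimum of at least two is in particular at least one.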
The Fano plane is a 1-doubly disjunct block design on $7$ vertices. A construction for $1$-doubly disjunct block designs on an arbitrary number of vertices is given by Graham and Sloane [2]. Let $\mathcal{W}$ be the set of all binary vectors of length $T$ and Hamming weight $w$, and consider the map
$$\phi : \mathcal{W} \to \mathbb{Z}_T, \qquad \phi(v) = \sum_{i \,:\, v_i = 1} i \bmod T,$$
taking a binary vector to the sum of the indices of its ones, modulo $T$. Then the preimage of any residue class will be a $1$-doubly disjunct block design. By the pigeonhole principle, at least one of these preimages will have cardinality at least $\binom{T}{w}/T$.
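The Graham and Sloane partition is easy to reproduce for small parameters. The sketch below represents each binary vector by its support (the set of indices of its ones), groups the supports by the sum of their indices modulo the length, and checks that each class is 1-doubly disjunct, that is, any two distinct blocks in a class each keep at least two elements outside the other. The function names and the parameter names `n` and `w` are choices made for this example.

```python
from itertools import combinations
from math import comb


def graham_sloane_classes(n, w):
    """Partition all weight-w supports in {1..n} by (sum of indices) mod n."""
    classes = {r: [] for r in range(n)}
    for support in combinations(range(1, n + 1), w):
        classes[sum(support) % n].append(frozenset(support))
    return classes


def is_one_doubly_disjunct(blocks):
    """Every pair of distinct blocks differs by at least two elements on each side."""
    return all(len(a - b) >= 2 and len(b - a) >= 2
               for a, b in combinations(blocks, 2))


n, w = 7, 3
classes = graham_sloane_classes(n, w)
largest = max(classes.values(), key=len)
print(len(largest), comb(n, w) / n, is_one_doubly_disjunct(largest))
```

The check passes for every residue class because two supports of equal weight that differ in a single swapped index have index sums differing by a nonzero amount smaller than `n`, so they cannot share a residue.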
Theorem 1 (Theorem 31 in [1]).
Let $\mathcal{B}$ be a $D$-doubly disjunct block design with $N$ blocks on $T$ vertices. Let $M$ be the incidence matrix of $\mathcal{B}$. Then there exists a $(T, N, D)$-tropical code whose associated block design is $\mathcal{B}$. That is, the finite entries of the schedule matrix coincide exactly with the nonzero entries of $M$.
In the case where and each block has size at least three, there exists a decoding algorithm, given by Theorem 17 in [1], whose maximum delay is a prime bounded above by . Otherwise, for , the decoding algorithm given in Theorem 31 in [1] uses a maximum delay of , with the schedule matrix
although the authors conjecture that there is a polynomial delay code using the same underlying block design. The given decoding algorithm considers sets of potentially infected persons of size . We may construct a bipartite graph using a plausible set of infected persons and the actual set of infected persons as our two sets of vertices. The double disjunctness of guarantees that this graph contains an even cycle. The alternating sum along this cycle will correspond to the sum of delays for the actual set of infected persons.
To the best of our knowledge, there is no construction for a -doubly disjunct block design for with arbitrary parameters. Existing constructions for -disjunct matrices, which are suitable for binary group testing, exist. Combinatorial constructions exist with [3], while Monte-Carlo constructions, like our own, exist for [4]. Probabilistic binary group testing schemes exist with [5].
III Result I: A Deterministic Construction from Disjunct Block Designs
Recall from Definition 1 what it means for a block design to be disjunct and doubly disjunct. Our first result provides a simple way to construct -doubly disjunct block designs from -disjunct block designs by doubling the number of vertices. These can then be used to produce -tropical codes by Theorem 1. Methods for constructing -disjunct block designs are given in [5, 4, 3].
Theorem 2.
Let be a -disjunct block design on vertices. Then the image of the function
is a -doubly disjunct block design on vertices.
Proof:
Let be any distinct blocks in , and for each let be the unique element of mapping to . Since is disjunct, . For each block, the maps and are injective, and the images of the two maps are disjoint. Let . Then note that and . If , then for some , but this would imply that for some , which cannot be the case. Similarly, if , then for some , but this would imply that for some , which cannot be the case. Hence, and , so and is -doubly disjunct. ∎
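The doubling idea behind Theorem 2 can be illustrated on the Fano plane. The sketch below assumes one concrete choice of the two injective maps with disjoint images used in the proof, namely $v \mapsto v$ and $v \mapsto v + T$, so that each block $B$ is sent to $B \cup (B + T)$; the paper's own map may differ, and the function names are hypothetical.

```python
import math
from itertools import combinations

FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]


def double_design(blocks, T):
    """Send each block B on vertices 1..T to B u (B + T) on vertices 1..2T (assumed map)."""
    return [set(b) | {v + T for v in b} for b in blocks]


def min_private_size(blocks, D):
    """Smallest |B_0 minus (B_1 u ... u B_D)| over all choices of distinct blocks."""
    best = math.inf
    for i, b0 in enumerate(blocks):
        others = blocks[:i] + blocks[i + 1:]
        for group in combinations(others, D):
            best = min(best, len(set(b0).difference(*group)))
    return best


doubled = double_design(FANO, 7)
# The Fano plane is 2-disjunct but not 2-doubly disjunct; its double is 2-doubly disjunct.
print(min_private_size(FANO, 2), min_private_size(doubled, 2))
```

Under this map every private element of a block acquires a shifted copy that is also private, so the minimum private size exactly doubles, which is the counting step in the proof.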
Remark.
This construction extends to the construction of -fold disjunct block designs, but it is not optimal. For instance, consider the Fano plane as described in section II. The Fano plane is an “optimal” 2-disjunctive block design on 7 vertices in the sense that no blocks can be added to it while preserving 2-disjunctiveness, but applying Theorem 2 to the Fano plane and adding the block still yields a 2-doubly disjunct block design on 14 vertices.
IV Result II: A Probabilistic Construction
In this section we define tropical codes in a probabilistic setting, and construct random matrices that are doubly disjunct with high probability.
IV-A Definition
Consider a random schedule matrix . We define the probability of error as
where . In other words, is the probability that a group testing scheme with schedule matrix confuses and . To formalize the definition of a tropical code in a probabilistic setting, we introduce a parameter to bound the probability of error .
Definition 2.
A random schedule matrix is a -tropical code if .
A -tropical code then corresponds to the definition of a -tropical code given in [1]. This definition naturally motivates us to find relationships between the parameters and .
IV-B Construction
In this section we find parameters for which we are able to construct doubly disjunct matrices with high probability, then compute for our construction, based on its probability of generating a matrix that is not doubly disjunct.
We show the existence of a joint distribution for the first row of delays in . We then generate the remaining rows of i.i.d. according to the same distribution .
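The i.i.d. row generation described above can be sketched as follows, working with the binary support pattern of the schedule matrix (a one marks a finite delay, a zero marks an absent individual). The sketch assumes the row distribution depends only on the Hamming weight of the row, and the weight distribution passed in as `weight_probs` is a placeholder chosen for illustration, not a solution of the linear system derived in this section.

```python
import random


def sample_row(N, weight_probs, rng):
    """Draw a weight w from weight_probs, then a uniformly random support of size w."""
    weights = list(weight_probs)
    w = rng.choices(weights, weights=[weight_probs[k] for k in weights])[0]
    support = set(rng.sample(range(N), w))
    return [1 if j in support else 0 for j in range(N)]


def sample_design(T, N, weight_probs, seed=0):
    """T rows drawn i.i.d. from the same weight-symmetric distribution."""
    rng = random.Random(seed)
    return [sample_row(N, weight_probs, rng) for _ in range(T)]


def columns_as_blocks(rows):
    """Block j collects the tests (row indices) that include individual j."""
    return [{i for i, row in enumerate(rows) if row[j]} for j in range(len(rows[0]))]


rows = sample_design(T=10, N=8, weight_probs={2: 0.5, 3: 0.5}, seed=1)
blocks = columns_as_blocks(rows)
```

Excluding weight zero from `weight_probs` guarantees that no test is empty; whether the resulting blocks are doubly disjunct is a random event whose probability depends on the chosen distribution.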
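We produce sufficient properties on $p$ for $\mathbf{S}$ to be doubly disjunct with high probability. Since $p$ is a probability distribution over rows $r \in \{0,1\}^N$, we require
$$\sum_{r \in \{0,1\}^N} p(r) = 1 \tag{1}$$
and
$$p(r) \geq 0 \quad \text{for all } r \in \{0,1\}^N. \tag{2}$$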
For to be -doubly disjunct on average, we require, for every distinct
(3) |
Additionally, to ensure that no test is empty, we set . Conditions (1), (2), and (3) reduce our problem to proving the existence of non-negative solutions to a linear system. To simplify our analysis, we restrict to the case where rows whose Hamming weights are equal have equal probability of being generated. For instance, if , then we let and . This allows us to write our first conditions (1) and (3) as linear combinations of the probabilities that has hamming weight , rather than the probabilities of for each . Let be the vector whose element corresponds to the probability that has hamming weight . Conditions (1) and (3) can then be written as the following linear system, for which is a solution
(4) |
The first row of the matrix in equation (4) corresponds to the condition given by (1); its component is the number of length binary vectors with weight . The second row corresponds to the condition given by (3); its component is the number of weight length binary vectors with and for any distinct .
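The two counts described above have closed forms that can be verified by enumeration. The sketch below assumes the columns of the matrix are indexed by weights $w = 1, \dots, N$ (weight zero is excluded because empty tests are disallowed): the first row is then $\binom{N}{w}$, and the second row is $\binom{N-D-1}{w-1}$, the number of weight-$w$ rows with a one in a fixed position and zeros in $D$ other fixed positions. The function names are hypothetical.

```python
from itertools import product
from math import comb


def system_rows(N, D):
    """Closed-form entries of the two matrix rows, for weights w = 1..N."""
    row1 = [comb(N, w) for w in range(1, N + 1)]
    row2 = [comb(N - D - 1, w - 1) for w in range(1, N + 1)]
    return row1, row2


def brute_force_rows(N, D):
    """Count the same quantities by enumerating all nonzero binary rows of length N."""
    row1 = [0] * N
    row2 = [0] * N
    for r in product((0, 1), repeat=N):
        w = sum(r)
        if w == 0:
            continue
        row1[w - 1] += 1
        # Fixed positions: index 0 must be one, indices 1..D must be zero.
        if r[0] == 1 and not any(r[1:D + 1]):
            row2[w - 1] += 1
    return row1, row2


print(system_rows(6, 2))
```

By symmetry the count in the second row does not depend on which distinct positions are fixed, which is why restricting to weight-symmetric distributions collapses the condition over all index tuples into a single row.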
The following lemma provides conditions on the parameters such that an elementwise nonnegative solution to the linear system (4) exists.
Lemma 1.
For a general pool of patients with a maximum of infected, and any , if
then equation (4) has an elementwise nonnegative solution.
We will use Farkas’ Lemma to establish Lemma 1.
Lemma 2 (Farkas’ Lemma, [6, p. 263]).
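Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$. Then exactly one of the following is true:
1. There exists $x \in \mathbb{R}^{n}$ such that $Ax = b$ and $x \geq 0$ elementwise.
2. There exists $y \in \mathbb{R}^{m}$ such that $A^{\top} y \geq 0$ elementwise and $b^{\top} y < 0$.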
Proof:
If no elementwise nonnegative vector exists satisfying (4), then by the second condition of Farkas’ lemma, there exists some such that , , and for all . Together, these imply
for all . Note that, since and , we must have . This means
We compute the maximum of over . Note that
so that
We examine the terms outside of the products in the numerator and denominator, which differ between and . For , we have
If is an integer, then so that
and . If is an integer, then so that
and . As such, is maximized when . Hence, if
then there exists some elementwise non-negative vector satisfying (4). Thus, there exists a valid probability distribution on , using which we can generate a -tropical code with high probability. ∎
IV-C Performance of the probabilistic construction
In this section, we establish conditions for which the proposed construction provides us with a valid tropical code.
Theorem 3.
The construction given in Section IV-B is a -tropical code if
We will use the Chernoff bound to establish Theorem 3.
Lemma 3 (Chernoff bound [7]).
Let where all are independent. Let . Then
for all
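To give a feel for how tight a Chernoff bound is, the sketch below compares an exact binomial tail with the bound. It assumes the standard lower-tail multiplicative form from [7] for independent Bernoulli variables, $\Pr(X \leq (1-\delta)\mu) \leq \exp(-\mu\delta^2/2)$ for $0 < \delta < 1$; this form, the identical success probability `p`, and the function names are assumptions of the example rather than a restatement of Lemma 3.

```python
from math import comb, exp, floor


def binomial_lower_tail(n, p, threshold):
    """Exact P(X <= threshold) for X ~ Binomial(n, p)."""
    k_max = min(n, floor(threshold))
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_max + 1))


def chernoff_lower_bound(mu, delta):
    """Assumed bound exp(-mu * delta^2 / 2) on P(X <= (1 - delta) * mu)."""
    return exp(-mu * delta ** 2 / 2)


def compare(n, p, delta):
    """Return (exact tail, Chernoff bound) for a sum of n Bernoulli(p) variables."""
    mu = n * p
    return binomial_lower_tail(n, p, (1 - delta) * mu), chernoff_lower_bound(mu, delta)


for delta in (0.25, 0.5, 0.75):
    exact, bound = compare(200, 0.1, delta)
    print(f"delta={delta}: exact={exact:.3e}, bound={bound:.3e}")
```

The bound is loose by a sizeable factor for these parameters, which is typical: it trades tightness for a clean exponential form that survives the subsequent union bound.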
Proof:
For a given , choose . For some fixed distinct blocks indexed let be the random variable indicating whether or not the entry of the row is and the entries of the row are 0. Then the probability of error for the chosen subset of blocks is , since the sum gives the size of the set . Then note that
We may then use the Chernoff bound (lemma 3) with and to get:
Now we take the union bound over every subset of with cardinality with one specific index chosen. This gives an upper bound for .
To study the asymptotic behaviour of the construction in Lemma 1, we reduce our final expression using the following asymptotic approximation of the binomial coefficient
where is the binary entropy function. This gives
so that our construction is a -tropical code if
(5) |
Using the bounds and we can relax the condition in (5) so that our construction is a -tropical code if
(6) |
∎
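The entropy approximation of the binomial coefficient used in the proof above can be checked numerically. The sketch below compares $\log_2 \binom{n}{k}$ with $n H(k/n)$ and with the standard sandwich $\frac{2^{nH(k/n)}}{n+1} \leq \binom{n}{k} \leq 2^{nH(k/n)}$; the function names are chosen for this example.

```python
from math import comb, log2


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k)."""
    return log2(comb(n, k))


n = 100
for k in (5, 25, 50):
    exact = log2_binom(n, k)
    approx = n * binary_entropy(k / n)
    print(f"k={k}: log2 C(n,k)={exact:.2f}, n*H(k/n)={approx:.2f}")
```

The gap between the two quantities grows only logarithmically in `n`, which is why it disappears in the asymptotic statements derived from the approximation.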
V Comparison
Since both probabilistic and deterministic constructions exist for disjunct matrices, and our result in Theorem 2 provides a construction for a $D$-doubly disjunct block design from a $D$-disjunct block design, we have three ways to produce tropical codes for a fixed set of parameters:
1. First construct a deterministic $D$-disjunct matrix, then use Theorem 2 to construct a $D$-doubly disjunct matrix;
2. Construct a $D$-disjunct matrix probabilistically, then use Theorem 2 to construct a $D$-doubly disjunct matrix;
3. Construct a $D$-doubly disjunct random matrix directly using Lemma 1.
Theorem 2 doubles the number of tests used, but does not change the asymptotic behaviour of the construction used for a -disjunct matrix. The deterministic constructions given by [8, 3] have with for specific parameter choices. As such, the first method also gives with . For the second method, we can use our construction detailed in Section IV-B to construct -disjunct matrices by replacing the right hand side of Lemma 1 and all subsequent calculations with . This gives -disjunct matrices with for error bound . Then, using Theorem 2, we can get a -doubly disjunct matrix with , which is greater than the bound in Theorem 3 for the construction of a -doubly disjunct matrix directly, as in the third method. From equation (5) in Theorem 3, the construction given in Section IV-B requires . When is sublinear, then we have . If , then .
In a practical setting, if the situation requires zero error, then we can use the first method to construct a doubly disjunct matrix with the set of given parameters. Otherwise, if some error is tolerable, then our construction in Section IV-B may be preferable.
VI Conclusion
In this paper we consider a probabilistic construction for a $D$-doubly disjunct block design, as well as a deterministic method based on the existence of a $D$-disjunct block design. We show that constructing a $D$-doubly disjunct block design probabilistically gives performance similar to that of a fully deterministic construction based on a $D$-disjunct block design, with the added benefit of being able to use arbitrary parameters.
References
- [1] H.-P. Wang, R. Gabrys, and A. Vardy, “Tropical group testing,” 2022. [Online]. Available: https://arxiv.org/abs/2201.05440
- [2] R. Graham and N. Sloane, “Lower bounds for constant weight codes,” IEEE Transactions on Information Theory, vol. 26, no. 1, pp. 37–43, 1980.
- [3] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 363–377, 1964.
- [4] M. Cheraghchi and V. Nakos, “Combinatorial group testing and sparse recovery schemes with near-optimal decoding time,” CoRR, vol. abs/2006.08420, 2020. [Online]. Available: https://arxiv.org/abs/2006.08420
- [5] H. A. Inan, P. Kairouz, M. Wootters, and A. Özgür, “On the optimality of the Kautz-Singleton construction in probabilistic group testing,” IEEE Transactions on Information Theory, vol. 65, no. 9, pp. 5592–5603, Sep. 2019. [Online]. Available: https://doi.org/10.1109/TIT.2019.2902397
- [6] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
- [7] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2017.
- [8] E. Porat and A. Rothschild, “Explicit non-adaptive combinatorial group testing schemes,” CoRR, vol. abs/0712.3876, 2007. [Online]. Available: http://arxiv.org/abs/0712.3876