
Constructions for Nonadaptive
Tropical Group Testing

Nicholas Kwan and Lele Wang University of British Columbia, Vancouver, BC V6T1Z4, Canada
nickwan@student.ubc.ca, lelewang@ece.ubc.ca
Abstract

PCR testing is an invaluable diagnostic tool that has most recently seen widespread use during the COVID-19 pandemic. A recent work by Wang, Gabrys and Vardy proposed tropical codes as a model for group PCR testing. For a known but arbitrary number of infected persons, they proposed a sufficient condition, called double disjunction, on the underlying block design of a zero-error tropical code. However, constructions of doubly disjunct block designs are known only for a very limited range of parameters. In this paper, we define probabilistic tropical codes and consider random block designs that are doubly disjunct with high probability. We also provide a deterministic construction for a doubly disjunct block design given a disjunct block design. We show that for certain choices of parameters, our probabilistic construction has vanishing error. Our constructions, combined with existing methods, give us three different ways to construct tropical codes. We compare the number of tests required by each, along with bounds on the error.

I Introduction

Polymerase Chain Reaction (PCR) is a method by which genetic material may be rapidly amplified. PCR has a wide range of applications, perhaps most recently in reliable testing for the virus causing COVID-19. PCR testing is typically done through thermal cycling, where the viral load of a test containing some specimens is repeatedly doubled in concentration. Each doubling event is known as a cycle, and the load is doubled until it reaches a detectable threshold or fails to reach this threshold within a preset number of cycles. The cycle number at which this threshold is reached is called the Ct value. The Ct value of a test is thus

\[ \text{Ct value}=\lfloor-\log_{2}(\text{viral load})+\text{constant}\rfloor, \]

and is a measure of the degree of infection of a test, with smaller values indicating higher degrees of infection. If a test fails to reach the threshold after a preset number of cycles, then it is not infected and its Ct value is $\infty$.

Consider some finite set $\mathcal{S}$ of people with an infected subset $\mathcal{D}\subset\mathcal{S}$. Group testing deals with the problem of identifying $\mathcal{D}$ by performing tests on subsets of $\mathcal{S}$ whose size may be greater than one. In nonadaptive group testing, all tests are specified in advance and performed at once. Group testing can be divided into combinatorial and probabilistic group testing. Combinatorial group testing identifies $\mathcal{D}$ with zero error, while probabilistic group testing identifies $\mathcal{D}$ with high probability. Traditionally, group testing takes place in the binary setting, so that a test on some subset $\mathcal{U}\subset\mathcal{S}$ is positive when $\mathcal{U}\cap\mathcal{D}$ is nonempty and negative otherwise. On the other hand, a PCR test on a subset $\mathcal{U}$ returns some aggregate Ct value, which tells us more than whether or not $\mathcal{U}\cap\mathcal{D}$ is nonempty. For instance, when two specimens with Ct values $x$ and $y$ are placed into a PCR test at the same time, the resulting Ct value is $z$ satisfying

\[ 2^{-x}+2^{-y}=2^{-z}. \]

Due to exponential decay, we have $z\approx\min\{x,y\}$. In performing a group PCR test, we are also able to add a specimen to a test after a certain number of cycles have elapsed. The operation of adding a specimen to a test after some $\delta$ cycles is known as delaying, and we say that the specimen has been delayed by $\delta$ cycles. When a specimen with Ct value $x$ is delayed by $\delta$ cycles, the number of cycles it takes to reach the same detectable threshold becomes $x+\delta$.
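As a quick numerical illustration of why $z\approx\min\{x,y\}$, the following sketch solves $2^{-x}+2^{-y}=2^{-z}$ for $z$ without the floor. The function name `pooled_ct` is our own, and the sample Ct values are made up for illustration.

```python
import math

def pooled_ct(x: float, y: float) -> float:
    """Solve 2**-x + 2**-y == 2**-z for z (pooled Ct value, before flooring)."""
    return -math.log2(2.0 ** -x + 2.0 ** -y)

# Two specimens whose Ct values differ by 10 cycles (illustrative numbers).
z = pooled_ct(20.0, 30.0)
gap = min(20.0, 30.0) - z  # how far the pooled value falls below the minimum
print(f"z = {z:.4f}, min - z = {gap:.4f}")

# Worst case for the approximation: equal loads shift the result by one cycle.
equal_loads = pooled_ct(20.0, 20.0)
print(f"equal loads: z = {equal_loads:.4f}")
```

The gap shrinks exponentially in $|x-y|$, which is the behaviour the min-plus model relies on.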

Based on the above properties of PCR testing, Wang, Gabrys, and Vardy proposed the tropical semiring as a model for group PCR testing [1]. The tropical semiring $(\mathbb{R}\cup\{\infty\},\oplus,\odot)$ consists of the real numbers and infinity equipped with the following operations:

\begin{align*}
x\oplus y &:=\min(x,y),\\
x\odot y &:=x+y.
\end{align*}

The additive identity is $\infty$ and the multiplicative identity is $0$. In this model, for two specimens $x$ and $y$ (we identify the specimens with their individual Ct values), a test containing both $x$ and $y$, delayed by $\delta_{x}$ and $\delta_{y}$ respectively, returns the Ct value $(x\odot\delta_{x})\oplus(y\odot\delta_{y})=\min(x+\delta_{x},y+\delta_{y})$. In this way, $\odot$ indicates delay and $\oplus$ indicates presence. This model works because viral loads vary widely, and the Ct value of a test with two specimens with differing viral loads is dominated by the specimen with the higher viral load due to the exponential nature of the test. Since Ct values and delays only take nonnegative integer values, all the analysis done in this paper is over the subsemiring $(\mathbb{N}\cup\{0,\infty\},\oplus,\odot)$.

In general, a test involving specimens $x_{1},\ldots,x_{k}$ with delays $\delta_{1},\ldots,\delta_{k}$ will return the Ct value $\bigoplus_{i=1}^{k}(x_{i}\odot\delta_{i})$. In this way, we can define tests using tropical matrix multiplication. Consider a set of $n$ individuals, and let $x=[x_{1},x_{2},\ldots,x_{n}]^{\intercal}\in(\mathbb{N}\cup\{0,\infty\})^{n}$ be a list of their Ct values. A set of $t$ tests on these individuals can be formally written as a matrix of delay values $S\in(\mathbb{N}\cup\{0,\infty\})^{t\times n}$, known as the schedule matrix. If $S_{ij}=\infty$, then the $i^{\text{th}}$ test did not involve the $j^{\text{th}}$ individual. The results of the tests are then given by $S\odot x$, where $\odot$ here denotes tropical matrix multiplication; that is,

\[ S\odot x=\begin{bmatrix}\bigoplus_{j=1}^{n}(S_{1j}\odot x_{j})\\ \vdots\\ \bigoplus_{j=1}^{n}(S_{tj}\odot x_{j})\end{bmatrix}. \]

If $S\odot x\neq S\odot y$ for any distinct $x,y\in(\mathbb{N}\cup\{0,\infty\})^{n}$ with at most $d$ finite values, then our schedule matrix is a valid $(t,n,d)$-tropical code in the combinatorial setting. The code is said to have maximum delay $l$, where $l$ is the largest finite element in $S$. In a real-world setting, $l$ is typically held at 40 cycles. For general $d$, zero-error tropical codes are hard to construct.
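The tropical matrix-vector product that defines the test results can be sketched in a few lines. The schedule matrix and Ct values below are hypothetical and chosen only to make the arithmetic easy to follow; `tropical_matvec` is our own helper name.

```python
import math

INF = math.inf  # tropical additive identity: "not in this test" / "not infected"

def tropical_matvec(S, x):
    """Return the tropical product of S and x: entry i is min_j (S[i][j] + x[j])."""
    return [min(s_ij + x_j for s_ij, x_j in zip(row, x)) for row in S]

# Hypothetical schedule with 3 tests and 4 people; INF marks non-participation.
S = [
    [0, 1, INF, INF],
    [INF, 0, 2, INF],
    [1, INF, INF, 0],
]
x = [INF, 25, INF, 30]  # persons 2 and 4 are infected, with Ct values 25 and 30

results = tropical_matvec(S, x)
print(results)  # [26, 25, 30]
```

Test 1 sees person 2 delayed by one cycle, test 2 sees person 2 with no delay, and test 3 sees only person 4.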

Existing work in nonadaptive group testing considers the construction of binary matrices satisfying certain properties, such as disjunction and separability, that correspond to binary group testing schemes. These properties are necessary, but not sufficient, for a binary matrix to underlie the schedule matrix of a tropical code. Furthermore, the sufficient doubly disjunct property stated in Definition 1, which is due to [1], is new, and to our knowledge no constructions of doubly disjunct matrices are known for general parameters. In this paper, we propose the construction of random schedule matrices that are doubly disjunct with high probability, as well as the construction of doubly disjunct matrices from disjunct matrices.

Notation. Throughout, we use calligraphic letters such as $\mathcal{K}$ to refer to sets and block designs. We use boldfaced capital letters such as $\bm{S}$ for random matrices and boldfaced lowercase letters such as $\bm{s}$ for random vectors. $\operatorname{\textsf{P}}(A)$ refers to the probability of the event $A$, and we define $p_{X}(x)\triangleq\operatorname{\textsf{P}}(X=x)$. Plain capital letters are used for matrices and blocks, and arrows, as in $\vec{p}$, are used for vectors.

II Summary of Existing Results

This paper relies heavily on the results given by [1], which was the first paper to consider using the tropical semiring as a model for group PCR testing, and produced results on tropical codes in the combinatorial setting.

For some $t\in\mathbb{N}$, a block design on $t$ vertices is a subset $\mathcal{F}\subset 2^{[t]}$ of the power set $2^{[t]}$ satisfying certain properties with respect to unions, intersections, and differences. Each element $B\in\mathcal{F}$ is called a block. The set operation properties satisfied by the blocks vary between block designs. An example of a block design on $7$ vertices is the Fano plane $\mathcal{F}=\{\{1,2,4\},\{1,3,7\},\{1,5,6\},\{2,3,5\},\{2,6,7\},\{3,4,6\},\{4,5,7\}\}$. Note that the intersection of any two blocks has exactly one element, and that each block has size three. Furthermore, each element of $[7]$ is contained in exactly three blocks.

We may associate a block design $\mathcal{F}$ on $T$ vertices with $|\mathcal{F}|=N$ blocks with an incidence matrix $M$, a $T\times N$ binary matrix where each column corresponds to a block. Then

\[ M_{ij}=\begin{cases}0&\text{if $i\notin B_{j}$ for the block $B_{j}\in\mathcal{F}$,}\\ 1&\text{if $i\in B_{j}$ for the block $B_{j}\in\mathcal{F}$.}\end{cases} \]

Note that this representation is unique up to permutation of the columns of $M$, whereas permuting the rows of $M$ results in an equivalent block design. For instance, an incidence matrix of the previously described Fano plane is

\[ \begin{bmatrix}1&0&0&0&1&0&1\\ 1&1&0&0&0&1&0\\ 0&1&1&0&0&0&1\\ 1&0&1&1&0&0&0\\ 0&1&0&1&1&0&0\\ 0&0&1&0&1&1&0\\ 0&0&0&1&0&1&1\end{bmatrix}. \]

We may use the incidence matrices of suitable block designs to construct tropical codes by replacing each $0$ with $\infty$ and each $1$ with a suitable finite delay value. Conversely, every tropical code has an associated block design.
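The passage from a block design to an incidence matrix, and then to the support pattern of a schedule matrix, can be sketched as follows. The helper names are our own, the columns follow the order in which the blocks are listed (so the result is a column permutation of the matrix displayed above), and the all-zero delay rule is only a placeholder: it is not claimed to produce a valid tropical code.

```python
import math

FANO = [{1, 2, 4}, {1, 3, 7}, {1, 5, 6}, {2, 3, 5}, {2, 6, 7}, {3, 4, 6}, {4, 5, 7}]

def incidence_matrix(blocks, num_vertices):
    """Rows index vertices 1..num_vertices, columns index blocks in list order."""
    return [[1 if v in block else 0 for block in blocks]
            for v in range(1, num_vertices + 1)]

def to_schedule(M, delay):
    """Replace each 0 by infinity and each 1 by delay(i, j), with 1-based i, j."""
    return [[delay(i, j) if entry else math.inf
             for j, entry in enumerate(row, start=1)]
            for i, row in enumerate(M, start=1)]

M = incidence_matrix(FANO, 7)
# Placeholder delay rule (assumption for illustration): no delay anywhere.
S = to_schedule(M, lambda i, j: 0)

for row in M:
    print(row)
```

Every row and every column of `M` sums to three, matching the block size and the replication number of the Fano plane.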

Definition 1 (Doubly Disjunct Block Design).

A block design $\mathcal{F}$ with $n$ blocks on $t$ vertices is said to be $d$-doubly disjunct if, for any distinct blocks $Z,B_{1},\ldots,B_{d}\in\mathcal{F}$, $\left|Z\setminus\left(\bigcup_{i=1}^{d}B_{i}\right)\right|\geq 2$.

If instead $\left|Z\setminus\left(\bigcup_{i=1}^{d}B_{i}\right)\right|\geq 1$, then we call $\mathcal{F}$ $d$-disjunct.

Remark.

Clearly, if $\mathcal{F}$ is a $d$-doubly disjunct block design, it is also $d$-disjunct.
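Definition 1 can be checked by brute force on small designs. The sketch below is exponential in $d$ and meant only for toy examples; the function names are our own. It computes the smallest value of $\left|Z\setminus\bigcup_{i}B_{i}\right|$ over all choices of distinct blocks and compares it with the thresholds $1$ and $2$.

```python
from itertools import combinations
from math import inf

def min_private_size(blocks, d):
    """Smallest size of Z minus (B_1 u ... u B_d) over distinct blocks Z, B_1..B_d."""
    best = inf  # stays inf (vacuously fine) if there are fewer than d + 1 blocks
    for z_idx, Z in enumerate(blocks):
        others = blocks[:z_idx] + blocks[z_idx + 1:]
        for group in combinations(others, d):
            covered = set().union(*group)
            best = min(best, len(Z - covered))
    return best

def is_disjunct(blocks, d):
    return min_private_size(blocks, d) >= 1

def is_doubly_disjunct(blocks, d):
    return min_private_size(blocks, d) >= 2

FANO = [{1, 2, 4}, {1, 3, 7}, {1, 5, 6}, {2, 3, 5}, {2, 6, 7}, {3, 4, 6}, {4, 5, 7}]

print(is_doubly_disjunct(FANO, 1))  # True
print(is_disjunct(FANO, 2))         # True
print(is_doubly_disjunct(FANO, 2))  # False
```

For the Fano plane, any two blocks meet in exactly one point, so removing one other block leaves two points and removing two other blocks can leave as few as one.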

The Fano plane is a $1$-doubly disjunct block design on $7$ vertices. A construction of $1$-doubly disjunct block designs on an arbitrary number of vertices $t$ is given by Graham and Sloane [2]. Let $\mathcal{S}^{t}_{w}$ be the set of all binary vectors of length $t$ and Hamming weight $w$, and consider the map

\begin{align*}
f:\mathcal{S}_{w}^{t} &\longrightarrow\mathbb{Z}/t\mathbb{Z},\\
\vec{v} &\longmapsto\left(\sum_{i=1}^{t}iv_{i}\right)\bmod t,
\end{align*}

taking a binary vector $\vec{v}$ to the sum of the indices of its ones, modulo $t$. Then the preimage of any residue class, with each vector identified with the set of indices of its ones, is a $1$-doubly disjunct block design. By the pigeonhole principle, at least one of these preimages has cardinality at least $\frac{1}{t}\binom{t}{w}$.
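The Graham and Sloane partition is easy to enumerate for small parameters. In the sketch below each binary vector is represented by its support (the set of indices of its ones), the function names are our own, and the choice $t=7$, $w=3$ is only an example.

```python
from itertools import combinations

def graham_sloane_classes(t, w):
    """Group the w-subsets of {1..t} by (sum of elements) mod t."""
    classes = {r: [] for r in range(t)}
    for support in combinations(range(1, t + 1), w):
        classes[sum(support) % t].append(set(support))
    return classes

def is_1_doubly_disjunct(blocks):
    """Check |Z minus B| >= 2 for every ordered pair of distinct blocks."""
    return all(len(Z - B) >= 2 for Z in blocks for B in blocks if Z is not B)

t, w = 7, 3
classes = graham_sloane_classes(t, w)
largest = max(classes.values(), key=len)

print({r: len(blocks) for r, blocks in classes.items()})
print(all(is_1_doubly_disjunct(blocks) for blocks in classes.values()))
```

Two blocks in the same class cannot differ by a single swapped element, since that would change the index sum by a nonzero amount modulo $t$; this is why each class passes the check.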

Theorem 1 (Theorem 31 in [1]).

Let $\mathcal{F}$ be a $(d-1)$-doubly disjunct block design with $n$ blocks on $t$ vertices. Let $M$ be the incidence matrix of $\mathcal{F}$. Then there exists a $(t,n,d)$-tropical code whose associated block design is $\mathcal{F}$. That is, the finite entries of the schedule matrix $S$ coincide exactly with the nonzero entries of $M$.

In the case where $d=2$ and each block $B\in\mathcal{F}$ has size at least three, there exists a decoding algorithm, given by Theorem 17 in [1], whose maximum delay is a prime bounded above by $n$. Otherwise, for $d>2$, the decoding algorithm given in Theorem 31 in [1] uses a maximum delay of $2^{n(t+1)}$, with the schedule matrix

\[ S_{ij}=\begin{cases}2^{i+jt},&\text{if }M_{ij}=1,\\ \infty,&\text{if }M_{ij}=0,\end{cases} \]

although the authors conjecture that there is a polynomial-delay code using the same underlying block design. The given decoding algorithm considers sets of potentially infected persons of size $d$. We may construct a bipartite graph using a plausible set of $d$ infected persons and the actual set of $d$ infected persons as our two sets of vertices. The double disjunctness of $\mathcal{F}$ guarantees that this graph contains an even cycle. The alternating sum along this cycle will correspond to the sum of delays for the actual set of infected persons.

To the best of our knowledge, there is no construction for a $(d-1)$-doubly disjunct block design for $d>2$ with arbitrary parameters. In contrast, constructions of $d$-disjunct matrices, which are suitable for binary group testing, are known. Combinatorial constructions exist with $t=O(d^{2}\log n)$ [3], while Monte Carlo constructions, like our own, exist with $t=O\left(d^{2}\min\{\log n,(\log_{t}n)^{2}\}\right)$ [4]. Probabilistic binary group testing schemes exist with $t=\Theta(d\log n)$ [5].

III Result I: A Deterministic Construction from Disjunct Block Designs

Recall from Definition 1 what it means for a block design $\mathcal{F}$ to be disjunct and doubly disjunct. Our first result provides a simple way to construct $d$-doubly disjunct block designs from $d$-disjunct block designs by doubling the number of vertices. These can then be used to produce $(t,n,d)$-tropical codes by Theorem 1. Methods for constructing $d$-disjunct block designs are given in [3, 4, 5].

Theorem 2.

Let $\mathcal{F}\subset 2^{[t]}$ be a $d$-disjunct block design on $t$ vertices. Then the image $\mathcal{G}$ of the function

\begin{align*}
f:\mathcal{F} &\longrightarrow 2^{[2t]},\\
B &\longmapsto\{2x:x\in B\}\cup\{2x-1:x\in B\},
\end{align*}

is a $d$-doubly disjunct block design on $2t$ vertices.

Proof:

Let $B_{0},\ldots,B_{d}$ be any $d+1$ distinct blocks in $\mathcal{G}$, and for each $i\in\{0,\ldots,d\}$ let $A_{i}$ be the unique element of $\mathcal{F}$ mapping to $B_{i}$. Since $\mathcal{F}$ is $d$-disjunct, $\left|A_{0}\setminus\left(\bigcup_{i=1}^{d}A_{i}\right)\right|\geq 1$. For each block, the maps $x\mapsto 2x$ and $x\mapsto 2x-1$ are injective, and the images of the two maps are disjoint. Let $a\in A_{0}\setminus\left(\bigcup_{i=1}^{d}A_{i}\right)$. Then note that $2a\in B_{0}$ and $2a-1\in B_{0}$. If $2a\in\bigcup_{i=1}^{d}B_{i}$, then $2a\in B_{i}$ for some $i\geq 1$, but this would imply that $a\in A_{i}$ for some $i\geq 1$, which cannot be the case. Similarly, if $2a-1\in\bigcup_{i=1}^{d}B_{i}$, then $2a-1\in B_{i}$ for some $i\geq 1$, but this would imply that $a\in A_{i}$ for some $i\geq 1$, which cannot be the case. Hence, $2a\in B_{0}\setminus\left(\bigcup_{i=1}^{d}B_{i}\right)$ and $2a-1\in B_{0}\setminus\left(\bigcup_{i=1}^{d}B_{i}\right)$, so $\left|B_{0}\setminus\left(\bigcup_{i=1}^{d}B_{i}\right)\right|\geq 2$ and $\mathcal{G}$ is $d$-doubly disjunct. ∎

Remark.

This construction extends to the construction of $n$-fold disjunct block designs, but it is not optimal. For instance, consider the Fano plane as described in Section II. The Fano plane is an “optimal” $2$-disjunct block design on $7$ vertices in the sense that no block can be added to it while preserving $2$-disjunctness, but applying Theorem 2 to the Fano plane and adding the block $\{2,4,6,8,10,12,14\}$ still yields a $2$-doubly disjunct block design on $14$ vertices.
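The doubling map of Theorem 2 and the example in the remark can both be checked by brute force. This is a sketch for small designs only; `double_block` and `min_private_size` are our own helper names.

```python
from itertools import combinations
from math import inf

def double_block(B):
    """Theorem 2 map: B -> {2x : x in B} u {2x - 1 : x in B}."""
    return {2 * x for x in B} | {2 * x - 1 for x in B}

def min_private_size(blocks, d):
    """Smallest size of Z minus (B_1 u ... u B_d) over distinct blocks Z, B_1..B_d."""
    best = inf
    for z_idx, Z in enumerate(blocks):
        others = blocks[:z_idx] + blocks[z_idx + 1:]
        for group in combinations(others, d):
            best = min(best, len(Z - set().union(*group)))
    return best

FANO = [{1, 2, 4}, {1, 3, 7}, {1, 5, 6}, {2, 3, 5}, {2, 6, 7}, {3, 4, 6}, {4, 5, 7}]
G = [double_block(B) for B in FANO]   # 7 blocks on 14 vertices
extra = set(range(2, 15, 2))          # the block {2, 4, ..., 14} from the remark

print(min_private_size(FANO, 2))         # 1: the Fano plane is 2-disjunct
print(min_private_size(G, 2))            # 2: its doubling is 2-doubly disjunct
print(min_private_size(G + [extra], 2))  # 2: still 2-doubly disjunct with the extra block
```

The last line is the point of the remark: the doubled design has room for at least one more block, so the doubling construction does not yield a maximal doubly disjunct design.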

IV Result II: A Probabilistic Construction

In this section we define tropical codes in a probabilistic setting, and construct random matrices that are doubly disjunct with high probability.

IV-A Definition

Consider a random $t\times n$ schedule matrix $\bm{S}$. We define the probability of error as

\[ P_{e}^{n,d}(\bm{S})\triangleq\operatorname{\textsf{P}}\left(\bm{S}\odot\vec{x}=\bm{S}\odot\vec{y}\text{ for some distinct }\vec{x},\vec{y}\in\mathcal{R}_{n,d}\right), \]

where $\mathcal{R}_{n,d}\triangleq\{\vec{x}\in(\mathbb{N}\cup\{0,\infty\})^{n}:\vec{x}\text{ contains at most $d$ finite values}\}$. In other words, $P_{e}^{n,d}(\bm{S})$ is the probability that a group testing scheme with schedule matrix $\bm{S}$ confuses some pair of distinct vectors $\vec{x},\vec{y}\in\mathcal{R}_{n,d}$. To formalize the definition of a tropical code in a probabilistic setting, we introduce a parameter $\epsilon$ to bound the probability of error $P_{e}^{n,d}(\bm{S})$.

Definition 2.

A random $t\times n$ schedule matrix $\bm{S}$ is a $(t,n,d,\epsilon)$-tropical code if $P_{e}^{n,d}(\bm{S})\leq\epsilon$.

A $(t,n,d,0)$-tropical code then corresponds to the definition of a $(t,n,d)$-tropical code given in [1]. This definition naturally motivates us to find relationships between the parameters $t,n,d$ and $\epsilon$.

IV-B Construction

In this section we find parameters $(t,n,d)$ for which we are able to construct doubly disjunct matrices with high probability, then compute $\epsilon$ for our construction, based on its probability of generating a matrix that is not doubly disjunct.

We show the existence of a joint distribution $p_{\bm{s}}(\vec{v})$ for the first row $\bm{s}$ of $\bm{S}$. We then generate the remaining rows of $\bm{S}$ i.i.d. according to the same distribution $p_{\bm{s}}(\vec{v})$.

We give sufficient conditions on $p_{\bm{s}}(\vec{v})$ for $\bm{S}$ to be doubly disjunct with high probability. Since $p_{\bm{s}}(\vec{v})$ is a probability distribution, we require

\[ \sum_{\vec{v}\in\{0,1\}^{n}}p_{\bm{s}}(\vec{v})=1, \tag{1} \]

and

\[ p_{\bm{s}}(\vec{v})\geq 0\text{ for all }\vec{v}\in\{0,1\}^{n}. \tag{2} \]

For $\bm{S}$ to be $(d-1)$-doubly disjunct on average, we require, for every choice of distinct $i_{1},i_{2},\ldots,i_{d}\in[n]$,

\[ \operatorname{\textsf{P}}\left(\bm{s}_{i_{1}}=1\text{ and }\bm{s}_{i_{2}}=\cdots=\bm{s}_{i_{d}}=0\right)=\sum_{\substack{\vec{v}\in\{0,1\}^{n}\\ v_{i_{1}}=1,\\ v_{i_{2}}=\cdots=v_{i_{d}}=0}}p_{\bm{s}}(\vec{v})=\frac{2+\Delta}{t}. \tag{3} \]

Additionally, to ensure that no test is empty, we set $\operatorname{\textsf{P}}(\bm{s}=\vec{0})=0$. Conditions (1), (2), and (3) reduce our problem to proving the existence of nonnegative solutions to a linear system. To simplify our analysis, we restrict to the case where rows with equal Hamming weight have equal probability of being generated. For instance, if $n=3$, then we let $p_{\bm{s}}(100)=p_{\bm{s}}(010)=p_{\bm{s}}(001)$ and $p_{\bm{s}}(110)=p_{\bm{s}}(101)=p_{\bm{s}}(011)$. This allows us to write conditions (1) and (3) in terms of one unknown per Hamming weight $w\in[n]$, rather than one unknown $p_{\bm{s}}(\vec{v})$ for each $\vec{v}\in\{0,1\}^{n}$. Let $\vec{p}$ be the vector whose $w^{\text{th}}$ element $p_{w}$ is the common probability $p_{\bm{s}}(\vec{v})$ of each row $\vec{v}$ of Hamming weight $w$, so that $\bm{s}$ has Hamming weight $w$ with probability $\binom{n}{w}p_{w}$. Conditions (1) and (3) can then be written as the following linear system, for which $\vec{p}$ is a solution:

\[ \begin{bmatrix}\binom{n}{1}&\binom{n}{2}&\cdots&\binom{n}{n-d+1}&\cdots&\binom{n}{n}\\[5pt] \binom{n-d}{0}&\binom{n-d}{1}&\cdots&\binom{n-d}{n-d}&0\;\cdots&0\end{bmatrix}\vec{p}=\begin{bmatrix}1\\ \frac{2+\Delta}{t}\end{bmatrix}. \tag{4} \]

The first row of the matrix in equation (4) corresponds to the condition given by (1); its $w^{\text{th}}$ component is the number of length-$n$ binary vectors with weight $w$. The second row corresponds to the condition given by (3); its $w^{\text{th}}$ component is the number of weight-$w$, length-$n$ binary vectors $\bm{s}$ with $\bm{s}_{i_{1}}=1$ and $\bm{s}_{i_{2}}=\cdots=\bm{s}_{i_{d}}=0$, namely $\binom{n-d}{w-1}$, which is the same for any distinct $i_{1},\ldots,i_{d}\in[n]$.

The following lemma provides conditions on the parameters such that an elementwise nonnegative solution to the linear system (4) exists.

Lemma 1.

For a general pool of $n$ patients with a maximum of $d$ infected, and any $\Delta>0$, if

\[ \frac{\binom{n-d}{\lfloor\frac{n}{d}\rfloor-1}}{\binom{n}{\lfloor\frac{n}{d}\rfloor}}>\frac{2+\Delta}{t}, \]

then equation (4) has an elementwise nonnegative solution.

We will use Farkas’ Lemma to establish Lemma 1.

Lemma 2 (Farkas’ Lemma, [6, p. 263]).

Let $A\in\mathbb{R}^{m\times n}$ and $\vec{b}\in\mathbb{R}^{m}$. Then exactly one of the following is true:

  1. There exists $\vec{p}\in\mathbb{R}^{n}$ such that $A\vec{p}=\vec{b}$ and $\vec{p}\geq 0$ elementwise.

  2. There exists $\vec{y}\in\mathbb{R}^{m}$ such that $A^{\intercal}\vec{y}\geq 0$ elementwise and $\vec{b}^{\intercal}\vec{y}<0$.

Proof of Lemma 1:

If no elementwise nonnegative vector $\vec{p}$ exists satisfying (4), then by the second alternative of Farkas’ lemma, there exists some $\vec{y}=[y_{1}\;y_{2}]^{\intercal}$ such that $y_{1}\geq 0$, $y_{1}+\frac{2+\Delta}{t}y_{2}<0$, and $\binom{n}{w}y_{1}+\binom{n-d}{w-1}y_{2}\geq 0$ for all $w=1,2,\ldots,n-d+1$. Together, these imply

\begin{align*}
0 &>y_{1}+\frac{2+\Delta}{t}y_{2}\\
&\geq\frac{-\binom{n-d}{w-1}y_{2}}{\binom{n}{w}}+\frac{2+\Delta}{t}y_{2}\\
&=y_{2}\left(\frac{2+\Delta}{t}-\frac{\binom{n-d}{w-1}}{\binom{n}{w}}\right),
\end{align*}

for all $w=1,2,\ldots,n-d+1$. Note that, since $y_{1}\geq 0$ and $y_{1}+\frac{2+\Delta}{t}y_{2}<0$, we must have $y_{2}<0$. Dividing by $y_{2}$ reverses the inequality, so for every such $w$,

\[ \frac{\binom{n-d}{w-1}}{\binom{n}{w}}<\frac{2+\Delta}{t}. \]

We compute the maximum of $f(w)\triangleq\frac{\binom{n-d}{w-1}}{\binom{n}{w}}$ over $w\in[n-d+1]$. Note that

\[ f(w)=\frac{w\prod_{i=0}^{w-2}(n-d-i)}{\prod_{i=0}^{w-1}(n-i)}, \]

so that

\[ f(w+1)=\frac{(w+1)(n-d-w+1)\prod_{i=0}^{w-2}(n-d-i)}{(n-w)\prod_{i=0}^{w-1}(n-i)}. \]

We examine the terms outside of the products in the numerator and denominator, which differ between $f(w)$ and $f(w+1)$. For $f(w+1)$, we have

\[ \frac{(w+1)(n-d-w+1)}{n-w}=w\left(1+\frac{1}{w}\right)\left(1-\frac{d-1}{n-w}\right), \]

so $f(w+1)\leq f(w)$ exactly when $\left(1+\frac{1}{w}\right)\left(1-\frac{d-1}{n-w}\right)\leq 1$.

If $w\geq\lfloor n/d\rfloor$ is an integer, then $w+1>n/d$, that is, $d(w+1)\geq n+1$, so that

\begin{align*}
(d-1)(w+1) &\geq n-w,\\
\left(\frac{d-1}{n-w}\right)\left(1+\frac{1}{w}\right) &\geq\frac{1}{w},
\end{align*}

and $f(w+1)\leq f(w)$. If $w<\lfloor n/d\rfloor$ is an integer, then $w\leq\frac{n}{d}-1$ so that

\begin{align*}
n-w &\geq d(w+1)-w\\
&>(d-1)(w+1),\\
\frac{1}{w} &>\left(\frac{d-1}{n-w}\right)\left(1+\frac{1}{w}\right),
\end{align*}

and $f(w+1)>f(w)$. As such, $f$ is maximized when $w=\lfloor n/d\rfloor$. Hence, if

\[ \frac{\binom{n-d}{\lfloor\frac{n}{d}\rfloor-1}}{\binom{n}{\lfloor\frac{n}{d}\rfloor}}>\frac{2+\Delta}{t}, \]

then there exists some elementwise nonnegative vector $\vec{p}$ satisfying (4). Thus, there exists a valid probability distribution on $\bm{s}$, with which we can generate a $(t,n,d)$-tropical code with high probability. ∎
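The proof above is non-constructive, but for small parameters one explicit solution of (4) can be written down. The sketch below is our own illustration, not the authors' construction. It reads $p_{w}$ as the probability assigned to each individual row of Hamming weight $w$, which is the reading under which the coefficients $\binom{n}{w}$ and $\binom{n-d}{w-1}$ in (4) are consistent. It puts mass only on weight $\lfloor n/d\rfloor$ and on the all-ones row, and assumes $d\geq 2$. The parameters $n,d,t,\Delta$ are arbitrary example values.

```python
from fractions import Fraction
from math import comb

def f(n, d, w):
    """Ratio C(n-d, w-1) / C(n, w) of the two rows of system (4) at weight w."""
    return Fraction(comb(n - d, w - 1), comb(n, w))

def two_weight_solution(n, d, target):
    """One nonnegative solution of (4), valid when f(floor(n/d)) >= target.

    p[w] is the probability of EACH row of Hamming weight w. Mass is split
    between weight w* = floor(n/d) and the all-ones row (weight n), whose
    coefficient in the second row of (4) is 0 when d >= 2.
    """
    w_star = n // d
    q = target / f(n, d, w_star)  # total probability of weight-w* rows
    if not 0 <= q <= 1:
        raise ValueError("the condition of Lemma 1 fails for these parameters")
    p = {w: Fraction(0) for w in range(1, n + 1)}
    p[w_star] = q / comb(n, w_star)
    p[n] += 1 - q                 # there is exactly one all-ones row
    return p

n, d, t, Delta = 20, 4, 200, 4    # illustrative parameters (assumption)
target = Fraction(2 + Delta, t)

# Numerical check that f is maximized at floor(n/d) over w = 1..n-d+1.
w_best = max(range(1, n - d + 2), key=lambda w: f(n, d, w))

p = two_weight_solution(n, d, target)
row1 = sum(comb(n, w) * p[w] for w in range(1, n + 1))
row2 = sum(comb(n - d, w - 1) * p[w] for w in range(1, n - d + 2))
print(w_best, row1, row2)
```

Exact rational arithmetic via `fractions.Fraction` lets both equations of (4) be checked with equality rather than up to rounding.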

IV-C Performance of the probabilistic construction

In this section, we establish conditions for which the proposed construction provides us with a valid tropical code.

Theorem 3.

The construction given in Section IV-B is a $(t,n,d,\epsilon)$-tropical code if

\[ t>2e\left(d^{2}\log\left(\frac{ne}{d}\right)+d\log\left(\frac{d}{\epsilon}\right)+2d\right). \]

We will use the Chernoff bound to establish Theorem 3.

Lemma 3 (Chernoff bound [7]).

Let $X=\sum_{k=1}^{t}X_{k}$ where all $X_{k}\sim\operatorname{Bern}(p)$ are independent. Let $\mu=\mathbb{E}(X)=tp$. Then

\[ \operatorname{\textsf{P}}(X\leq(1-\delta)\mu)\leq e^{-\mu\delta^{2}/2}, \]

for all $0<\delta<1$.
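To get a feel for how loose the bound in Lemma 3 is, the sketch below compares it with the exact lower tail of a binomial distribution. The values of $t$, $p$ and $\delta$ are arbitrary example values chosen so that $(1-\delta)\mu=2$, and `binom_lower_tail` is our own helper.

```python
from math import comb, exp

def binom_lower_tail(t, p, k):
    """Exact P(X <= k) for X ~ Binomial(t, p)."""
    return sum(comb(t, i) * p**i * (1 - p)**(t - i) for i in range(k + 1))

t, p = 200, 0.05   # illustrative values; mu = t * p = 10
mu = t * p
delta = 0.8        # (1 - delta) * mu = 2, so the event is {X <= 2}

exact = binom_lower_tail(t, p, 2)
bound = exp(-mu * delta**2 / 2)
print(f"exact tail = {exact:.6f}, Chernoff bound = {bound:.6f}")
```

The exact tail is well below the bound here, which is typical: the Chernoff bound trades tightness for a closed form that is easy to combine with a union bound.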

Proof of Theorem 3:

For a given $\epsilon\in(0,1)$, choose $\Delta=2+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)$. For some fixed distinct block indices $i_{1},\ldots,i_{d}\in[n]$, let

\[ X_{k}=\mathbf{1}_{\{\bm{S}_{k,i_{1}}=1,\,\bm{S}_{k,i_{2}}=0,\,\ldots,\,\bm{S}_{k,i_{d}}=0\}}\sim\operatorname{Bern}\left(\frac{4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)}{t}\right) \]

be the random variable indicating whether or not the $i_{1}^{\text{th}}$ entry of the $k^{\text{th}}$ row is $1$ and the $i_{2}^{\text{th}},\ldots,i_{d}^{\text{th}}$ entries of the $k^{\text{th}}$ row are $0$. Then the probability of error for the chosen subset of blocks is $\operatorname{\textsf{P}}\left(\sum_{k\in[t]}X_{k}<2\right)$, since the sum $\sum_{k\in[t]}X_{k}$ equals $\left|B_{i_{1}}\setminus\left(\bigcup_{j=2}^{d}B_{i_{j}}\right)\right|$. Then note that

\begin{align*}
&\operatorname{\textsf{P}}\Big(\sum_{k\in[t]}X_{k}<2\Big)\\
&=\operatorname{\textsf{P}}\left(\sum_{k\in[t]}X_{k}<\Big(4+2\log\Big(\tfrac{d}{\epsilon}\tbinom{n}{d}\Big)\Big)\left(1-\tfrac{2+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)}{4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)}\right)\right).
\end{align*}

We may then use the Chernoff bound (Lemma 3) with $\mu=4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)$ and $\delta=\frac{2+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)}{4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)}$ to get

\begin{align*}
&\operatorname{\textsf{P}}\Big(\sum_{k\in[t]}X_{k}<2\Big)\\
&\leq\exp\left(-\left(4+2\log\left(\tfrac{d}{\epsilon}\tbinom{n}{d}\right)\right)\tfrac{\left(2+2\log\left(\tfrac{d}{\epsilon}\binom{n}{d}\right)\right)^{2}}{2\left(4+2\log\left(\tfrac{d}{\epsilon}\binom{n}{d}\right)\right)^{2}}\right)\\
&=\exp\left(-\frac{\left(2+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)\right)^{2}}{2\left(4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)\right)}\right).
\end{align*}

Now we take the union bound over every subset $I\subset[n]$ of cardinality $d$ together with a distinguished index $i_{1}\in I$. This gives an upper bound for $P_{e}^{n,d}$:

\begin{align*}
P_{e}^{n,d} &\leq\sum_{\substack{I\subset[n],\,|I|=d\\ i_{1}\in I}}\operatorname{\textsf{P}}\left(\sum_{k\in[t]}X_{k}<2\right)\\
&\leq d\binom{n}{d}\exp\left(-\frac{\left(2+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)\right)^{2}}{2\left(4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)\right)}\right)\\
&=d\binom{n}{d}\exp\left(-\log\left(\tfrac{d}{\epsilon}\tbinom{n}{d}\right)-\frac{1}{2+\log\left(\tfrac{d}{\epsilon}\tbinom{n}{d}\right)}\right)\\
&\leq d\binom{n}{d}\exp\left(\log\left(\frac{\epsilon}{d\binom{n}{d}}\right)\right)\\
&=\epsilon.
\end{align*}

The third line uses the identity $\frac{(2+2L)^{2}}{2(4+2L)}=L+\frac{1}{2+L}$ with $L=\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)>0$.

To study the asymptotic behaviour of the construction in Lemma 1, we simplify the condition of Lemma 1 using the following asymptotic approximation of the binomial coefficient:

\[ \binom{n}{\alpha n}\approx\frac{1}{\sqrt{2\pi\alpha(1-\alpha)n}}2^{nh(\alpha)}, \]

where $h(\cdot)$ is the binary entropy function. This gives

\[ \frac{\binom{n}{\lfloor\frac{n}{d}\rfloor}}{\binom{n-d}{\lfloor\frac{n}{d}\rfloor-1}}\approx\sqrt{\frac{n-d}{n}}\,2^{dh(1/d)}<\sqrt{\frac{n-d}{n}}\,ed, \]

so that our construction is a $(t,n,d,\epsilon)$-tropical code if

\[ t>\sqrt{\frac{n-d}{n}}\,ed\left(4+2\log\left(\frac{d}{\epsilon}\binom{n}{d}\right)\right). \tag{5} \]

Using the bounds $\sqrt{\frac{n-d}{n}}<1$ and $\binom{n}{d}\leq\left(\frac{en}{d}\right)^{d}$, we can relax the condition in (5) so that our construction is a $(t,n,d,\epsilon)$-tropical code if

\[ t>2e\left(d^{2}\log\left(\frac{ne}{d}\right)+d\log\left(\frac{d}{\epsilon}\right)+2d\right). \tag{6} \]

V Comparison

Since both probabilistic and deterministic constructions exist for disjunct matrices, and our result in Theorem 2 provides a construction for a $(d-1)$-doubly disjunct block design from a $(d-1)$-disjunct block design, we have three ways to produce $(t,n,d,\epsilon)$-tropical codes for some fixed $d$:

  1. First construct a deterministic $(d-1)$-disjunct matrix, then use Theorem 2 to construct a $(d-1)$-doubly disjunct matrix;

  2. Construct a $(d-1)$-disjunct matrix probabilistically, then use Theorem 2 to construct a $(d-1)$-doubly disjunct matrix;

  3. Construct a $(d-1)$-doubly disjunct random matrix directly using Lemma 1.

Theorem 2 doubles the number of tests used, but does not change the asymptotic behaviour of the construction used for a $(d-1)$-disjunct matrix. The deterministic constructions given by [3, 8] have $t=O(d^{2}\log n)$ with $\epsilon=0$ for specific parameter choices. As such, the first method also gives $t=O(d^{2}\log n)$ with $\epsilon=0$. For the second method, we can use our construction detailed in Section IV-B to construct $(d-1)$-disjunct matrices by replacing the right-hand side of Lemma 1 and all subsequent calculations with $\frac{1+\Delta}{t}$. This gives $(d-1)$-disjunct matrices with $t>e\left(2d^{2}\log\left(\frac{ne}{d}\right)+2d\log\left(\frac{d}{\epsilon}\right)+3d\right)$ for error bound $\epsilon$. Then, using Theorem 2, we can get a $(d-1)$-doubly disjunct matrix with $t>2e\left(2d^{2}\log\left(\frac{ne}{d}\right)+2d\log\left(\frac{d}{\epsilon}\right)+3d\right)$, which is greater than the bound in Theorem 3 for the construction of a $(d-1)$-doubly disjunct matrix directly, as in the third method. From equation (5) in the proof of Theorem 3, the construction given in Section IV-B requires $t=O\left(d\log\binom{n}{d}+d\log\left(\frac{d}{\epsilon}\right)\right)$. When $d=o(n)$, we have $t=O(d^{2}\log n+d\log(d/\epsilon))$. If $d=\Theta(n)$, then $t=o(d^{2}\log n+d\log(d/\epsilon))$.
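The comparison between the second and third methods can be made concrete by evaluating the two thresholds quoted above. This is a sketch with our own function names and example parameters; it only evaluates the stated formulas and says nothing about the constants of the deterministic constructions in [3, 8].

```python
from math import e, log

def direct_bound(n, d, eps):
    """Method 3: threshold of Theorem 3 for the direct probabilistic construction."""
    return 2 * e * (d**2 * log(n * e / d) + d * log(d / eps) + 2 * d)

def doubled_bound(n, d, eps):
    """Method 2: probabilistic disjunct matrix, then doubled via Theorem 2."""
    return 2 * e * (2 * d**2 * log(n * e / d) + 2 * d * log(d / eps) + 3 * d)

eps = 0.01  # illustrative error tolerance (assumption)
ratios = {}
for n, d in [(10**4, 3), (10**6, 5), (10**8, 10)]:
    ratios[(n, d)] = doubled_bound(n, d, eps) / direct_bound(n, d, eps)
    print(f"n={n}, d={d}: doubled/direct = {ratios[(n, d)]:.3f}")
```

The ratio stays just below two, since twice the direct threshold exceeds the doubled one by exactly $2ed$.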

In a practical setting, if the situation requires zero error, then we can use the first method to construct a doubly disjunct matrix with the set of given parameters. Otherwise, if some error $\epsilon$ is tolerable, then our construction in Section IV-B may be preferable.

VI Conclusion

In this paper we consider a probabilistic construction for a $(d-1)$-doubly disjunct block design, as well as a deterministic method based on the existence of a $(d-1)$-disjunct block design. We show that constructing a $(d-1)$-doubly disjunct block design probabilistically gives a similar performance to a fully deterministic construction based on a $(d-1)$-disjunct block design, with the added benefit of being able to use arbitrary parameters.

References

  • [1] H.-P. Wang, R. Gabrys, and A. Vardy, “Tropical group testing,” 2022. [Online]. Available: https://arxiv.org/abs/2201.05440
  • [2] R. Graham and N. Sloane, “Lower bounds for constant weight codes,” IEEE Transactions on Information Theory, vol. 26, no. 1, pp. 37–43, 1980.
  • [3] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 363–377, 1964.
  • [4] M. Cheraghchi and V. Nakos, “Combinatorial group testing and sparse recovery schemes with near-optimal decoding time,” CoRR, vol. abs/2006.08420, 2020. [Online]. Available: https://arxiv.org/abs/2006.08420
  • [5] H. A. Inan, P. Kairouz, M. Wootters, and A. Özgür, “On the optimality of the Kautz-Singleton construction in probabilistic group testing,” IEEE Transactions on Information Theory, vol. 65, no. 9, pp. 5592–5603, Sep. 2019. [Online]. Available: https://doi.org/10.1109/TIT.2019.2902397
  • [6] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
  • [7] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2017.
  • [8] E. Porat and A. Rothschild, “Explicit non-adaptive combinatorial group testing schemes,” CoRR, vol. abs/0712.3876, 2007. [Online]. Available: http://arxiv.org/abs/0712.3876