
Limiting distributions of ratios of Binomial random variables

Adriel Barretto and Zachary Lubberts

University of Virginia, Department of Statistics, Charlottesville, VA, USA.
Abstract

We consider the limiting distribution of the quantity $X^{s}/(X+Y)^{r}$, where $X$ and $Y$ are two independent Binomial random variables with a common success probability and numbers of trials $n$ and $m$, respectively, and $r,s$ are positive real numbers. Under several settings, we prove that this quantity converges to a Normal distribution with a given mean and variance, and we demonstrate these theoretical results through simulations.

1 Introduction

As part of the analysis in [3], the authors consider the quantity

\frac{X}{\sqrt{X+Y}},

where $X\sim\mathrm{Binomial}(n,p)$, $Y\sim\mathrm{Binomial}(m,p)$, and $X,Y$ are independent of one another. The setting of interest in that paper occurs when $m$ has order comparable to $n^{2}$, so that the quantity in the denominator is typically of a similar order to the one in the numerator; in this case, it was shown that for large values of $n$, this ratio is approximately Normal with mean $\sqrt{p}$ and variance $(1-p)/n$. While it is well known that, conditional on the value of $X+Y$ for random variables $X$ and $Y$ as above, $X$ is a Hypergeometric random variable, the literature is less forthcoming about the distribution of such ratios of a pair of Binomial random variables. In the present work, we extend the previous results to the case where the function takes the form

R=\frac{X^{s}}{(X+Y)^{r}},

for $r,s>0$. We will show that under several parameter regimes, $R$ has a limiting Normal distribution. We verify these results through simulations, showing the effects of varying each of the parameters on the observed distribution.

2 Limiting Distribution

We may approximate the function defining our ratio of interest using a Taylor expansion about $(np,mp)$:

f(x,y)=\frac{x^{s}}{(x+y)^{r}}=\frac{(np)^{s}}{(np+mp)^{r}}+\nabla f(np,mp)^{T}\begin{bmatrix}x-np\\ y-mp\end{bmatrix}+Q(x,y), (1)

where the quadratic remainder term takes the form

Q(x,y)=\frac{1}{2}\begin{bmatrix}x-np&y-mp\end{bmatrix}\nabla^{2}f(\xi,\eta)\begin{bmatrix}x-np\\ y-mp\end{bmatrix},

for some point $(\xi,\eta)$ on the line segment connecting $(np,mp)$ with $(x,y)$.
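For later reference, a direct computation (an intermediate step not written out in the original) gives the gradient appearing in the linear term:

\nabla f(x,y)=\frac{x^{s-1}}{(x+y)^{r+1}}\begin{bmatrix}s(x+y)-rx\\ -rx\end{bmatrix},\qquad\nabla f(np,mp)=\frac{(np)^{s}}{((n+m)p)^{r+1}}\begin{bmatrix}\frac{s(n+m)}{n}-r\\ -r\end{bmatrix},

which is the form used when rearranging Equation (1) just before Theorem 1.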

If we substitute the random variables $X$ and $Y$ into this expansion, the linear term in Equation (1) already looks quite Normal in distribution once $n$ and $m$ are large, since $X,Y$ are independent Binomial random variables. However, in order to show that the limiting distribution of $R$ is determined by the linear terms in this equation, we must bound the remainder term. We will make use of the following result (a form of Slutsky's theorem) from [1]:

Lemma 1.

Let $X_{n}\rightarrow X$ in distribution, and $Y_{n}\rightarrow 0$ in probability. Then $X_{n}+Y_{n}\rightarrow X$ in distribution.

We will show that after appropriate scaling, the quadratic remainder term converges to 0 in probability, so the two linear terms determine the limiting distribution of $R$. For any quadratic form $x^{T}Ax$, applying the Cauchy-Schwarz inequality and the definition of the spectral norm, we have

x^{T}Ax=\langle Ax,x\rangle\leq\|Ax\|\|x\|\leq\|A\|\|x\|^{2},

so in order to control $Q$, we must find an upper bound for $\|\nabla^{2}f(x,y)\|_{2}$. Note that a Hessian matrix must be symmetric, so its singular values are simply the absolute values of its eigenvalues. The Gerschgorin disk theorem [2] tells us that any eigenvalue of a matrix must belong to the union of its Gerschgorin disks, so no eigenvalue of a matrix $A$ can be larger in absolute value than

\max_{i}\sum_{j=1}^{n}|A_{ij}|\leq\sum_{i,j=1}^{n}|A_{ij}|.
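As a quick numerical illustration (not part of the original argument), this chain of bounds can be checked on a random symmetric matrix in Python with numpy:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
A = (A + A.T) / 2                          # symmetric, like a Hessian

spectral = np.linalg.norm(A, 2)            # largest singular value of A
row_bound = np.abs(A).sum(axis=1).max()    # max_i sum_j |A_ij| (Gerschgorin bound)
total = np.abs(A).sum()                    # sum_{i,j} |A_ij|

print(spectral <= row_bound <= total)      # prints True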

In the present case, the Hessian is given by

\nabla^{2}f(x,y)=\frac{x^{s-2}}{(x+y)^{r+2}}\begin{bmatrix}s(s-1)(x+y)^{2}-2rsx(x+y)+r(r+1)x^{2}&r(r+1)x^{2}-rsx(x+y)\\ r(r+1)x^{2}-rsx(x+y)&r(r+1)x^{2}\end{bmatrix}.
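This expression can also be verified symbolically; the following is a minimal sketch in Python using sympy (a tool choice of ours, not one used or mentioned in the paper):

import sympy as sp

x, y, r, s = sp.symbols("x y r s", positive=True)
f = x**s / (x + y)**r

# Hessian computed directly by sympy
H = sp.hessian(f, (x, y))

# Closed form claimed in the display above
claimed = (x**(s - 2) / (x + y)**(r + 2)) * sp.Matrix([
    [s*(s - 1)*(x + y)**2 - 2*r*s*x*(x + y) + r*(r + 1)*x**2,
     r*(r + 1)*x**2 - r*s*x*(x + y)],
    [r*(r + 1)*x**2 - r*s*x*(x + y),
     r*(r + 1)*x**2],
])

# The entrywise difference should simplify to the zero matrix
print((H - claimed).applyfunc(sp.simplify))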

Utilizing the Gerschgorin bound above, we have

\|\nabla^{2}f(x,y)\|_{2}\leq\frac{x^{s-2}}{(x+y)^{r+2}}\left[|s(s-1)(x+y)^{2}-2rsx(x+y)+r(r+1)x^{2}|+2|r(r+1)x^{2}-rsx(x+y)|+r(r+1)x^{2}\right]. (2)

Now since $X,Y$ are Binomial random variables, we know that with overwhelming probability, $|X-np|\leq\sqrt{n\log(n)p(1-p)}$ and $|Y-mp|\leq\sqrt{m\log(m)p(1-p)}$; let us call this event $\mathcal{A}$. More precisely, the probability that $|X-np|>C\sqrt{n\log(n)}$ decays faster than $Cn^{-2}$, and similarly for $Y$. So to bound the residual term, we see that the event

\{|Q(X,Y)|>\epsilon\}\subseteq\left(\{|Q(X,Y)|>\epsilon\}\cap\{|X-np|\leq C\sqrt{n\log n}\}\cap\{|Y-mp|\leq C\sqrt{m\log m}\}\right)\cup\{|X-np|>C\sqrt{n\log n}\}\cup\{|Y-mp|>C\sqrt{m\log m}\},

and thus

\mathbb{P}[|Q(X,Y)|>\epsilon]\leq\mathbb{P}[(|Q(X,Y)|>\epsilon)\cap(|X-np|\leq C\sqrt{n\log n})\cap(|Y-mp|\leq C\sqrt{m\log m})]+Cn^{-2}+Cm^{-2}.
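One standard way to justify the stated tail decay (the paper does not specify which concentration inequality it uses) is Hoeffding's inequality: since $X$ is a sum of $n$ independent Bernoulli($p$) variables,

\mathbb{P}\left[|X-np|>C\sqrt{n\log n}\right]\leq 2\exp\left(-\frac{2C^{2}n\log n}{n}\right)=2n^{-2C^{2}},

which is $O(n^{-2})$ once $C\geq 1$, and similarly for $Y$.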

Consider $x=np+r_{x}$, where $|r_{x}|\leq C\sqrt{n\log(n)}$, and $y=mp+r_{y}$, where $|r_{y}|\leq C\sqrt{m\log(m)}$. Then the right-hand side of inequality (2) may be bounded by

\frac{Cn^{s-2}\max\{s(n+m),rn\}^{2}}{(n+m)^{r+2}}.

We also have the inequality

\left\|\begin{bmatrix}x-np\\ y-mp\end{bmatrix}\right\|^{2}=r_{x}^{2}+r_{y}^{2}\leq Cn\log(n)+Cm\log(m),

so on the event $\mathcal{A}$, we get that

|Q(X,Y)|\leq\frac{Cn^{s-2}\max\{s(n+m),rn\}^{2}}{(n+m)^{r+2}}(n\log(n)+m\log(m))\leq\frac{Cn^{s-2}\log(n+m)}{(n+m)^{r-1}}, (3)

since $s,r$ are constants. The exact behavior of this quantity depends on the ratio $m/n$, but in light of Lemma 1, in order to obtain the distributional convergence results, we must simply show that the final quantity in (3) still goes to zero after appropriate scaling, for any of the cases we wish to consider.

To find the limiting distribution of $R=f(X,Y)$, we rearrange Equation (1) after evaluating at $(X,Y)$ (and neglecting the remainder term for the time being):

R-\frac{(np)^{s}}{((n+m)p)^{r}}\approx\nabla f(np,mp)^{T}\begin{bmatrix}X-np\\ Y-mp\end{bmatrix}=\frac{(np)^{s}}{((n+m)p)^{r+1}}\begin{bmatrix}\left(\frac{s(n+m)}{n}-r\right)\sqrt{np(1-p)}&-r\sqrt{mp(1-p)}\end{bmatrix}\begin{bmatrix}\frac{X-np}{\sqrt{np(1-p)}}\\ \frac{Y-mp}{\sqrt{mp(1-p)}}\end{bmatrix}.

The last vector will converge to a Normal vector $(Z_{X},Z_{Y})$ as $n,m\rightarrow\infty$, but an appropriate scaling is required so that neither of the coefficients of $Z_{X},Z_{Y}$ diverges as $n,m\rightarrow\infty$, and so that they do not both vanish (since in that case, the limiting distribution is simply a point mass at 0). The choice of scaling will again depend on the ratio $m/n$, but now we have all of the ingredients in place to prove the following theorem:

Theorem 1.

Let $r,s>0$ and $p\in(0,1)$ be constants. Let $X\sim\mathrm{Binomial}(n,p)$, and let $Y\sim\mathrm{Binomial}(m,p)$ be independent of $X$. Define

R=\frac{X^{s}}{(X+Y)^{r}}.

Then as $n,m\rightarrow\infty$, we have the following convergences in distribution:

(i) If $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$, then

\frac{m^{r}}{n^{s-1/2}}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)s^{2}\right).

(ii) If $m/n\rightarrow\alpha\in(0,+\infty)$, then

n^{r-s+1/2}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)\frac{(s(1+\alpha)-r)^{2}+\alpha r^{2}}{(1+\alpha)^{2(r+1)}}\right).

(iii) If $m/n\rightarrow 0$, then

n^{r-s+1/2}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)(s-r)^{2}\right).
Proof.

It remains to be shown that in the three cases described above, the bound in (3) still goes to zero after multiplication by the appropriate scaling factor as $n,m\rightarrow\infty$.

(i) When $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot\frac{m^{r}}{n^{s-1/2}}\sim\frac{m\log(m)}{n^{3/2}}\rightarrow 0.

(ii) When $m/n\rightarrow\alpha$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot n^{r-s+1/2}\sim\frac{\log(n)}{\sqrt{n}}\rightarrow 0.

(iii) When $m/n\rightarrow 0$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot n^{r-s+1/2}\sim\frac{\log(n)}{\sqrt{n}}\rightarrow 0. ∎

3 Simulation

We verify our limiting distribution results for different values of the parameters $n,m,p,r$, and $s$. For each set of parameter values we tested, we generate 100,000 points $(X_{i},Y_{i})$, where $X_{i}\sim\mathrm{Binomial}(n,p)$, $Y_{i}\sim\mathrm{Binomial}(m,p)$, and $X_{i},Y_{i}$ are independent. We then compute $\frac{X_{i}^{s}}{(X_{i}+Y_{i})^{r}}$, and center and scale according to Theorem 1, obtaining $R_{i}$. We compare this sample with a sample of 100,000 values $Z_{j}$ drawn from the Normal distribution with mean 0 and the appropriate variance. To measure the discrepancy between the distributions of $R$ and $Z$, we use the discrete KL divergence: we divide the values in $\{R_{i}\},\{Z_{j}\}$ into 100 bins $x\in\mathbb{X}$, then set $A(x)$ to be the observed proportion of $R_{i}\in x$ and $B(x)$ to be the observed proportion of $Z_{j}\in x$. We compare the two distributions with the formula

D_{KL}(A||B)=\sum_{x\in\mathbb{X}}A(x)[\log(A(x))-\log(B(x))]. (4)

When $m/n\rightarrow\infty$, it can happen that all of the observations fall into a single bin, so in this case we use the reversed formula to get a meaningful comparison of the distributions:

D_{KL}(B||A)=\sum_{x\in\mathbb{X}}B(x)[\log(B(x))-\log(A(x))]. (5)
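For concreteness, the following is a minimal sketch of this procedure for regime (ii) of Theorem 1, written in Python with numpy (the original implementation is not specified in the paper, and the binning and parameter choices below are illustrative assumptions):

import numpy as np

def scaled_R_samples(n, m, p, r, s, size=100_000, rng=None):
    # Centered and scaled samples of R = X^s / (X + Y)^r, using the scaling of Theorem 1(ii)
    rng = rng or np.random.default_rng(0)
    X = rng.binomial(n, p, size).astype(float)
    Y = rng.binomial(m, p, size).astype(float)
    R = X**s / (X + Y)**r
    center = n**s / (n + m)**r * p**(s - r)
    return n**(r - s + 0.5) * (R - center)

def discrete_kl(a, b, bins=100):
    # Discrete KL divergence D(A || B), as in Equation (4), over a common binning of both samples
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    A = np.histogram(a, bins=edges)[0] / len(a)
    B = np.histogram(b, bins=edges)[0] / len(b)
    mask = A > 0                        # bins with A(x) = 0 contribute nothing to the sum
    with np.errstate(divide="ignore"):  # if B(x) = 0 on such a bin, the divergence is infinite,
        return float(np.sum(A[mask] * (np.log(A[mask]) - np.log(B[mask]))))  # motivating Eq. (5)

# Illustrative parameters (not those of any particular figure): regime (ii) with alpha = 1
n = m = 10**6
p, r, s = 0.5, 15.0, 15.0
alpha = m / n
var = p**(2*(s - r) - 1) * (1 - p) * ((s*(1 + alpha) - r)**2 + alpha*r**2) / (1 + alpha)**(2*(r + 1))
R_i = scaled_R_samples(n, m, p, r, s)
Z_j = np.random.default_rng(1).normal(0.0, np.sqrt(var), R_i.size)
print(discrete_kl(R_i, Z_j))  # should be close to 0 if the limiting Normal approximation is accurate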

We test the distributions in each of the settings of Theorem 1, as well as the case where $m/n\rightarrow\infty$ but $m\log(m)/n^{3/2}\not\rightarrow 0$. Whenever we fix the value of $r$ or $s$, it is set equal to 15 (except in Section 3.4, for reasons that we explain there), and whenever we fix the value of $p$, it is set equal to 0.5. For each of the regimes of $m/n$, all plots show the effect of varying one of the parameters while the others remain fixed.

3.1 $m/n\rightarrow\infty$, $m\log(m)n^{-3/2}\not\rightarrow 0$

While this setting lies outside the scope of the theorem, whenever we fix $m$ or $n$, we choose $m=2\times 10^{9}$ and $n=2\times 10^{5}$. Whenever we fix one of $r,s$, we set its value to be 15, and whenever we fix $p$, we set its value to be 0.5. The results are shown in Figure 1. We observe that the KL divergence remains concentrated around 3.35 across the parameter values considered. This occurs because the limiting distribution collapses to a point mass at 0, as a result of the denominator growing much faster than the numerator in

\frac{X^{s}}{(X+Y)^{r}}.
Figure 1: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\not\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $2\times 10^{9}$ to $2.001\times 10^{9}$); (e) changing $n$ (range $2\times 10^{5}$ to $1.2\times 10^{6}$). See Section 3.1 for more details.

3.2 $m/n\rightarrow\infty$, $m\log(m)n^{-3/2}\rightarrow 0$

In this setting, whenever we fix $m$ or $n$, we choose $m=1.1\times 10^{9}$ and $n=3.8\times 10^{6}$. As usual, when we fix the other parameters, we choose $r,s=15$ and $p=0.5$. The results may be seen in Figure 2. In this setting, the KL divergence is nearly zero regardless of changes in the parameters. The low value of the KL divergence indicates that the simulated and hypothesized distributions are nearly identical, reinforcing Theorem 1. We note that when $r$ grows, the distribution of $(X+Y)^{r}$ becomes increasingly skewed, so $n$ and $m$ may need to be larger to obtain the same degree of convergence in distribution. While panel (c) shows that the KL divergence does increase with $r$, the value is still very small (0.012) even for $r=30$ at these values of $m$ and $n$.

Figure 2: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $1.1\times 10^{9}$ to $1.101\times 10^{9}$); (e) changing $n$ (range $3.8\times 10^{6}$ to $4.8\times 10^{6}$). See Section 3.2 for details.

3.3 $m/n\rightarrow\alpha$

In this setting, whenever we fix $m$ or $n$, we choose $m=n=10^{6}$. As usual, when we fix the other parameters, we choose $r,s=15$ and $p=0.5$. The results may be seen in Figure 3. Again in this setting, the KL divergence remains nearly 0 regardless of the values of the parameters, reinforcing Theorem 1.

Figure 3: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\alpha\in(0,+\infty)$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $10^{6}$ to $2\times 10^{6}$); (e) changing $n$ (range $10^{6}$ to $2\times 10^{6}$). See Section 3.3 for more details.

3.4 $m/n\rightarrow 0$

In this setting, whenever we fix $m$ or $n$, we choose $m=3.8\times 10^{6}$ and $n=1.1\times 10^{9}$. Unlike in previous cases, when the remaining parameters are fixed, we choose $r=15$ and $s=16$, while $p=0.5$. The results may be seen in Figure 4. The reason for the change in the default value for $s$ can be explained by considering panels (b) and (c), where $r$ and $s$ vary. We can see that the KL divergence spikes when $r=s$: this comes from the fact that in this case, the hypothesized limiting distribution has 0 variance, so the Normal distribution we are comparing to collapses. Since for any finite $m$ and $n$, the distribution of $R$ is not degenerate, the KL divergence is much larger at this point. Excluding this case, the KL divergences in these plots are all small, indicating close similarity between the simulated distributions and the proposed hypothetical distributions. We also note in panel (d) that increasing $m$ results in a worse KL divergence, but this also means that the ratio $m/n$ is increasing, so we are further from the limiting regime considered in this case. The maximum value of the KL divergence is still quite small for this range of $m$, however.

Figure 4: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $2.8\times 10^{6}$ to $1.2\times 10^{6}$); (e) changing $n$ (range $1.1\times 10^{9}$ to $1.101\times 10^{9}$). Note that unlike in other sections, when we hold the parameter $s$ fixed, we choose the value $s=16$, since the hypothesized limiting Normal distribution has variance 0 when $r=s$ in this setting. See Section 3.4 for more details.

4 Conclusion

We studied the limiting distribution of a function of two independent Binomial random variables, in the setting where the number of trials for both variables grows large, but the rates of growth for those two quantities may differ. We were able to show that under several parameter regimes, the limiting distribution after centering and scaling is Normal, with a given variance. However, this does not exhaust the range of possible values for which one could consider this function: for example, in [3], the authors considered the case where $m\sim n^{2}$ (for the special case of $s=1$, $r=1/2$), which is not addressed by our results here. While our results determine the appropriate values for the mean and variance of $R$ in the case when $m$ and $n$ are large, determining the behavior of higher moments would require a more careful analysis of the quadratic remainder term than the one undertaken in the present work. As a final remark, we note that even with random variables as well-studied as the Binomial, there still remain many interesting questions to explore.

References

[1] Kai Lai Chung. A Course in Probability Theory. Elsevier, 2000.

[2] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[3] Zachary Lubberts, Avanti Athreya, Youngser Park, and Carey E. Priebe. Random line graphs and edge-attributed network inference. Bernoulli, 2025. arXiv:2103.14726.