
Limiting distributions of ratios of Binomial random variables

Adriel Barretto and Zachary Lubberts

University of Virginia, Department of Statistics, Charlottesville, VA, USA.
Abstract

We consider the limiting distribution of the quantity $X^{s}/(X+Y)^{r}$, where $X$ and $Y$ are two independent Binomial random variables with a common success probability and numbers of trials $n$ and $m$, respectively, and $r,s$ are positive real numbers. Under several settings, we prove that this quantity converges to a Normal distribution with a given mean and variance, and we demonstrate these theoretical results through simulations.

1 Introduction

As part of the analysis in [3], the authors consider the quantity

\frac{X}{\sqrt{X+Y}},

where $X\sim\mathrm{Binomial}(n,p)$, $Y\sim\mathrm{Binomial}(m,p)$, and $X,Y$ are independent of one another. The setting of interest in that paper occurs when $m$ has order comparable to $n^{2}$, so that the quantity in the denominator is typically of a similar order to the one in the numerator; in this case, it was shown that for large values of $n$, this ratio is approximately Normal with mean $\sqrt{p}$ and variance $(1-p)/n$. While it is well known that, conditional on the value of $X+Y$ for random variables $X$ and $Y$ as above, $X$ is a Hypergeometric random variable, the literature is less forthcoming about the distribution of such ratios of a pair of Binomial random variables. In the present work, we extend the previous results to the case where the function takes the form

R=\frac{X^{s}}{(X+Y)^{r}},

for $r,s>0$. We will show that under several parameter regimes, $R$ has a limiting Normal distribution. We verify these results through simulations, showing the effects of varying each of the parameters on the observed distribution.

2 Limiting Distribution

We may approximate the function defining our ratio of interest using a Taylor expansion about $(np,mp)$:

f(x,y)=\frac{x^{s}}{(x+y)^{r}}=\frac{(np)^{s}}{(np+mp)^{r}}+\nabla f(np,mp)^{T}\begin{bmatrix}x-np\\ y-mp\end{bmatrix}+Q(x,y), (1)

where the quadratic remainder term takes the form

Q(x,y)=\frac{1}{2}\begin{bmatrix}x-np&y-mp\end{bmatrix}\nabla^{2}f(\xi,\eta)\begin{bmatrix}x-np\\ y-mp\end{bmatrix},

for some point $(\xi,\eta)$ on the line segment connecting $(np,mp)$ with $(x,y)$.
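For later reference, a direct computation (an intermediate step not written out in the original) gives the gradient appearing in the linear term:

\nabla f(x,y)=\frac{x^{s-1}}{(x+y)^{r+1}}\begin{bmatrix}s(x+y)-rx\\ -rx\end{bmatrix},\qquad\nabla f(np,mp)=\frac{(np)^{s}}{((n+m)p)^{r+1}}\begin{bmatrix}\frac{s(n+m)}{n}-r\\ -r\end{bmatrix},

which is the form used when rearranging Equation (1) just before Theorem 1.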

If we substitute the random variables $X$ and $Y$ into this expansion, the linear term in Equation (1) already looks quite Normal in distribution once $n$ and $m$ are large, since $X,Y$ are independent Binomial random variables. However, in order to show that the limiting distribution of $R$ is determined by the linear terms in this equation, we must bound the remainder term. We will make use of the following result (a form of Slutsky's theorem) from [1]:

Lemma 1.

Let $X_{n}\rightarrow X$ in distribution, and $Y_{n}\rightarrow 0$ in probability. Then $X_{n}+Y_{n}\rightarrow X$ in distribution.

We will show that after appropriate scaling, the quadratic remainder term converges to 0 in probability, so the two linear terms determine the limiting distribution of $R$. For any quadratic form $x^{T}Ax$, applying the Cauchy-Schwarz inequality and the definition of the spectral norm, we have

x^{T}Ax=\langle Ax,x\rangle\leq\|Ax\|\|x\|\leq\|A\|\|x\|^{2},

so in order to control $Q$, we must find an upper bound for $\|\nabla^{2}f(x,y)\|_{2}$. Note that a Hessian matrix must be symmetric, so its singular values are simply the absolute values of its eigenvalues. The Gerschgorin disk theorem [2] tells us that any eigenvalue of a matrix must belong to the union of its Gerschgorin disks, so no eigenvalue of a matrix $A$ can be larger in absolute value than

\max_{i}\sum_{j=1}^{n}|A_{ij}|\leq\sum_{i,j=1}^{n}|A_{ij}|.
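As a quick numerical illustration (not part of the original argument), this chain of bounds can be checked on a random symmetric matrix in Python with numpy:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
A = (A + A.T) / 2                          # symmetric, like a Hessian

spectral = np.linalg.norm(A, 2)            # largest singular value of A
row_bound = np.abs(A).sum(axis=1).max()    # max_i sum_j |A_ij| (Gerschgorin bound)
total = np.abs(A).sum()                    # sum_{i,j} |A_ij|

print(spectral <= row_bound <= total)      # prints True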

In the present case, the Hessian is given by

\nabla^{2}f(x,y)=\frac{x^{s-2}}{(x+y)^{r+2}}\begin{bmatrix}s(s-1)(x+y)^{2}-2rsx(x+y)+r(r+1)x^{2}&r(r+1)x^{2}-rsx(x+y)\\ r(r+1)x^{2}-rsx(x+y)&r(r+1)x^{2}\end{bmatrix}.
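This expression can also be verified symbolically; the following is a minimal sketch in Python using sympy (a tool choice of ours, not one used or mentioned in the paper):

import sympy as sp

x, y, r, s = sp.symbols("x y r s", positive=True)
f = x**s / (x + y)**r

# Hessian computed directly by sympy
H = sp.hessian(f, (x, y))

# Closed form claimed in the display above
claimed = (x**(s - 2) / (x + y)**(r + 2)) * sp.Matrix([
    [s*(s - 1)*(x + y)**2 - 2*r*s*x*(x + y) + r*(r + 1)*x**2,
     r*(r + 1)*x**2 - r*s*x*(x + y)],
    [r*(r + 1)*x**2 - r*s*x*(x + y),
     r*(r + 1)*x**2],
])

# The entrywise difference should simplify to the zero matrix
print((H - claimed).applyfunc(sp.simplify))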

Utilizing the Gerschgorin bound above, we have

\|\nabla^{2}f(x,y)\|_{2}\leq\frac{x^{s-2}}{(x+y)^{r+2}}\left[|s(s-1)(x+y)^{2}-2rsx(x+y)+r(r+1)x^{2}|+2|r(r+1)x^{2}-rsx(x+y)|+r(r+1)x^{2}\right]. (2)

Now since $X,Y$ are Binomial random variables, we know that with overwhelming probability, $|X-np|\leq\sqrt{n\log(n)p(1-p)}$ and $|Y-mp|\leq\sqrt{m\log(m)p(1-p)}$; let us call this event $\mathcal{A}$. More precisely, the probability that $|X-np|>C\sqrt{n\log(n)}$ decays faster than $Cn^{-2}$, and similarly for $Y$. So to bound the residual term, we see that the event

\{|Q(X,Y)|>\epsilon\}\subseteq\left(\{|Q(X,Y)|>\epsilon\}\cap\{|X-np|\leq C\sqrt{n\log n}\}\cap\{|Y-mp|\leq C\sqrt{m\log m}\}\right)\cup\{|X-np|>C\sqrt{n\log n}\}\cup\{|Y-mp|>C\sqrt{m\log m}\},

and thus

\mathbb{P}[|Q(X,Y)|>\epsilon]\leq\mathbb{P}[(|Q(X,Y)|>\epsilon)\cap(|X-np|\leq C\sqrt{n\log n})\cap(|Y-mp|\leq C\sqrt{m\log m})]+Cn^{-2}+Cm^{-2}.
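One standard way to justify the stated tail decay (the paper does not specify which concentration inequality it uses) is Hoeffding's inequality: since $X$ is a sum of $n$ independent Bernoulli($p$) variables,

\mathbb{P}\left[|X-np|>C\sqrt{n\log n}\right]\leq 2\exp\left(-\frac{2C^{2}n\log n}{n}\right)=2n^{-2C^{2}},

which is $O(n^{-2})$ once $C\geq 1$, and similarly for $Y$.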

Consider $x=np+r_{x}$, where $|r_{x}|\leq C\sqrt{n\log(n)}$, and $y=mp+r_{y}$, where $|r_{y}|\leq C\sqrt{m\log(m)}$. Then the right-hand side of inequality (2) may be bounded by

\frac{Cn^{s-2}\max\{s(n+m),rn\}^{2}}{(n+m)^{r+2}}.

We also have the inequality

\left\|\begin{bmatrix}x-np\\ y-mp\end{bmatrix}\right\|^{2}=r_{x}^{2}+r_{y}^{2}\leq Cn\log(n)+Cm\log(m),

so on the event $\mathcal{A}$, we get that

|Q(X,Y)|\leq\frac{Cn^{s-2}\max\{s(n+m),rn\}^{2}}{(n+m)^{r+2}}(n\log(n)+m\log(m))\leq\frac{Cn^{s-2}\log(n+m)}{(n+m)^{r-1}}, (3)

since $s,r$ are constants. The exact behavior of this quantity depends on the ratio $m/n$, but in light of Lemma 1, in order to obtain the distributional convergence results, we must simply show that the final quantity in (3) still goes to zero after appropriate scaling, for any of the cases we wish to consider.

To find the limiting distribution of $R=f(X,Y)$, we rearrange Equation (1) after evaluating at $(X,Y)$ (and neglecting the remainder term for the time being):

R-\frac{(np)^{s}}{((n+m)p)^{r}}\approx\nabla f(np,mp)^{T}\begin{bmatrix}X-np\\ Y-mp\end{bmatrix}=\frac{(np)^{s}}{((n+m)p)^{r+1}}\begin{bmatrix}\left(\frac{s(n+m)}{n}-r\right)\sqrt{np(1-p)}&-r\sqrt{mp(1-p)}\end{bmatrix}\begin{bmatrix}\frac{X-np}{\sqrt{np(1-p)}}\\ \frac{Y-mp}{\sqrt{mp(1-p)}}\end{bmatrix}.

The last vector will converge to a Normal vector $(Z_{X},Z_{Y})$ as $n,m\rightarrow\infty$, but an appropriate scaling is required so that neither of the coefficients of $Z_{X},Z_{Y}$ diverges as $n,m\rightarrow\infty$, and so that they do not both vanish (since in that case, the limiting distribution is simply a point mass at 0). The choice of scaling will again depend on the ratio $m/n$, but now we have all of the ingredients in place to prove the following theorem:

Theorem 1.

Let $r,s>0$ and $p\in(0,1)$ be constants. Let $X\sim\mathrm{Binomial}(n,p)$, and let $Y\sim\mathrm{Binomial}(m,p)$ be independent of $X$. Define

R=\frac{X^{s}}{(X+Y)^{r}}.

Then as $n,m\rightarrow\infty$, we have the following convergences in distribution:

(i) If $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$, then

\frac{m^{r}}{n^{s-1/2}}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)s^{2}\right).

(ii) If $m/n\rightarrow\alpha\in(0,+\infty)$, then

n^{r-s+1/2}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)\frac{(s(1+\alpha)-r)^{2}+\alpha r^{2}}{(1+\alpha)^{2(r+1)}}\right).

(iii) If $m/n\rightarrow 0$, then

n^{r-s+1/2}\left(R-\frac{n^{s}}{(n+m)^{r}}p^{s-r}\right)\rightarrow\mathcal{N}\left(0,p^{2(s-r)-1}(1-p)(s-r)^{2}\right).
Proof.

It remains to be shown that in the three cases described above, the bound in (3) still goes to zero after multiplication by the appropriate scaling factor as $n,m\rightarrow\infty$.

(i) When $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot\frac{m^{r}}{n^{s-1/2}}\sim\frac{m\log(m)}{n^{3/2}}\rightarrow 0.

(ii) When $m/n\rightarrow\alpha$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot n^{r-s+1/2}\sim\frac{\log(n)}{\sqrt{n}}\rightarrow 0.

(iii) When $m/n\rightarrow 0$, we have

C\frac{n^{s-2}\log(n+m)}{(n+m)^{r-1}}\cdot n^{r-s+1/2}\sim\frac{\log(n)}{\sqrt{n}}\rightarrow 0. ∎

3 Simulation

We verify our limiting distribution results for different values of the parameters $n,m,p,r$, and $s$. For each set of parameter values we tested, we generate 100,000 points $(X_{i},Y_{i})$, where $X_{i}\sim\mathrm{Binomial}(n,p)$, $Y_{i}\sim\mathrm{Binomial}(m,p)$, and $X_{i},Y_{i}$ are independent. We then compute $\frac{X_{i}^{s}}{(X_{i}+Y_{i})^{r}}$, and center and scale according to Theorem 1, obtaining $R_{i}$. We compare this sample with a sample of 100,000 values $Z_{j}$ drawn from the Normal distribution with mean 0 and the appropriate variance. To measure the discrepancy between the distributions of $R$ and $Z$, we use the discrete KL divergence: we divide the values in $\{R_{i}\},\{Z_{j}\}$ into 100 bins $x\in\mathbb{X}$, then set $A(x)$ to be the observed proportion of $R_{i}\in x$ and $B(x)$ to be the observed proportion of $Z_{j}\in x$. We compare the two distributions with the formula

D_{KL}(A||B)=\sum_{x\in\mathbb{X}}A(x)[\log(A(x))-\log(B(x))]. (4)

When $m/n\rightarrow\infty$, it can happen that all of the observations fall into a single bin, so in this case we use the reversed formula to get a meaningful comparison of the distributions:

D_{KL}(B||A)=\sum_{x\in\mathbb{X}}B(x)[\log(B(x))-\log(A(x))]. (5)
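For concreteness, the following is a minimal sketch of this procedure for regime (ii) of Theorem 1, written in Python with numpy (the original implementation is not specified in the paper, and the binning and parameter choices below are illustrative assumptions):

import numpy as np

def scaled_R_samples(n, m, p, r, s, size=100_000, rng=None):
    # Centered and scaled samples of R = X^s / (X + Y)^r, using the scaling of Theorem 1(ii)
    rng = rng or np.random.default_rng(0)
    X = rng.binomial(n, p, size).astype(float)
    Y = rng.binomial(m, p, size).astype(float)
    R = X**s / (X + Y)**r
    center = n**s / (n + m)**r * p**(s - r)
    return n**(r - s + 0.5) * (R - center)

def discrete_kl(a, b, bins=100):
    # Discrete KL divergence D(A || B), as in Equation (4), over a common binning of both samples
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    A = np.histogram(a, bins=edges)[0] / len(a)
    B = np.histogram(b, bins=edges)[0] / len(b)
    mask = A > 0                        # bins with A(x) = 0 contribute nothing to the sum
    with np.errstate(divide="ignore"):  # if B(x) = 0 on such a bin, the divergence is infinite,
        return float(np.sum(A[mask] * (np.log(A[mask]) - np.log(B[mask]))))  # motivating Eq. (5)

# Illustrative parameters (not those of any particular figure): regime (ii) with alpha = 1
n = m = 10**6
p, r, s = 0.5, 15.0, 15.0
alpha = m / n
var = p**(2*(s - r) - 1) * (1 - p) * ((s*(1 + alpha) - r)**2 + alpha*r**2) / (1 + alpha)**(2*(r + 1))
R_i = scaled_R_samples(n, m, p, r, s)
Z_j = np.random.default_rng(1).normal(0.0, np.sqrt(var), R_i.size)
print(discrete_kl(R_i, Z_j))  # should be close to 0 if the limiting Normal approximation is accurate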

We test the distributions in each of the settings of Theorem 1, as well as the case where $m/n\rightarrow\infty$ but $m\log(m)/n^{3/2}\not\rightarrow 0$. Whenever we fix the value of $r$ or $s$, it is set equal to 15 (except in Section 3.4, for reasons that we explain there), and whenever we fix the value of $p$, it is set equal to 0.5. For each of the regimes of $m/n$, all plots show the effect of varying one of the parameters while the others remain fixed.

3.1 $m/n\rightarrow\infty$, $m\log(m)n^{-3/2}\not\rightarrow 0$

While this setting lies outside the scope of the theorem, whenever we fix $m$ or $n$, we choose $m=2\times 10^{9}$ and $n=2\times 10^{5}$. Whenever we fix one of $r,s$, we set its value to be 15, and whenever we fix $p$, we set its value to be 0.5. The results are shown in Figure 1. We observe that the KL divergence remains concentrated around 3.35 across the parameter values considered. This occurs because the limiting distribution collapses to a point mass at 0, as a result of the denominator growing much faster than the numerator in

\frac{X^{s}}{(X+Y)^{r}}.
Figure 1: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\not\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $2\times 10^{9}$ to $2.001\times 10^{9}$); (e) changing $n$ (range $2\times 10^{5}$ to $1.2\times 10^{6}$). See Section 3.1 for more details.

3.2 $m/n\rightarrow\infty$, $m\log(m)n^{-3/2}\rightarrow 0$

In this setting, whenever we fix $m$ or $n$, we choose $m=1.1\times 10^{9}$ and $n=3.8\times 10^{6}$. As usual, when we fix the other parameters, we choose $r,s=15$ and $p=0.5$. The results may be seen in Figure 2. In this setting, the KL divergence is nearly zero regardless of changes in the parameters. The low value of the KL divergence indicates that the simulated and hypothesized distributions are nearly identical, reinforcing Theorem 1. We note that when $r$ grows, the distribution of $(X+Y)^{r}$ becomes increasingly skewed, so $n$ and $m$ may need to be larger to obtain the same degree of convergence in distribution. While panel (c) shows that the KL divergence does increase with $r$, the value is still very small (0.012) even for $r=30$ at these values of $m$ and $n$.

Figure 2: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\infty$ and $m\log(m)n^{-3/2}\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $1.1\times 10^{9}$ to $1.101\times 10^{9}$); (e) changing $n$ (range $3.8\times 10^{6}$ to $4.8\times 10^{6}$). See Section 3.2 for details.

3.3 $m/n\rightarrow\alpha$

In this setting, whenever we fix $m$ or $n$, we choose $m=n=10^{6}$. As usual, when we fix the other parameters, we choose $r,s=15$ and $p=0.5$. The results may be seen in Figure 3. Again in this setting, the KL divergence remains nearly 0 regardless of the values of the parameters, reinforcing Theorem 1.

Figure 3: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow\alpha\in(0,+\infty)$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $10^{6}$ to $2\times 10^{6}$); (e) changing $n$ (range $10^{6}$ to $2\times 10^{6}$). See Section 3.3 for more details.

3.4 $m/n\rightarrow 0$

In this setting, whenever we fix $m$ or $n$, we choose $m=3.8\times 10^{6}$ and $n=1.1\times 10^{9}$. Unlike in previous cases, when the remaining parameters are fixed, we choose $r=15$ and $s=16$, while $p=0.5$. The results may be seen in Figure 4. The reason for the change in the default value for $s$ can be explained by considering panels (b) and (c), where $r$ and $s$ vary. We can see that the KL divergence spikes when $r=s$: this comes from the fact that in this case, the hypothesized limiting distribution has 0 variance, so the Normal distribution we are comparing to collapses. Since for any finite $m$ and $n$, the distribution of $R$ is not degenerate, the KL divergence is much larger at this point. Excluding this case, the KL divergences in these plots are all small, indicating close similarity between the simulated distributions and the proposed hypothetical distributions. We also note in panel (d) that increasing $m$ results in a worse KL divergence, but this also means that the ratio $m/n$ is increasing, so we are further from the limiting regime considered in this case. The maximum value of the KL divergence is still quite small for this range of $m$, however.

Figure 4: Effect of changing parameters on the KL divergence comparing samples coming from our true distribution with those of the Normal distribution, in the setting where $m/n\rightarrow 0$. Panels: (a) changing $p$ (range 0 to 1); (b) changing $s$ (range 1 to 30); (c) changing $r$ (range 1 to 30); (d) changing $m$ (range $2.8\times 10^{6}$ to $1.2\times 10^{6}$); (e) changing $n$ (range $1.1\times 10^{9}$ to $1.101\times 10^{9}$). Note that unlike in other sections, when we hold the parameter $s$ fixed, we choose the value $s=16$, since the hypothesized limiting Normal distribution has variance 0 when $r=s$ in this setting. See Section 3.4 for more details.

4 Conclusion

We studied the limiting distribution of a function of two independent Binomial random variables, in the setting where the number of trials for both variables grows large, but the rates of growth for those two quantities may differ. We were able to show that under several parameter regimes, the limiting distribution after centering and scaling is Normal, with a given variance. However, this does not exhaust the range of possible values for which one could consider this function: for example, in [3], the authors considered the case where $m\sim n^{2}$ (for the special case of $s=1$, $r=1/2$), which is not addressed by our results here. While our results determine the appropriate values for the mean and variance of $R$ in the case when $m$ and $n$ are large, determining the behavior of higher moments would require a more careful analysis of the quadratic remainder term than the one undertaken in the present work. As a final remark, we note that even with random variables as well-studied as the Binomial, there still remain many interesting questions to explore.

References

[1] Kai Lai Chung. A Course in Probability Theory. Elsevier, 2000.

[2] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[3] Zachary Lubberts, Avanti Athreya, Youngser Park, and Carey E. Priebe. Random line graphs and edge-attributed network inference. Bernoulli, 2025. arXiv:2103.14726.