
Existence of Firth’s modified estimates in binomial regression models

Mitsunori Ogawa and Yui Tomo

Interfaculty Initiative in Information Studies, The University of Tokyo
Department of Biostatistics, The University of Tokyo
Department of Clinical Data Science, National Center of Neurology and Psychiatry
Department of Health Policy and Management, Keio University

Abstract

In logistic regression modeling, Firth’s modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth’s modified estimator can be formulated as a penalized maximum likelihood estimator in which Jeffreys’ prior is adopted as the penalty term. Despite its widespread use in practice, the existence of the corresponding estimate has not been formally established. In this study, we prove the existence of Firth’s modified estimate in binomial logistic regression models, assuming only that the design matrix has full column rank. We also discuss other binomial regression models obtained through alternative link functions and prove the existence of similar penalized maximum likelihood estimates for such models.

1 Introduction

The logistic regression model is one of the most fundamental models in generalized linear models, and maximum likelihood estimation is a standard method for parameter estimation. However, it is widely known that the maximum likelihood estimate in a logistic regression model does not exist when data separation occurs (Albert and Anderson (1984)). Roughly speaking, this happens when there is a hyperplane in the covariate space that separates the data points according to their categories. Firth’s method is often used to address this issue (Firth (1993); Heinze and Schemper (2002)). In practice, it has been applied in various fields, including social sciences (Bandyopadhyay and Green (2013); Bhavnani (2009)) and medical sciences (Chatsirisupachai et al. (2021); Bambauer et al. (2006); Teoh et al. (2020)).

Firth’s method was originally proposed as a bias correction method for maximum likelihood estimators in Firth (1993). In this method, the asymptotically bias-corrected estimator is defined as the solution to the modified score equation. It can also be formulated as a penalized maximum likelihood estimator, in which Jeffreys’ prior is adopted as the penalty term. In both formulations, Firth’s method outputs a bias-corrected estimate directly, without computing the original maximum likelihood estimate. This is a distinctive feature not found in typical bias correction methods, where the maximum likelihood estimate is corrected using a bias correction term. This feature allows us to expect Firth’s modified estimate to exist even when the maximum likelihood estimate does not exist. Some theoretical properties of the Jeffreys-prior penalty term have been investigated; for example, see Chen et al. (2008); Kosmidis and Firth (2009, 2021).

Firth’s method has been applied to various regression models to obtain an estimator for which the corresponding estimate is ensured to exist even when the original one does not. Heinze and Schemper (2002) and Heinze and Schemper (2001) applied Firth’s method to the binomial logistic regression model and the Cox proportional hazards model, respectively. Bull et al. (2002) considered the multinomial logistic regression model and proposed a similar solution to the separation problem. Joshi et al. (2022) discussed the Poisson regression case and explored some modified estimation methods. Additionally, Alam et al. (2022) investigated the application of Firth’s method to accelerated failure time models.

Despite the widespread use of Firth’s method in methodological studies of regression models and in applications, the existence of Firth’s modified estimate has not been fully verified. Firth (1993) gave an intuitive argument for existence in the binomial logistic regression model, claiming without formal proof that the Jeffreys-prior penalty term is unbounded below as the parameter diverges. Heinze and Schemper (2002) supported existence through extensive numerical experiments. Recently, Kosmidis and Firth (2021) discussed the theoretical properties of Firth’s estimate in binomial regression models and examined its finiteness through the divergence of the Jeffreys-prior penalty term. However, their discussion implicitly assumed existence and was insufficient as a formal proof. Overall, although the existence of Firth’s modified estimate is empirically evident, a formal theoretical justification is still required.

To fill this gap, we establish an existence theorem for Firth’s modified estimate in the binomial logistic regression model, assuming only that the design matrix has full column rank. Our proof is consistent with the intuitive argument in Firth (1993). We also discuss binomial regression models other than the logistic regression model, obtained through alternative link functions, and derive similar existence results for the corresponding penalized maximum likelihood estimates.

2 Existence guarantee in the logistic regression case

Assume that we have $n$ realizations $y_{1},\dots,y_{n}$ that are generated independently from $\operatorname{Bin}(m_{i},\pi_{i}(\beta))$ ($i=1,\dots,n$). Here, $m_{i}$ is a given positive integer and $\pi_{i}(\beta)$ is a probability determined by the following logistic model:

\[
\pi_{i}(\beta)=\frac{\exp(x_{i}^{\top}\beta)}{1+\exp(x_{i}^{\top}\beta)}, \tag{1}
\]

where $x_{i}=(x_{i1},\dots,x_{ip})^{\top}\in\mathbb{R}^{p}$ is the covariate vector of subject $i$ and $\beta=(\beta_{1},\dots,\beta_{p})^{\top}\in\mathbb{R}^{p}$ is a parameter vector. Throughout this study, we assume that $n\geq p$. The log-likelihood function is

\[
l(\beta)=\sum_{i=1}^{n}\left[y_{i}x_{i}^{\top}\beta-m_{i}\log\left\{1+\exp(x_{i}^{\top}\beta)\right\}\right]
\]

up to an additive constant irrelevant to $\beta$. The Hessian matrix of $l(\beta)$ is $-X^{\top}MW(\beta)X$, where $X=(x_{1},\dots,x_{n})^{\top}\in\mathbb{R}^{n\times p}$, $M=\operatorname{diag}(m_{1},\dots,m_{n})$, and $W(\beta)=\operatorname{diag}\{w_{1}(\beta),\dots,w_{n}(\beta)\}$ with $w_{i}(\beta)=\pi_{i}(\beta)\{1-\pi_{i}(\beta)\}=\exp(x_{i}^{\top}\beta)/\{1+\exp(x_{i}^{\top}\beta)\}^{2}$. Firth’s penalized log-likelihood function is

\[
l^{*}(\beta):=l(\beta)+\frac{1}{2}\log|X^{\top}MW(\beta)X|. \tag{2}
\]

We define the corresponding estimator as the maximizer of the penalized log-likelihood, that is, a solution $\hat{\beta}\in\mathbb{R}^{p}$ satisfying

\[
l^{*}(\hat{\beta})=\sup_{\beta\in\mathbb{R}^{p}}l^{*}(\beta). \tag{3}
\]

We refer to this estimator as Firth’s modified estimator.
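The definitions above can be tried out numerically. The following is a minimal sketch, assuming a hypothetical completely separated data set with an intercept and one covariate ($p=2$) and using only the Python standard library; the function names and the data are illustrative assumptions, not part of the original study. It evaluates $l(\beta)$ and $l^{*}(\beta)$ along the separating direction $\beta=(0,r)^{\top}$.

```python
import math

def linear_predictor(x, beta):
    """Return x^T beta for one covariate vector x."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def log_lik(X, y, m, beta):
    """l(beta) = sum_i [ y_i eta_i - m_i log(1 + exp(eta_i)) ], computed stably."""
    total = 0.0
    for x, yi, mi in zip(X, y, m):
        eta = linear_predictor(x, beta)
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - mi * softplus
    return total

def penalized_log_lik(X, y, m, beta):
    """l*(beta) = l(beta) + 0.5 * log det(X^T M W(beta) X), for p = 2 only."""
    s00 = s01 = s11 = 0.0
    for x, mi in zip(X, m):
        e = math.exp(-abs(linear_predictor(x, beta)))
        w = e / (1.0 + e) ** 2          # equals pi_i (1 - pi_i)
        s00 += mi * w * x[0] * x[0]
        s01 += mi * w * x[0] * x[1]
        s11 += mi * w * x[1] * x[1]
    return log_lik(X, y, m, beta) + 0.5 * math.log(s00 * s11 - s01 * s01)

# Hypothetical, completely separated data: y = 1 exactly when the covariate is positive.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 1]
m = [1, 1, 1, 1]

radii = (0.0, 1.0, 5.0, 20.0)
plain = [log_lik(X, y, m, (0.0, r)) for r in radii]
penalized = [penalized_log_lik(X, y, m, (0.0, r)) for r in radii]
for r, l_val, ls_val in zip(radii, plain, penalized):
    print(r, l_val, ls_val)
```

In this sketch the unpenalized log-likelihood keeps increasing toward its supremum $0$ as $r$ grows, so it has no maximizer, whereas the penalized version eventually decreases. The determinant is formed explicitly only because $p=2$; a general implementation would use a linear algebra library.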

The main result of this study is the following theorem, which confirms the existence of Firth’s modified estimates in the binomial logistic regression model.

Theorem 1.

If $X$ is of full column rank, there exists a maximizer $\hat{\beta}\in\mathbb{R}^{p}$ of $l^{*}$, and the set of maximizers $\operatorname{argmax}_{\beta\in\mathbb{R}^{p}}l^{*}(\beta)$ is bounded.

Proof of Theorem 1.

Let $c:=l^{*}(0)\in\mathbb{R}$. Because $0\leq y_{i}\leq m_{i}$, each summand of $l(\beta)$ is nonpositive, so $l(\beta)\leq 0$ for all $\beta$. Combined with Lemma 2, which shows that the penalty term $\frac{1}{2}\log|X^{\top}MW(ru)X|$ diverges to $-\infty$ uniformly over unit vectors $u$ as $r\to\infty$, this gives a constant $r_{0}>0$ such that $\sup_{u\in\mathbb{R}^{p}:\|u\|=1}l^{*}(ru)<c$ for all $r>r_{0}$. Let $\mathcal{B}(r_{0}):=\{\beta\in\mathbb{R}^{p}:\|\beta\|\leq r_{0}\}$ be the closed ball with center $0$ and radius $r_{0}$. The restriction $l^{*}:\mathcal{B}(r_{0})\to\mathbb{R}$ of $l^{*}(\beta)$ has a maximizer $\beta^{*}\in\mathcal{B}(r_{0})$, because $l^{*}(\beta)$ is continuous and $\mathcal{B}(r_{0})$ is compact. Since $l^{*}(\beta^{*})\geq l^{*}(0)=c$ and $l^{*}(\beta)$ is lower than $c$ outside $\mathcal{B}(r_{0})$, $\beta^{*}$ is a maximizer of $l^{*}:\mathbb{R}^{p}\to\mathbb{R}$. The set $\operatorname{argmax}_{\beta\in\mathbb{R}^{p}}l^{*}(\beta)$ is bounded because any maximizer must lie in $\mathcal{B}(r_{0})$. ∎
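The compactness argument in this proof can be mimicked numerically in the simplest setting. The following is a minimal sketch, assuming a single covariate ($p=1$), nonzero covariate values, and hypothetical data in which every response equals $m_{i}$ (so the maximum likelihood estimate does not exist). For $p=1$, the bounds $l(\beta)\leq 0$ and $w_{i}(\beta)\leq\exp(-|x_{i}\beta|)$ give the explicit radius $r_{0}=(\log\sum_{i}m_{i}x_{i}^{2}-2c)/a$ with $a=\min_{i}|x_{i}|$; the grid search and all function names are illustrative assumptions rather than a recommended fitting algorithm.

```python
import math

def penalized_log_lik(x, y, m, beta):
    """l*(beta) for a single-covariate (p = 1) binomial logistic model."""
    loglik, info = 0.0, 0.0
    for xi, yi, mi in zip(x, y, m):
        eta = xi * beta
        e = math.exp(-abs(eta))
        loglik += yi * eta - mi * (max(eta, 0.0) + math.log1p(e))
        info += mi * xi * xi * e / (1.0 + e) ** 2   # m_i x_i^2 w_i(beta)
    return loglik + 0.5 * math.log(info)

def radius(x, y, m):
    """Radius r0 such that l*(beta) < l*(0) whenever |beta| > r0 (p = 1, all x_i nonzero)."""
    c = penalized_log_lik(x, y, m, 0.0)
    s = sum(mi * xi * xi for xi, mi in zip(x, m))
    a = min(abs(xi) for xi in x)
    return (math.log(s) - 2.0 * c) / a

def grid_maximizer(x, y, m, steps=20001):
    """Approximate the maximizer of l* by a grid search over the closed ball [-r0, r0]."""
    r0 = radius(x, y, m)
    grid = [-r0 + 2.0 * r0 * k / (steps - 1) for k in range(steps)]
    best = max(grid, key=lambda b: penalized_log_lik(x, y, m, b))
    return best, r0

# Hypothetical separated data: every response is a success and every covariate is positive.
x = [1.0, 2.0, 3.0]
y = [1, 1, 1]
m = [1, 1, 1]

beta_hat, r0 = grid_maximizer(x, y, m)
print("r0 =", r0, "approximate maximizer =", beta_hat)
```

The grid search only approximates the maximizer; its role here is to show that the search can be confined to a bounded set, which is exactly what the proof uses.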

Lemma 2.

If $X$ has full column rank, $\sup_{u\in\mathbb{R}^{p}:\|u\|=1}|X^{\top}MW(ru)X|\to 0$ as $r\to\infty$.

Proof.

Let $\mathcal{U}:=\{u\in\mathbb{R}^{p}:\|u\|=1\}$ be the unit sphere. For any $r>0$ and $u\in\mathcal{U}$, we observe that

\[
w_{i}(ru)=\frac{1}{\{1+\exp(rx_{i}^{\top}u)\}\{1+\exp(-rx_{i}^{\top}u)\}}\leq\frac{1}{1+\exp(r|x_{i}^{\top}u|)}=\frac{1}{1+\exp(r\|x_{i}\|\cdot|\cos\theta_{i}(u)|)}, \tag{4}
\]

where $\theta_{i}(u)$ denotes the angle between $x_{i}$ and $u$.
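Inequality (4) holds because one of the two factors in the denominator equals $1+\exp(r|x_{i}^{\top}u|)$ and the other is at least $1$. The following minimal sketch checks this numerically on a grid of values of the scalar $t=rx_{i}^{\top}u$; the function names and the grid are illustrative assumptions.

```python
import math

def w(t):
    """Logistic weight as a function of the linear predictor t."""
    return 1.0 / ((1.0 + math.exp(t)) * (1.0 + math.exp(-t)))

def bound(t):
    """Right-hand side of inequality (4) written in terms of t."""
    return 1.0 / (1.0 + math.exp(abs(t)))

grid = [k / 10.0 for k in range(-300, 301)]
ratios = [w(t) / bound(t) for t in grid]
print("min ratio:", min(ratios), "max ratio:", max(ratios))
```

The ratio $w(t)/\{1+\exp(|t|)\}^{-1}$ equals $1/\{1+\exp(-|t|)\}$, so it lies between $1/2$ and $1$, which is what the grid evaluation reflects.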

First, we consider the case $n=p$. Let $w(r):=\sup_{u\in\mathcal{U}}\prod_{i=1}^{n}w_{i}(ru)$ for $r>0$. We then have

\[
\begin{aligned}
w(r)&\leq\sup_{u\in\mathcal{U}}\prod_{i=1}^{n}\{1+\exp(r\|x_{i}\|\cdot|\cos\theta_{i}(u)|)\}^{-1}\\
&\leq\sup_{u\in\mathcal{U}}\prod_{i=1}^{n}\{1+\exp(ra|\cos\theta_{i}(u)|)\}^{-1}\\
&\leq\sup_{u\in\mathcal{U}}\prod_{i=1}^{n}\exp\{-ra|\cos\theta_{i}(u)|\}\\
&=\exp\left[-ra\inf_{u\in\mathcal{U}}\left\{\sum_{i=1}^{n}|\cos\theta_{i}(u)|\right\}\right],
\end{aligned}
\]

where $a:=\min\{\|x_{i}\|:i=1,\dots,n\}$; note that $a>0$ because a nonsingular square matrix $X$ has no zero row. Because $\mathcal{U}$ is compact and $\sum_{i}|\cos\theta_{i}(u)|$ is continuous in $u$, a minimum $c:=\min_{u\in\mathcal{U}}\sum_{i}|\cos\theta_{i}(u)|\geq 0$ exists. Since $X$ is of full column rank, no unit vector $u$ is orthogonal to every $x_{i}$, and hence $c>0$. Therefore, it follows that $w(r)\leq\exp(-acr)\to 0$ as $r\to\infty$. Since $\sup_{u\in\mathcal{U}}|X^{\top}MW(ru)X|=|X|^{2}|M|\sup_{u\in\mathcal{U}}|W(ru)|=|X|^{2}|M|w(r)\geq 0$, we obtain $\sup_{u\in\mathcal{U}}|X^{\top}MW(ru)X|\to 0$ as $r\to\infty$.
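The role of the full-rank assumption in the step $c>0$ can be illustrated for $p=2$, where the unit sphere is a circle. The following is a minimal sketch, assuming two hypothetical $2\times 2$ design matrices (one nonsingular, one rank deficient) and a grid approximation of the minimum over $u$; it is an illustration, not part of the proof.

```python
import math

def min_abs_cos_sum(rows, steps=3600):
    """Grid approximation of min over unit vectors u in R^2 of sum_i |cos(theta_i(u))|."""
    best = float("inf")
    for k in range(steps):
        t = math.pi * k / steps          # u and -u give the same value, so [0, pi) suffices
        u = (math.cos(t), math.sin(t))
        total = sum(abs(x[0] * u[0] + x[1] * u[1]) / math.hypot(x[0], x[1]) for x in rows)
        best = min(best, total)
    return best

full_rank = [(1.0, 0.0), (1.0, 1.0)]        # rows x_1, x_2 span R^2
rank_deficient = [(1.0, 2.0), (2.0, 4.0)]   # x_2 = 2 x_1, so some u is orthogonal to both

c_full = min_abs_cos_sum(full_rank)
c_deficient = min_abs_cos_sum(rank_deficient)
print("full rank:", c_full, "rank deficient:", c_deficient)
```

For the nonsingular example the minimum stays bounded away from zero, while for the rank-deficient example it is essentially zero, because a direction $u$ orthogonal to all rows exists and the exponential decay of $w(r)$ along that direction is lost.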

Subsequently, we consider the case $n>p$. Using the Binet–Cauchy formula, $|X^{\top}MW(\beta)X|$ can be written as

\[
\begin{aligned}
|X^{\top}MW(\beta)X|&=\sum_{1\leq i_{1}<i_{2}<\dots<i_{p}\leq n}|X^{\top}|_{i_{1},i_{2},\dots,i_{p}}^{1,2,\dots,p}\cdot|MW(\beta)X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\\
&=\sum_{1\leq i_{1}<i_{2}<\dots<i_{p}\leq n}\left(|X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\right)^{2}\prod_{k=1}^{p}m_{i_{k}}w_{i_{k}}(\beta)\geq 0,
\end{aligned}
\]

where $|A|_{j_{1},j_{2},\dots,j_{p}}^{1,2,\dots,p}$ is the minor determinant of a matrix $A\in\mathbb{R}^{p\times n}$ obtained by taking its $j_{1},\dots,j_{p}$-th columns and $|B|^{j_{1},j_{2},\dots,j_{p}}_{1,2,\dots,p}$ is the minor determinant of a matrix $B\in\mathbb{R}^{n\times p}$ obtained by taking its $j_{1},\dots,j_{p}$-th rows. Using this expression and the fact that the supremum of a sum is at most the sum of the suprema, we obtain

\[
\begin{aligned}
\sup_{u\in\mathcal{U}}|X^{\top}MW(ru)X|&=\sup_{u\in\mathcal{U}}\sum_{1\leq i_{1}<i_{2}<\dots<i_{p}\leq n}\left(|X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\right)^{2}\prod_{k=1}^{p}m_{i_{k}}w_{i_{k}}(ru)\\
&\leq\sum_{1\leq i_{1}<i_{2}<\dots<i_{p}\leq n}\left(|X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\right)^{2}\left(\prod_{k=1}^{p}m_{i_{k}}\right)\sup_{u\in\mathcal{U}}\prod_{k=1}^{p}w_{i_{k}}(ru).
\end{aligned}
\]

Using the discussion in the previous paragraph, for each term with $|X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\neq 0$ on the right-hand side, we find a constant $c_{i_{1},\dots,i_{p}}>0$ such that $\sup_{u\in\mathcal{U}}\prod_{k=1}^{p}w_{i_{k}}(ru)\leq\exp(-c_{i_{1},\dots,i_{p}}r)$. Thus, letting $c^{*}:=\min\{c_{i_{1},\dots,i_{p}}:1\leq i_{1}<i_{2}<\dots<i_{p}\leq n,\ |X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\neq 0\}$, it follows that

\[
\sup_{u\in\mathcal{U}}|X^{\top}MW(ru)X|\leq\left\{\sum_{1\leq i_{1}<i_{2}<\dots<i_{p}\leq n}\left(|X|^{i_{1},i_{2},\dots,i_{p}}_{1,2,\dots,p}\right)^{2}\prod_{k=1}^{p}m_{i_{k}}\right\}\exp(-c^{*}r)\to 0\quad\text{as $r\to\infty$}.
\]

This completes the proof. ∎
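The Binet–Cauchy expansion used for the case $n>p$ can be checked numerically. The following is a minimal sketch, assuming a hypothetical design with $n=4$, $p=2$, arbitrary binomial totals, and an arbitrary parameter value; it compares the determinant of $X^{\top}MW(\beta)X$ formed directly with the sum over $2\times 2$ row minors of $X$. All names and numbers are illustrative assumptions.

```python
import math
from itertools import combinations

def logistic_weight(eta):
    """w = pi (1 - pi) for linear predictor eta, computed stably."""
    e = math.exp(-abs(eta))
    return e / (1.0 + e) ** 2

def det_direct(X, m, w):
    """det(X^T M W X) for p = 2, formed entry by entry."""
    s00 = sum(mi * wi * x[0] * x[0] for x, mi, wi in zip(X, m, w))
    s01 = sum(mi * wi * x[0] * x[1] for x, mi, wi in zip(X, m, w))
    s11 = sum(mi * wi * x[1] * x[1] for x, mi, wi in zip(X, m, w))
    return s00 * s11 - s01 * s01

def det_cauchy_binet(X, m, w):
    """Sum over row pairs i < j of (2x2 minor of X)^2 * m_i w_i * m_j w_j."""
    total = 0.0
    for i, j in combinations(range(len(X)), 2):
        minor = X[i][0] * X[j][1] - X[i][1] * X[j][0]
        total += minor ** 2 * (m[i] * w[i]) * (m[j] * w[j])
    return total

# Hypothetical design (intercept plus one covariate), totals, and parameter value.
X = [(1.0, -1.0), (1.0, 0.0), (1.0, 2.0), (1.0, 3.0)]
m = [2, 1, 3, 1]
beta = (0.3, -0.7)
w = [logistic_weight(x[0] * beta[0] + x[1] * beta[1]) for x in X]

direct = det_direct(X, m, w)
expanded = det_cauchy_binet(X, m, w)
print(direct, expanded)
```

Each term of the expansion is a squared minor times a product of $p$ weights, which is why the bound from the square case $n=p$ can be applied term by term.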

To our knowledge, Theorem 1 is the first result that formally ensures the existence of Firth’s modified estimate in logistic regression models. As mentioned in Section 1, our proof of existence is consistent with the intuitive discussion in Firth (1993), which claims that the penalty term diverges to $-\infty$ when some entries of $\beta$ diverge. Kosmidis and Firth (2021) also discussed the finiteness of the estimate in their Corollary 1, based on a weaker version of Lemma 2. However, the uniformity of the bounds for the Jeffreys-prior penalty term was missing from their discussion. Lemma 2 fills this gap and allows us to formally establish an existence guarantee.

3 Binomial regression models specified with other link functions

The logistic regression model (1) is the binomial regression model specified by the logit link function in the framework of generalized linear models. The probit and complementary log-log link functions are other common choices of link function in binomial regression models. For the corresponding models, we can consider the Jeffreys-prior penalized maximum likelihood estimators of the form (3), where $l(\beta)$ and $W(\beta)$ in (2) are replaced appropriately. Specifically, the diagonal elements of $W(\beta)$ are

\[
w_{i}(\beta)=\begin{cases}\{\phi(z)\}^{2}/[\Phi(z)\{1-\Phi(z)\}]&\quad\text{probit regression}\\ \exp(-2z)/\{\exp(\exp(-z))-1\}&\quad\text{complementary log-log regression},\end{cases}
\]

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density and cumulative distribution functions of the standard normal distribution, respectively. Note that these penalized maximum likelihood estimators are different from the estimators obtained by Firth’s method because the link functions are not canonical. The following proposition yields the existence guarantee of the corresponding estimates.

Proposition 3.

For the probit and complementary log-log link functions, the claims of Theorem 1 hold; that is, if $X$ is of full column rank, there exists a maximizer $\hat{\beta}\in\mathbb{R}^{p}$ of the corresponding $l^{*}$, and the set of maximizers $\operatorname{argmax}_{\beta\in\mathbb{R}^{p}}l^{*}(\beta)$ is bounded.

Proof.

By Lemmas A.1 and A.2, for the probit and complementary log-log regression models, we can find a positive constant $c$ such that

\[
w_{i}(ru)\leq\frac{c}{1+\exp(r|x_{i}^{\top}u|)}
\]

for any $r>0$ and $u\in\mathcal{U}$. Using this inequality instead of (4), Proposition 3 can be proved in the same way as Theorem 1. ∎
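The boundedness claims of Lemmas A.1 and A.2 behind this constant can be examined numerically. The following is a minimal sketch that evaluates $(1+e^{|z|})w(z)$ for both weight functions on finite grids, together with the dominating function $g$ from the proof of Lemma A.2. The grid ranges are chosen only to avoid floating-point overflow and underflow, and all function names are illustrative assumptions; a finite grid illustrates the lemmas but does not prove them.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function, via erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def w_probit(z):
    return phi(z) ** 2 / (Phi(z) * Phi(-z))      # uses 1 - Phi(z) = Phi(-z)

def w_cloglog(z):
    return math.exp(-2.0 * z) / math.expm1(math.exp(-z))

def scaled(w, z):
    """The function f(z) = (1 + e^{|z|}) w(z) bounded in Lemmas A.1 and A.2."""
    return (1.0 + math.exp(abs(z))) * w(z)

def g(z):
    """Dominating function from the proof of Lemma A.2."""
    e = math.exp(-z)
    return (1.0 + math.exp(abs(z))) * e / (1.0 + e / 2.0 + e * e / 6.0)

probit_grid = [k / 100.0 for k in range(-800, 801)]
cloglog_grid = [k / 100.0 for k in range(-600, 2001)]

max_probit = max(scaled(w_probit, z) for z in probit_grid)
max_cloglog = max(scaled(w_cloglog, z) for z in cloglog_grid)
max_g = max(g(z) for z in cloglog_grid)
print(max_probit, max_cloglog, max_g)
```

On these grids the scaled weights stay bounded, and the complementary log-log values stay below $g$, consistent with the existence of the constant $c$ used above.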

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP20K19752.

Appendix A Lemmas used in the proof of Proposition 3

The following lemmas are used in the proof of Proposition 3.

Lemma A.1.

There exists a constant $c\in\mathbb{R}$ such that

\[
f(z):=(1+e^{|z|})\frac{\{\phi(z)\}^{2}}{\Phi(z)\{1-\Phi(z)\}}<c,\quad z\in\mathbb{R}.
\]

Proof.

Since the function on the left-hand side is even, we assume $z>0$ without loss of generality. Because $\Phi(z)>1/2$ and $1-\Phi(z)>\frac{z}{z^{2}+1}\phi(z)$, we have

\[
f(z)<(1+e^{|z|})\phi(z)\frac{2(z^{2}+1)}{z}\to 0\quad\text{as $z\to\infty$}.
\]

Then, for any $\varepsilon>0$ we can take $\delta>0$ such that $f(z)<\varepsilon$ for $z>\delta$. Therefore we have $f(z)\leq\max\{\max_{s\in[0,\delta]}f(s),\varepsilon\}<\infty$. ∎

Lemma A.2.

There exists a constant $c\in\mathbb{R}$ such that

\[
f(z):=(1+e^{|z|})\frac{\exp(-2z)}{\exp(\exp(-z))-1}<c,\quad z\in\mathbb{R}.
\]

Proof.

Since $e^{t}>1+t+t^{2}/2+t^{3}/6$ for $t>0$, applying this inequality with $t=e^{-z}$ gives

\[
(1+e^{|z|})\frac{\exp(-2z)}{\exp(\exp(-z))-1}<(1+e^{|z|})\frac{e^{-2z}}{e^{-z}+e^{-2z}/2+e^{-3z}/6}=(1+e^{|z|})\frac{e^{-z}}{1+e^{-z}/2+e^{-2z}/6}=:g(z).
\]

For $z>0$,

\[
g(z)=\frac{1+e^{-z}}{1+e^{-z}/2+e^{-2z}/6}\to 1\quad\text{as $z\to\infty$}.
\]

For $z<0$,

\[
g(z)=\frac{1+e^{z}}{e^{2z}+e^{z}/2+1/6}\to 6\quad\text{as $z\to-\infty$}.
\]

Thus, for any $\varepsilon>0$, there exists $\delta>0$ such that $g(z)<6+\varepsilon$ for any $|z|>\delta$. Therefore we have $f(z)<g(z)\leq\max\{6+\varepsilon,\max_{s:|s|\leq\delta}g(s)\}<\infty$. ∎

References

  • Alam et al. (2022) Alam, T. F., Rahman, M. S., and Bari, W. “On estimation for accelerated failure time models with small or rare event survival data.” BMC Med. Res. Methodol., 22(1):169 (2022).
  • Albert and Anderson (1984) Albert, A. and Anderson, J. A. “On the existence of maximum likelihood estimates in logistic regression models.” Biometrika, 71(1):1–10 (1984).
  • Bambauer et al. (2006) Bambauer, K. Z., Zhang, B., Maciejewski, P. K., Sahay, N., Pirl, W. F., Block, S. D., and Prigerson, H. G. “Mutuality and specificity of mental disorders in advanced cancer patients and caregivers.” Soc. Psychiatry Psychiatr. Epidemiol., 41(10):819–824 (2006).
  • Bandyopadhyay and Green (2013) Bandyopadhyay, S. and Green, E. “Nation-building and conflict in modern Africa.” World Dev., 45:108–118 (2013).
  • Bhavnani (2009) Bhavnani, R. R. “Do electoral quotas work after they are withdrawn? Evidence from a natural experiment in India.” Am. Polit. Sci. Rev., 103(1):23–35 (2009).
  • Bull et al. (2002) Bull, S. B., Mak, C., and Greenwood, C. M. T. “A modified score function estimator for multinomial logistic regression in small samples.” Comput. Stat. Data Anal., 39(1):57–74 (2002).
  • Chatsirisupachai et al. (2021) Chatsirisupachai, K., Lesluyes, T., Paraoan, L., Van Loo, P., and de Magalhães, J. P. “An integrative analysis of the age-associated multi-omic landscape across cancers.” Nat. Commun., 12(1):2345 (2021).
  • Chen et al. (2008) Chen, M.-H., Ibrahim, J. G., and Kim, S. “Properties and implementation of Jeffreys’s prior in binomial regression models.” J. Am. Stat. Assoc., 103(484):1659–1664 (2008).
  • Firth (1993) Firth, D. “Bias reduction of maximum likelihood estimates.” Biometrika, 80(1):27–38 (1993).
  • Heinze and Schemper (2001) Heinze, G. and Schemper, M. “A solution to the problem of monotone likelihood in Cox regression.” Biometrics, 57(1):114–119 (2001).
  • Heinze and Schemper (2002) —. “A solution to the problem of separation in logistic regression.” Stat. Med., 21(16):2409–2419 (2002).
  • Joshi et al. (2022) Joshi, A., Geroldinger, A., Jiricka, L., Senchaudhuri, P., Corcoran, C., and Heinze, G. “Solutions to problems of nonexistence of parameter estimates and sparse data bias in Poisson regression.” Stat. Methods Med. Res., 31(2):253–266 (2022).
  • Kosmidis and Firth (2009) Kosmidis, I. and Firth, D. “Bias reduction in exponential family nonlinear models.” Biometrika, 96(4):793–804 (2009).
  • Kosmidis and Firth (2021) —. “Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models.” Biometrika, 108(1):71–82 (2021).
  • Teoh et al. (2020) Teoh, A. Y. B., Kitano, M., Itoi, T., Pérez-Miranda, M., Ogura, T., Chan, S. M., Serna-Higuera, C., Omoto, S., Torres-Yuste, R., Tsuichiya, T., Wong, K. T., Leung, C.-H., Chiu, P. W. Y., Ng, E. K. W., and Lau, J. Y. W. “Endosonography-guided gallbladder drainage versus percutaneous cholecystostomy in very high-risk surgical patients with acute cholecystitis: an international randomised multicentre controlled superiority trial (DRAC 1).” Gut, 69(6):1085–1091 (2020).