
Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of γ_j

Hung-Hsuan Chen
(Department of Computer Science and Information Engineering
National Central University
hhchen1105@acm.org )
Abstract

The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification that builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of the terminal node values γ_j, which are crucial to optimizing the logistic loss function. We derive γ_j through a second-order Taylor approximation and provide step-by-step pseudocode for the algorithm's implementation. The guide covers both the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. A step-by-step worked example is provided in the appendix to aid understanding.

1 Introduction

The gradient boosting machine (GBM) [Friedman, 2001, Hastie et al., 2009] is a robust predictive model for tabular datasets. GBM constructs weak learners (trees) sequentially; each tree predicts the residuals of the earlier predictions.

When applying GBM to regression tasks, the training process of each tree is very similar to that of a standard decision tree. However, when applying GBM to binary classification tasks (referred to below as the Gradient Boosting Classifier, or GBC), the prediction of a terminal node j of a tree T_m is not the average residual of the training instances that fall in node j. Instead, the prediction of terminal node j in tree T_m is given by the seemingly mysterious equation below.

\gamma_{j}=\frac{\sum_{i\in\Omega_{m,j}}r_{m}^{(i)}}{\sum_{i\in\Omega_{m,j}}p_{m-1}^{(i)}(1-p_{m-1}^{(i)})}, (1)

where p_{m-1}^{(i)} is the estimated probability that the ith instance's target value is 1 in the previous iteration, r_m^{(i)} = y^{(i)} - p_{m-1}^{(i)} is the residual of instance i to be fitted in iteration m, and Ω_{m,j} is the set of instance indices located in terminal node j of tree T_m.

In the next section, we introduce the GBC training and prediction procedures, followed by an explanation of how γ_j is derived in Section 3. A step-by-step example is given in Appendix A.

2 GBC training and prediction

input : Input features \bm{x}^{(1:n)}\in\mathbb{R}^{n\times d} and targets y^{(1:n)}\in\{0,1\}^{n}
input : Number of trees: M
input : Learning rate: \eta
output : Fitted trees: T_{1:M}
// Initialization
1  for i\leftarrow 1 to n do
2        F_{0}^{(i)}\leftarrow 0;
3        p_{0}^{(i)}\leftarrow\sigma(F_{0}^{(i)})=0.5;
// Training for M iterations
4  for m\leftarrow 1 to M do
5        for i\leftarrow 1 to n do
6              Compute residual: r_{m}^{(i)}\leftarrow y^{(i)}-p_{m-1}^{(i)};
7        Train a regression tree T_{m} to fit the dataset \{(\bm{x}^{(i)},r_{m}^{(i)})\}_{i=1,\ldots,n};
       // Update terminal node values
8        Let \Omega_{m,j} be the set of instance indices in terminal node j of T_{m};
9        for each terminal node j\in T_{m} do
10             Set the output of node j in T_{m}: \gamma_{j}\leftarrow\frac{\sum_{k\in\Omega_{m,j}}r_{m}^{(k)}}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}(1-p_{m-1}^{(k)})};
11             for k\in\Omega_{m,j} do
12                   F_{m}^{(k)}\leftarrow F_{m-1}^{(k)}+\eta\gamma_{j};
13                   p_{m}^{(k)}\leftarrow\sigma(F_{m}^{(k)});
Algorithm 1 Gradient Boosting Classifier Training

2.1 Training process

The training of a Gradient Boosting Classifier begins with an initial model, which is typically a constant prediction for all instances. In binary classification, the model is initialized with a raw predicted logarithm of odds (called log odds below) F_0^{(i)} set to 0 for each instance i, which corresponds to an initial predicted probability of 0.5 for all instances (since p_0^{(i)} = σ(F_0^{(i)}), where σ is the logistic function). This is a neutral starting point, since the model has no prior information about the data.

The algorithm proceeds over M boosting iterations. At each iteration m, the model computes the residual for each training instance, which represents how far the current predicted probability is from the true label y^{(i)}. Specifically, the residual r_m^{(i)} is computed as the difference between the true label y^{(i)} and the predicted probability p_{m-1}^{(i)} from the previous iteration. These residuals are used to train the decision tree T_m: the tree is fitted to predict the residuals, thus focusing the model on correcting the errors made by the previous trees. The tree-growing strategy can be based on metrics such as information gain, Gini index, or mean squared error, which guide the tree to select the optimal feature and threshold for each split [Han et al., 2012].

Once the tree T_m is trained, the algorithm computes the optimal values for its terminal nodes. For each terminal node j, the output value γ_j is calculated by minimizing the logistic loss over the instances that fall into that node. This is done using Equation 1; the reasons are explained in Section 3.

Next, for each instance k that falls into terminal node j, the model prediction is updated by adding the node value γ_j, scaled by the learning rate η, to the previous prediction F_{m-1}^{(k)}:

F_{m}^{(k)}\leftarrow F_{m-1}^{(k)}+\eta\gamma_{j} (2)

The predicted probability of instance k is then updated by applying the logistic function to the new raw score F_m^{(k)}:

p_{m}^{(k)}=\sigma\left(F_{m}^{(k)}\right). (3)

The algorithm builds trees over M iterations, incrementally refining the model's predictions by reducing the residual errors from previous iterations. At the end of training, we have an ensemble of M decision trees T_{1:M}, each contributing to the final prediction.

Algorithm 1 shows the pseudocode for training.
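The training loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's reference implementation: it uses scikit-learn's DecisionTreeRegressor (depth-1 trees) as the weak learner, and stores each leaf's γ_j in a side table keyed by leaf id rather than inside the tree itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def fit_gbc(X, y, M=3, eta=0.1):
    """Sketch of Algorithm 1: returns a list of (tree, {leaf id -> gamma_j})."""
    n = len(y)
    F = np.zeros(n)          # F_0: initial log odds are all 0
    p = sigmoid(F)           # p_0 = 0.5 for every instance
    ensemble = []
    for m in range(M):
        r = y - p            # residuals r_m = y - p_{m-1}
        tree = DecisionTreeRegressor(max_depth=1).fit(X, r)
        leaf = tree.apply(X)                 # terminal node id of each instance
        # Equation 1: gamma_j for each terminal node Omega_{m,j}
        gamma = {j: r[leaf == j].sum() / (p[leaf == j] * (1 - p[leaf == j])).sum()
                 for j in np.unique(leaf)}
        # Equations 2 and 3: update log odds and probabilities
        F += eta * np.array([gamma[j] for j in leaf])
        p = sigmoid(F)
        ensemble.append((tree, gamma))
    return ensemble
```

The only departure from the pseudocode is bookkeeping: because the leaf values γ_j are kept in a dictionary, prediction must look up the leaf of each instance via `tree.apply` instead of calling `tree.predict` directly.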

2.2 Prediction process

input : Test instances \bm{x}^{(1:t)}\in\mathbb{R}^{t\times d}
input : Learning rate: \eta
input : Trained trees: T_{1:M}
output : Predicted probabilities p^{(1:t)} for class 1
1  for i\leftarrow 1 to t do
2        F^{(i)}\leftarrow 0;
3        for m\leftarrow 1 to M do
4              F^{(i)}\leftarrow F^{(i)}+\eta T_{m}(\bm{x}^{(i)});   // T_{m}(\bm{x}^{(i)}) is T_{m}'s prediction on \bm{x}^{(i)}
5        p^{(i)}\leftarrow\sigma(F^{(i)})=\frac{1}{1+e^{-F^{(i)}}};
Algorithm 2 Gradient Boosting Classifier Prediction

Once the Gradient Boosting Classifier is trained, GBC predicts the labels of new test instances as follows. The prediction process starts by initializing the raw prediction F^{(i)} of each test instance i to 0, just as in the training phase. The algorithm then iteratively applies each of the M trees to the test instance, adding the scaled output of each tree to the current raw score.

Specifically, for each tree T_m, the prediction for a test instance x^{(i)} is obtained by evaluating the decision tree. The output value, T_m(x^{(i)}), is scaled by the learning rate η and added to the current raw score F^{(i)}:

F^{(i)}=F^{(i)}+\eta T_{m}(\bm{x}^{(i)}) (4)

This step is repeated for each tree from T_1 to T_M, accumulating the contributions of all trees into the final raw prediction F^{(i)}.

Once all trees have been evaluated, the final predicted probability for the test instance x^{(i)} is computed by applying the logistic function to the raw score F^{(i)}:

p^{(i)}=\sigma(F^{(i)})=\frac{1}{1+e^{-F^{(i)}}} (5)

The output is a probability score for each test instance, indicating the confidence of the model in predicting class 1. This probability can be thresholded (e.g., using a threshold of 0.5) to produce binary class predictions.

Much like the training process, this prediction process is designed to leverage the additive nature of Gradient Boosting, where the final prediction is built up from the contributions of many weak learners.

The pseudocode is shown in Algorithm 2.
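Algorithm 2 is short enough to sketch directly. This is a minimal illustration, assuming each fitted tree exposes a `predict` method whose leaves already output the terminal node values γ_j:

```python
import numpy as np

def predict_gbc(trees, X, eta=0.1):
    """Sketch of Algorithm 2: accumulate each tree's scaled output,
    then squash with the logistic function."""
    F = np.zeros(len(X))                  # raw log odds, initialized to 0
    for tree in trees:
        F += eta * tree.predict(X)        # Equation 4
    return 1.0 / (1.0 + np.exp(-F))       # Equation 5: probability of class 1
```

A binary label is then obtained by thresholding, e.g. `(predict_gbc(trees, X) >= 0.5).astype(int)`.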

3 Where does γ_j come from?

Assume that GBC has been trained for m-1 iterations, so F_{m-1}(x^{(i)}) is fixed. GBC attempts to train a new weak learner T_m with J terminal node predictions γ_1, …, γ_J. The new predicted log odds F_m(x^{(i)}) is given by

F_{m}(\bm{x}^{(i)})=F_{m-1}(\bm{x}^{(i)})+\gamma_{j}, (6)

where j is the index of the terminal node that x^{(i)} falls in.

The predicted probability is

p_{m}^{(i)}=\sigma\left(F_{m}(\bm{x}^{(i)})\right)=\frac{1}{1+e^{-F_{m}(\bm{x}^{(i)})}}=\frac{1}{1+e^{-\left(F_{m-1}(\bm{x}^{(i)})+\gamma_{j}\right)}} (7)

Let y^{Ω_{m,j}} and p_m^{Ω_{m,j}} denote the ground-truth labels and the predicted probabilities after iteration m for all the instances that fall in terminal node j of tree T_m. We express the cross-entropy loss for these instances as a function of γ_j:

\begin{split}f(\gamma_{j})&=\mathcal{L}(\bm{y}^{\Omega_{m,j}},\bm{p}_{m}^{\Omega_{m,j}})=-\sum_{k\in\Omega_{m,j}}\left(y^{(k)}\log p_{m}^{(k)}+(1-y^{(k)})\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\log p_{m}^{(k)}-\log(1-p_{m}^{(k)})+y^{(k)}\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\log\frac{p_{m}^{(k)}}{1-p_{m}^{(k)}}-\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})-\log\left(\frac{e^{-F_{m}(\bm{x}^{(k)})}}{1+e^{-F_{m}(\bm{x}^{(k)})}}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})-\log\left(\frac{1}{1+e^{F_{m}(\bm{x}^{(k)})}}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})+\log\left(1+e^{F_{m}(\bm{x}^{(k)})}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\left(F_{m-1}(\bm{x}^{(k)})+\gamma_{j}\right)+\log\left(1+e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}\right)\right)\end{split} (8)

We look for the γ_j value that minimizes the cross-entropy loss.

3.1 Trial 1: set the derivative to zero and solve the equation

The loss can be considered a function of γ_j, since γ_j is the only free variable. We take the derivative of the loss with respect to γ_j and set it to zero:

\begin{split}\frac{\partial f(\gamma_{j})}{\partial\gamma_{j}}&=\frac{\partial\mathcal{L}(\bm{y}^{\Omega_{m,j}},\bm{p}_{m}^{\Omega_{m,j}})}{\partial\gamma_{j}}=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+\frac{e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}}{1+e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}}\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\right)+\sum_{k\in\Omega_{m,j}}\frac{1}{1+e^{-(F_{m-1}(\bm{x}^{(k)})+\gamma_{j})}}:=0\end{split}
\Rightarrow\sum_{k\in\Omega_{m,j}}\frac{1}{1+e^{-(F_{m-1}(\bm{x}^{(k)})+\gamma_{j})}}=\sum_{k\in\Omega_{m,j}}y^{(k)} (9)

Equation 9 involves γ_j inside a nonlinear logistic function, making it difficult to isolate γ_j and solve explicitly. This motivates an approximation technique, such as a second-order Taylor expansion.

3.2 Trial 2: Approximate the loss function by Taylor’s series

A Taylor expansion approximates a function f(x) around a fixed point x_0 as follows [Canuto and Tabacco, 2015].

f(x)\approx f(x_{0})+(x-x_{0})f^{\prime}(x_{0})+\frac{1}{2}(x-x_{0})^{2}f^{\prime\prime}(x_{0}) (10)

We approximate Equation 8, f(γ_j), at x_0 = 0 using the Taylor expansion:

\begin{split}f(\gamma_{j})&\approx f(0)+(\gamma_{j}-0)\frac{\partial f}{\partial\gamma_{j}}\bigg|_{\gamma_{j}=0}+\frac{1}{2}(\gamma_{j}-0)^{2}\frac{\partial^{2}f}{\partial\gamma_{j}^{2}}\bigg|_{\gamma_{j}=0}\\ &=f(0)+\gamma_{j}\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+\frac{e^{F_{m-1}(\bm{x}^{(k)})}}{1+e^{F_{m-1}(\bm{x}^{(k)})}}\right)+\frac{\gamma_{j}^{2}}{2}\sum_{k\in\Omega_{m,j}}\sigma(F_{m-1}(\bm{x}^{(k)}))\left(1-\sigma(F_{m-1}(\bm{x}^{(k)}))\right)\\ &=f(0)+\gamma_{j}\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+p_{m-1}^{(k)}\right)+\frac{\gamma_{j}^{2}}{2}\sum_{k\in\Omega_{m,j}}p^{(k)}_{m-1}\left(1-p^{(k)}_{m-1}\right)\end{split} (11)

Taking the derivative of Equation 11 with respect to γ_j and setting it to zero:

\frac{\partial f(\gamma_{j})}{\partial\gamma_{j}}\approx\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+p_{m-1}^{(k)}\right)+\gamma_{j}\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right):=0
\Rightarrow\gamma_{j}=\frac{\sum_{k\in\Omega_{m,j}}\left(y^{(k)}-p_{m-1}^{(k)}\right)}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right)}=\frac{\sum_{k\in\Omega_{m,j}}r^{(k)}_{m}}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right)} (12)
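Since Equation 12 comes from a second-order approximation around 0, it does not hit the exact minimizer of Equation 8, but it lands close. The sketch below checks this numerically on a toy leaf with made-up labels and log odds (all numbers here are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy leaf (hypothetical numbers): labels and previous-iteration log odds
# for the three instances that fall in this leaf.
y = np.array([1.0, 0.0, 1.0])
F_prev = np.array([0.2, -0.1, 0.4])
p_prev = 1.0 / (1.0 + np.exp(-F_prev))

# Closed form from Equation 12 (Taylor approximation at gamma_j = 0)
gamma_closed = (y - p_prev).sum() / (p_prev * (1.0 - p_prev)).sum()

# Exact per-leaf loss, taken from the last line of Equation 8
def leaf_loss(g):
    F = F_prev + g
    return np.sum(-y * F + np.log1p(np.exp(F)))

# Brute-force minimizer of the exact loss
grid = np.linspace(-3.0, 3.0, 6001)
gamma_exact = grid[np.argmin([leaf_loss(g) for g in grid])]

# gamma_closed (~0.511) is close to gamma_exact (~0.533); the gap is the
# price of truncating the Taylor series at second order.
```

Running one Newton step per tree in this fashion is usually accurate enough in practice, because the learning rate η shrinks each step anyway.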

4 Summary

This document explores the inner workings of the Gradient Boosting Classifier (GBC), a special case of the Gradient Boosting Machine (GBM) designed for binary classification tasks. GBC builds a model by iteratively adding weak learners (decision trees) that predict the residual errors of previous iterations. Unlike GBM for regression, GBC predicts probabilities for binary outcomes; as such, each terminal node in a tree outputs a value determined by the residual errors and the predicted probabilities from earlier iterations.

We introduced the central equation that determines γ_j, the output value of a terminal node, and explored its derivation in detail, demonstrating how it minimizes the logistic loss function. We first attempted to solve for the optimal γ_j analytically by setting the derivative of the loss function to zero, but this turned out to be intractable. Consequently, we used a second-order Taylor approximation to derive a tractable expression for γ_j, revealing how it is connected to the residuals and the prior predictions of the model.

In conclusion, this document provides a comprehensive overview of the Gradient Boosting Classifier, covering both the theoretical aspects and the practical implementation steps, while highlighting the role of the terminal node values γ_j in optimizing the model's performance on binary classification tasks.

References

  • [Canuto and Tabacco, 2015] Canuto, C. and Tabacco, A. (2015). Mathematical Analysis I, volume 85. Springer.
  • [Friedman, 2001] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232.
  • [Han et al., 2012] Han, J., Kamber, M., and Pei, J. (2012). Data Mining: Concepts and Techniques, third edition. Morgan Kaufmann.
  • [Hastie et al., 2009] Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer.

Appendix A: a working example

Table 1: Training instances
index i   feature x^(i)   label y^(i)
1         1.3             1
2         1.5             0
3         3.0             1
4         4.0             0
5         6.5             1
6         8.4             0
Table 2: The residuals for iteration 1
index i   x^(i)   y^(i)   p_0^(i)   r_1^(i)
1         1.3     1       0.5        0.5
2         1.5     0       0.5       -0.5
3         3.0     1       0.5        0.5
4         4.0     0       0.5       -0.5
5         6.5     1       0.5        0.5
6         8.4     0       0.5       -0.5

We use a simple example to illustrate the training and prediction of GBC. The training dataset is given in Table 1. We train GBC for 3 iterations with learning rate η = 0.1 (the value used in all computations below).

A.1 Training iteration 0: initialization

GBC first initializes all F_0^{(i)} to zero, so the predicted probabilities p_0^{(i)} are all σ(F_0^{(i)}) = 0.5 (lines 1 through 3 in Algorithm 1).

A.2 Training iteration 1

Next, we enter iteration 1. GBC computes the residuals r_1^{(i)} for each instance (lines 5 and 6 of Algorithm 1). The result is shown in Table 2.

GBC builds a one-level decision tree (a.k.a. a decision stump) T_1 to fit the residuals. Assuming the optimal split is at x = 3.5, we obtain the tree shown in Figure 1. The numbers inside the terminal nodes indicate the instance indices. In this example, instances 1, 2, and 3 are located in the first child of T_1, since x^{(i)} ≤ 3.5 for i ∈ {1, 2, 3}, and instances 4, 5, and 6 are in the second child. We use Ω_{m,j} to denote the set of instances in T_m's jth terminal node; thus, Ω_{1,1} = {1, 2, 3} and Ω_{1,2} = {4, 5, 6}.

[Decision stump: x^(i) ≤ 3.5? Yes → leaf with instances 1, 2, 3; No → leaf with instances 4, 5, 6]
Figure 1: Tree T_1

For each terminal node j, the output value γ_j is calculated by Equation 1. In our example:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{1,1}}r_{1}^{(k)}\Big/\sum_{k\in\Omega_{1,1}}p_{0}^{(k)}\left(1-p_{0}^{(k)}\right)=2/3\\ \gamma_{2}=\sum_{k\in\Omega_{1,2}}r_{1}^{(k)}\Big/\sum_{k\in\Omega_{1,2}}p_{0}^{(k)}\left(1-p_{0}^{(k)}\right)=-2/3\end{split} (13)

The predicted log odds are updated by Equation 2:

\begin{split}F_{1}^{(1)}&=F_{0}^{(1)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(2)}&=F_{0}^{(2)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(3)}&=F_{0}^{(3)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(4)}&=F_{0}^{(4)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\\ F_{1}^{(5)}&=F_{0}^{(5)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\\ F_{1}^{(6)}&=F_{0}^{(6)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\end{split} (14)

Finally, we update the probabilities.

\begin{split}p_{1}^{(1)}&=\sigma\left(F_{1}^{(1)}\right)\approx 0.5167\\ p_{1}^{(2)}&=\sigma\left(F_{1}^{(2)}\right)\approx 0.5167\\ p_{1}^{(3)}&=\sigma\left(F_{1}^{(3)}\right)\approx 0.5167\\ p_{1}^{(4)}&=\sigma\left(F_{1}^{(4)}\right)\approx 0.4833\\ p_{1}^{(5)}&=\sigma\left(F_{1}^{(5)}\right)\approx 0.4833\\ p_{1}^{(6)}&=\sigma\left(F_{1}^{(6)}\right)\approx 0.4833\end{split} (15)
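The iteration-1 arithmetic can be replicated in a few lines. This is a sketch assuming the split at x = 3.5 from Figure 1 and the learning rate η = 0.1:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
p0 = np.full(6, 0.5)                       # iteration-0 probabilities
r1 = y - p0                                # residuals of Table 2
left, right = slice(0, 3), slice(3, 6)     # split at x = 3.5: {1,2,3} vs {4,5,6}

# Equation 1 for the two terminal nodes
gamma1 = r1[left].sum() / (p0[left] * (1 - p0[left])).sum()     #  2/3
gamma2 = r1[right].sum() / (p0[right] * (1 - p0[right])).sum()  # -2/3

# Equations 2 and 3 with eta = 0.1
eta = 0.1
F1 = np.where(np.arange(6) < 3, eta * gamma1, eta * gamma2)     # +-2/30
p1 = 1.0 / (1.0 + np.exp(-F1))
# p1 rounds to [0.5167, 0.5167, 0.5167, 0.4833, 0.4833, 0.4833]
```

The resulting p1 matches Equation 15 and feeds the residual column of Table 3.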

A.3 Training iteration 2

Table 3: The residuals for iteration 2
index i   x^(i)   y^(i)   p_1^(i)    r_2^(i)
1         1.3     1       0.5167     0.4833
2         1.5     0       0.5167    -0.5167
3         3.0     1       0.5167     0.4833
4         4.0     0       0.4833    -0.4833
5         6.5     1       0.4833     0.5167
6         8.4     0       0.4833    -0.4833

We enter iteration 2. GBC computes the residuals r_2^{(i)} = y^{(i)} - p_1^{(i)} for each instance. The result is shown in Table 3.

[Decision stump: x^(i) ≤ 2.25? Yes → leaf with instances 1, 2; No → leaf with instances 3, 4, 5, 6]
Figure 2: Tree T_2

GBC builds another decision stump T_2 to fit the residuals. Assuming the optimal split is at x = 2.25, we obtain the tree shown in Figure 2: instances 1 and 2 are located in the first child of T_2, and instances 3, 4, 5, and 6 are in the second child, i.e., Ω_{2,1} = {1, 2} and Ω_{2,2} = {3, 4, 5, 6}.

For each terminal node j, the output value γ_j is calculated by Equation 1:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{2,1}}r_{2}^{(k)}\Big/\sum_{k\in\Omega_{2,1}}p_{1}^{(k)}\left(1-p_{1}^{(k)}\right)\approx-0.0669\\ \gamma_{2}=\sum_{k\in\Omega_{2,2}}r_{2}^{(k)}\Big/\sum_{k\in\Omega_{2,2}}p_{1}^{(k)}\left(1-p_{1}^{(k)}\right)\approx 0.0334\end{split} (16)

Next, we update the predicted log odds F_2^{(i)} based on Equation 2:

\begin{split}F_{2}^{(1)}&=F_{1}^{(1)}+\eta\gamma_{1}=\frac{2}{30}+0.1\times(-0.0669)\approx 0.0600\\ F_{2}^{(2)}&=F_{1}^{(2)}+\eta\gamma_{1}=\frac{2}{30}+0.1\times(-0.0669)\approx 0.0600\\ F_{2}^{(3)}&=F_{1}^{(3)}+\eta\gamma_{2}=\frac{2}{30}+0.1\times 0.0334\approx 0.0700\\ F_{2}^{(4)}&=F_{1}^{(4)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\\ F_{2}^{(5)}&=F_{1}^{(5)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\\ F_{2}^{(6)}&=F_{1}^{(6)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\end{split} (17)

Finally, we update the probabilities.

\begin{split}p_{2}^{(1)}&=\sigma\left(F_{2}^{(1)}\right)\approx 0.5150\\ p_{2}^{(2)}&=\sigma\left(F_{2}^{(2)}\right)\approx 0.5150\\ p_{2}^{(3)}&=\sigma\left(F_{2}^{(3)}\right)\approx 0.5175\\ p_{2}^{(4)}&=\sigma\left(F_{2}^{(4)}\right)\approx 0.4842\\ p_{2}^{(5)}&=\sigma\left(F_{2}^{(5)}\right)\approx 0.4842\\ p_{2}^{(6)}&=\sigma\left(F_{2}^{(6)}\right)\approx 0.4842\end{split} (18)

A.4 Training iteration 3

Table 4: The residuals for iteration 3
index i   x^(i)   y^(i)   p_2^(i)    r_3^(i)
1         1.3     1       0.5150     0.4850
2         1.5     0       0.5150    -0.5150
3         3.0     1       0.5175     0.4825
4         4.0     0       0.4842    -0.4842
5         6.5     1       0.4842     0.5158
6         8.4     0       0.4842    -0.4842

We enter iteration 3. GBC computes the residuals r_3^{(i)} = y^{(i)} - p_2^{(i)} for each instance. The result is shown in Table 4.

[Decision stump: x^(i) ≤ 5.25? Yes → leaf with instances 1, 2, 3, 4; No → leaf with instances 5, 6]
Figure 3: Tree T_3

GBC builds yet another decision stump T_3 to fit the residuals. Assuming the optimal split is at x = 5.25, we obtain the tree shown in Figure 3: instances 1, 2, 3, and 4 are located in the first child of T_3, and instances 5 and 6 are in the second child, i.e., Ω_{3,1} = {1, 2, 3, 4} and Ω_{3,2} = {5, 6}.

For each terminal node j, the output value γ_j is calculated by Equation 1:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{3,1}}r_{3}^{(k)}\Big/\sum_{k\in\Omega_{3,1}}p_{2}^{(k)}\left(1-p_{2}^{(k)}\right)\approx-0.0317\\ \gamma_{2}=\sum_{k\in\Omega_{3,2}}r_{3}^{(k)}\Big/\sum_{k\in\Omega_{3,2}}p_{2}^{(k)}\left(1-p_{2}^{(k)}\right)\approx 0.0633\end{split} (19)

Next, we update the predicted log odds F_3^{(i)} based on Equation 2:

\begin{split}F_{3}^{(1)}&=F_{2}^{(1)}+\eta\gamma_{1}\approx 0.0600+0.1\times(-0.0317)\approx 0.0568\\ F_{3}^{(2)}&=F_{2}^{(2)}+\eta\gamma_{1}\approx 0.0600+0.1\times(-0.0317)\approx 0.0568\\ F_{3}^{(3)}&=F_{2}^{(3)}+\eta\gamma_{1}\approx 0.0700+0.1\times(-0.0317)\approx 0.0668\\ F_{3}^{(4)}&=F_{2}^{(4)}+\eta\gamma_{1}\approx-0.0633+0.1\times(-0.0317)\approx-0.0665\\ F_{3}^{(5)}&=F_{2}^{(5)}+\eta\gamma_{2}\approx-0.0633+0.1\times 0.0633\approx-0.0570\\ F_{3}^{(6)}&=F_{2}^{(6)}+\eta\gamma_{2}\approx-0.0633+0.1\times 0.0633\approx-0.0570\end{split} (20)

Finally, we update the probabilities.

\begin{split}p_{3}^{(1)}&=\sigma\left(F_{3}^{(1)}\right)\approx 0.5142\\ p_{3}^{(2)}&=\sigma\left(F_{3}^{(2)}\right)\approx 0.5142\\ p_{3}^{(3)}&=\sigma\left(F_{3}^{(3)}\right)\approx 0.5167\\ p_{3}^{(4)}&=\sigma\left(F_{3}^{(4)}\right)\approx 0.4834\\ p_{3}^{(5)}&=\sigma\left(F_{3}^{(5)}\right)\approx 0.4858\\ p_{3}^{(6)}&=\sigma\left(F_{3}^{(6)}\right)\approx 0.4858\end{split} (21)

A.5 Prediction

Eventually, the predicted log odds are given by Equation 22, where each tree's terminal nodes output that tree's contribution to the log odds.

\hat{y}=0+\eta T_{1}(\bm{x}^{(i)})+\eta T_{2}(\bm{x}^{(i)})+\eta T_{3}(\bm{x}^{(i)}), (22)

where T_1 outputs 2/3 if x^(i) ≤ 3.5 and -2/3 otherwise; T_2 outputs -0.0669 if x^(i) ≤ 2.25 and 0.0334 otherwise; and T_3 outputs -0.0317 if x^(i) ≤ 5.25 and 0.0633 otherwise.
}}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}} }+\eta\raisebox{-0.5pt}{ \leavevmode\hbox to103.98pt{\vbox to84.95pt{\pgfpicture\makeatletter\hbox{\hskip 52.78694pt\lower-59.90012pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ } {{}}{{{{}}}}{}{}\hbox{\hbox{\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{24.84619pt}{0.0pt}\pgfsys@curveto{24.84619pt}{13.72234pt}{13.72234pt}{24.84619pt}{0.0pt}{24.84619pt}\pgfsys@curveto{-13.72234pt}{24.84619pt}{-24.84619pt}{13.72234pt}{-24.84619pt}{0.0pt}\pgfsys@curveto{-24.84619pt}{-13.72234pt}{-13.72234pt}{-24.84619pt}{0.0pt}{-24.84619pt}\pgfsys@curveto{13.72234pt}{-24.84619pt}{24.84619pt}{-13.72234pt}{24.84619pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-23.36115pt}{-2.79236pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{$\bm{x}^{(i)}\leq 5.25$?}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}}\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{{}}{{}} {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ 
}{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{-18.54495pt}{-42.67912pt}\pgfsys@curveto{-18.54495pt}{-33.27858pt}{-26.1654pt}{-25.65813pt}{-35.56595pt}{-25.65813pt}\pgfsys@curveto{-44.96649pt}{-25.65813pt}{-52.58694pt}{-33.27858pt}{-52.58694pt}{-42.67912pt}\pgfsys@curveto{-52.58694pt}{-52.07967pt}{-44.96649pt}{-59.70012pt}{-35.56595pt}{-59.70012pt}\pgfsys@curveto{-26.1654pt}{-59.70012pt}{-18.54495pt}{-52.07967pt}{-18.54495pt}{-42.67912pt}\pgfsys@closepath\pgfsys@moveto{-35.56595pt}{-42.67912pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-51.12154pt}{-45.90134pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{-0.0317}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} { {{}}{}{{}} {{{{{}}{}{}{}{}{{}}}}}{}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{}{}{}{}{{{}{}}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@setlinewidth{0.8pt}\pgfsys@invoke{ }{}{}{}{}{{}}\pgfsys@moveto{-16.03412pt}{-19.24095pt}\pgfsys@lineto{-24.08685pt}{-28.9042pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{-0.64018}{-0.76822}{0.76822}{-0.64018}{-24.08684pt}{-28.90419pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{{}{}}}{{}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-29.18774pt}{-27.76195pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{Y}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope 
}\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}}\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{{}}{{}} {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{50.99734pt}{-42.67912pt}\pgfsys@curveto{50.99734pt}{-34.1565pt}{44.08858pt}{-27.24773pt}{35.56595pt}{-27.24773pt}\pgfsys@curveto{27.04332pt}{-27.24773pt}{20.13455pt}{-34.1565pt}{20.13455pt}{-42.67912pt}\pgfsys@curveto{20.13455pt}{-51.20175pt}{27.04332pt}{-58.11052pt}{35.56595pt}{-58.11052pt}\pgfsys@curveto{44.08858pt}{-58.11052pt}{50.99734pt}{-51.20175pt}{50.99734pt}{-42.67912pt}\pgfsys@closepath\pgfsys@moveto{35.56595pt}{-42.67912pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{21.67702pt}{-45.90134pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{0.0633}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} { {{}}{}{{}} {{{{{}}{}{}{}{}{{}}}}}{}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{}{}{}{}{{{}{}}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@setlinewidth{0.8pt}\pgfsys@invoke{ }{}{}{}{}{{}}\pgfsys@moveto{16.03412pt}{-19.24095pt}\pgfsys@lineto{25.10448pt}{-30.12537pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{0.64018}{-0.76822}{0.76822}{0.64018}{25.10446pt}{-30.12535pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{22.19655pt}{-28.37253pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{N}} 
}}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}} } (22)

Given a new test instance with $\bm{x}=7$, the predicted log odds is

\hat{y}=0+0.1\times\left(-\frac{2}{3}\right)+0.1\times 0.0334+0.1\times 0.0633\approx-0.0570 (23)

The predicted probability that $y=1$ is

p=\sigma\left(-0.0570\right)\approx 0.4858 (24)

Since $0.4858<0.5$, if we set the classification threshold at $0.5$, the GBC predicts this instance as negative.
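The prediction steps above can be reproduced in a few lines of code. The following sketch hardcodes the example's learning rate ($\eta=0.1$), initial prediction ($F_0=0$), and the terminal-node values of the three trees as given in the example; the first tree's value is taken directly as the $-2/3$ leaf that $\bm{x}=7$ reaches, since only that branch matters here.

```python
from math import exp

ETA = 0.1   # learning rate from the example
F0 = 0.0    # initial log-odds prediction from the example

def predict_log_odds(x):
    """Sum the initial prediction and the scaled terminal values of the three trees."""
    t1 = -2.0 / 3.0                          # leaf of tree 1 reached by x = 7
    t2 = -0.0669 if x <= 2.25 else 0.0334    # tree 2: split on x <= 2.25
    t3 = -0.0317 if x <= 5.25 else 0.0633    # tree 3: split on x <= 5.25
    return F0 + ETA * (t1 + t2 + t3)

def sigmoid(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + exp(-z))

y_hat = predict_log_odds(7)      # approximately -0.0570, matching Eq. (23)
p = sigmoid(y_hat)               # approximately 0.4858, matching Eq. (24)
label = 1 if p >= 0.5 else 0     # 0 (negative) at a 0.5 threshold
```

Running this confirms the hand computation: the log odds come out to about $-0.0570$ and the probability to about $0.4858$, so the instance is classified as negative.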