
Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of γ_j

Hung-Hsuan Chen
(Department of Computer Science and Information Engineering
National Central University
hhchen1105@acm.org )
Abstract

The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification that builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of the terminal node values γ_j, which are crucial to optimizing the logistic loss function. We derive γ_j through a second-order Taylor approximation and provide step-by-step pseudocode for the algorithm's implementation. The guide covers both the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. A step-by-step worked example is provided in the appendix to aid understanding.

1 Introduction

The gradient boosting machine (GBM) [Friedman, 2001, Hastie et al., 2009] is a robust predictive model for tabular datasets. GBM constructs weak learners (trees) sequentially; each tree predicts the residuals of the earlier predictions.

When applying GBM to regression tasks, the training process of each tree is very similar to that of a standard decision tree. However, when applying GBM to binary classification tasks (referred to below as the Gradient Boosting Classifier, or GBC), the prediction of a terminal node j of a tree T_m is not the average residual of the training instances that fall in node j. Instead, the prediction of terminal node j in tree T_m is given by the seemingly mysterious equation below.

\gamma_{j}=\frac{\sum_{i\in\Omega_{m,j}}r_{m}^{(i)}}{\sum_{i\in\Omega_{m,j}}p_{m-1}^{(i)}(1-p_{m-1}^{(i)})}, (1)

where p_{m-1}^{(i)} is the estimated probability that the ith instance's target value is 1 in the previous iteration, r_m^{(i)} = y^{(i)} - p_{m-1}^{(i)} is the residual of instance i to be fitted in iteration m, and Ω_{m,j} is the set of instance indices located in terminal node j of tree T_m.

In the next section, we introduce the GBC training and prediction procedures, followed by an explanation of how γ_j is derived in Section 3. A step-by-step example is given in Appendix A.

2 GBC training and prediction

input : Input features \bm{x}^{(1:n)}\in\mathbb{R}^{n\times d} and targets y^{(1:n)}\in\{0,1\}^{n}
input : Number of trees: M
input : Learning rate: \eta
output : Fitted trees: T_{1:M}
// Initialization
1  for i\leftarrow 1 to n do
2        F_{0}^{(i)}\leftarrow 0;
3        p_{0}^{(i)}\leftarrow\sigma(F_{0}^{(i)})=0.5;
// Training for M iterations
4  for m\leftarrow 1 to M do
5        for i\leftarrow 1 to n do
6              Compute residual: r_{m}^{(i)}\leftarrow y^{(i)}-p_{m-1}^{(i)};
7        Train a regression tree T_{m} to fit the dataset \{(\bm{x}^{(i)},r_{m}^{(i)})\}_{i=1,\ldots,n};
       // Update terminal node values
8        Let \Omega_{m,j} be the set of instance indices in terminal node j of T_{m};
9        for each terminal node j\in T_{m} do
10             Set the output of node j in T_{m}: \gamma_{j}\leftarrow\frac{\sum_{k\in\Omega_{m,j}}r_{m}^{(k)}}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}(1-p_{m-1}^{(k)})};
11             for k\in\Omega_{m,j} do
12                   F_{m}^{(k)}\leftarrow F_{m-1}^{(k)}+\eta\gamma_{j};
13                   p_{m}^{(k)}\leftarrow\sigma(F_{m}^{(k)});
Algorithm 1 Gradient Boosting Classifier Training

2.1 Training process

The training of a Gradient Boosting Classifier begins with an initial model, which is typically a constant prediction for all instances. In binary classification, the model is initialized with a raw predicted logarithm of odds (called log odds below) F_0^{(i)} set to 0 for each instance i, which corresponds to an initial predicted probability of 0.5 for all instances (since p_0^{(i)} = σ(F_0^{(i)}), where σ is the logistic function). This is a neutral starting point, since the model has no prior information about the data.

The algorithm proceeds over M boosting iterations. At each iteration m, the model computes the residual for each training instance, which represents how far the current predicted probability is from the true label y^{(i)}. Specifically, the residual r_m^{(i)} is computed as the difference between the true label y^{(i)} and the predicted probability p_{m-1}^{(i)} from the previous iteration. These residuals are used to train the decision tree T_m: the tree is fitted to predict the residuals, thus focusing the model on correcting the errors made by the previous trees. The tree-growing strategy can be based on metrics such as information gain, Gini index, or mean squared error, which guide the tree to select the optimal feature and threshold for each split [Han et al., 2012].

Once the tree T_m is trained, the algorithm computes the optimal values for its terminal nodes. For each terminal node j, the output value γ_j is calculated by minimizing the logistic loss over the instances that fall into that node. This is done using Equation 1; the reasons are explained in Section 3.

Next, for each instance k that falls into terminal node j, the model prediction is updated by adding the node value γ_j, scaled by the learning rate η, to the previous prediction F_{m-1}^{(k)}:

F_{m}^{(k)}\leftarrow F_{m-1}^{(k)}+\eta\gamma_{j} (2)

The predicted probability of instance k is then updated by applying the logistic function to the new raw score F_m^{(k)}:

p_{m}^{(k)}=\sigma\left(F_{m}^{(k)}\right). (3)

The algorithm builds trees over M iterations, incrementally refining the model's predictions by reducing the residual errors from previous iterations. At the end of training, we have an ensemble of M decision trees T_{1:M}, each contributing to the final prediction.

Algorithm 1 shows the pseudocode for training.
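The training loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's reference implementation: it uses scikit-learn's DecisionTreeRegressor (depth-1 trees) as the weak learner, and stores each leaf's γ_j in a side table keyed by leaf id rather than inside the tree itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def fit_gbc(X, y, M=3, eta=0.1):
    """Sketch of Algorithm 1: returns a list of (tree, {leaf id -> gamma_j})."""
    n = len(y)
    F = np.zeros(n)          # F_0: initial log odds are all 0
    p = sigmoid(F)           # p_0 = 0.5 for every instance
    ensemble = []
    for m in range(M):
        r = y - p            # residuals r_m = y - p_{m-1}
        tree = DecisionTreeRegressor(max_depth=1).fit(X, r)
        leaf = tree.apply(X)                 # terminal node id of each instance
        # Equation 1: gamma_j for each terminal node Omega_{m,j}
        gamma = {j: r[leaf == j].sum() / (p[leaf == j] * (1 - p[leaf == j])).sum()
                 for j in np.unique(leaf)}
        # Equations 2 and 3: update log odds and probabilities
        F += eta * np.array([gamma[j] for j in leaf])
        p = sigmoid(F)
        ensemble.append((tree, gamma))
    return ensemble
```

The only departure from the pseudocode is bookkeeping: because the leaf values γ_j are kept in a dictionary, prediction must look up the leaf of each instance via `tree.apply` instead of calling `tree.predict` directly.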

2.2 Prediction process

input : Test instances \bm{x}^{(1:t)}\in\mathbb{R}^{t\times d}
input : Learning rate: \eta
input : Trained trees: T_{1:M}
output : Predicted probabilities p^{(1:t)} for class 1
1  for i\leftarrow 1 to t do
2        F^{(i)}\leftarrow 0;
3        for m\leftarrow 1 to M do
4              F^{(i)}\leftarrow F^{(i)}+\eta T_{m}(\bm{x}^{(i)});   // T_{m}(\bm{x}^{(i)}) is T_{m}'s prediction on \bm{x}^{(i)}
5        p^{(i)}\leftarrow\sigma(F^{(i)})=\frac{1}{1+e^{-F^{(i)}}};
Algorithm 2 Gradient Boosting Classifier Prediction

Once the Gradient Boosting Classifier is trained, GBC predicts the labels of new test instances as follows. The prediction process starts by initializing the raw prediction F^{(i)} of each test instance i to 0, just as in the training phase. The algorithm then iteratively applies each of the M trees to the test instance, adding the scaled output of each tree to the current raw score.

Specifically, for each tree T_m, the prediction for a test instance x^{(i)} is obtained by evaluating the decision tree. The output value, T_m(x^{(i)}), is scaled by the learning rate η and added to the current raw score F^{(i)}:

F^{(i)}=F^{(i)}+\eta T_{m}(\bm{x}^{(i)}) (4)

This step is repeated for each tree from T_1 to T_M, accumulating the contributions of all trees into the final raw prediction F^{(i)}.

Once all trees have been evaluated, the final predicted probability for the test instance x^{(i)} is computed by applying the logistic function to the raw score F^{(i)}:

p^{(i)}=\sigma(F^{(i)})=\frac{1}{1+e^{-F^{(i)}}} (5)

The output is a probability score for each test instance, indicating the confidence of the model in predicting class 1. This probability can be thresholded (e.g., using a threshold of 0.5) to produce binary class predictions.

Much like the training process, this prediction process is designed to leverage the additive nature of Gradient Boosting, where the final prediction is built up from the contributions of many weak learners.

The pseudocode is shown in Algorithm 2.
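Algorithm 2 is short enough to sketch directly. This is a minimal illustration, assuming each fitted tree exposes a `predict` method whose leaves already output the terminal node values γ_j:

```python
import numpy as np

def predict_gbc(trees, X, eta=0.1):
    """Sketch of Algorithm 2: accumulate each tree's scaled output,
    then squash with the logistic function."""
    F = np.zeros(len(X))                  # raw log odds, initialized to 0
    for tree in trees:
        F += eta * tree.predict(X)        # Equation 4
    return 1.0 / (1.0 + np.exp(-F))       # Equation 5: probability of class 1
```

A binary label is then obtained by thresholding, e.g. `(predict_gbc(trees, X) >= 0.5).astype(int)`.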

3 Where does γ_j come from?

Assume that GBC has been trained for m-1 iterations, so F_{m-1}(x^{(i)}) is fixed. GBC attempts to train a new weak learner T_m with J terminal node predictions γ_1, …, γ_J. The new predicted log odds F_m(x^{(i)}) is given by

F_{m}(\bm{x}^{(i)})=F_{m-1}(\bm{x}^{(i)})+\gamma_{j}, (6)

where j is the index of the terminal node that x^{(i)} falls in.

The predicted probability is

p_{m}^{(i)}=\sigma\left(F_{m}(\bm{x}^{(i)})\right)=\frac{1}{1+e^{-F_{m}(\bm{x}^{(i)})}}=\frac{1}{1+e^{-\left(F_{m-1}(\bm{x}^{(i)})+\gamma_{j}\right)}} (7)

Let y^{Ω_{m,j}} and p_m^{Ω_{m,j}} denote the ground-truth labels and the predicted probabilities after iteration m for all the instances that fall in terminal node j of tree T_m. We express the cross-entropy loss for these instances as a function of γ_j:

\begin{split}f(\gamma_{j})&=\mathcal{L}(\bm{y}^{\Omega_{m,j}},\bm{p}_{m}^{\Omega_{m,j}})=-\sum_{k\in\Omega_{m,j}}\left(y^{(k)}\log p_{m}^{(k)}+(1-y^{(k)})\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\log p_{m}^{(k)}-\log(1-p_{m}^{(k)})+y^{(k)}\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\log\frac{p_{m}^{(k)}}{1-p_{m}^{(k)}}-\log(1-p_{m}^{(k)})\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})-\log\left(\frac{e^{-F_{m}(\bm{x}^{(k)})}}{1+e^{-F_{m}(\bm{x}^{(k)})}}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})-\log\left(\frac{1}{1+e^{F_{m}(\bm{x}^{(k)})}}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}F_{m}(\bm{x}^{(k)})+\log\left(1+e^{F_{m}(\bm{x}^{(k)})}\right)\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\left(F_{m-1}(\bm{x}^{(k)})+\gamma_{j}\right)+\log\left(1+e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}\right)\right)\end{split} (8)

We look for the γ_j value that minimizes the cross-entropy loss.

3.1 Trial 1: set the derivative to zero and solve the equation

The loss can be considered a function of γ_j, since γ_j is the only free variable. We take the derivative of the loss with respect to γ_j and set it to zero:

\begin{split}\frac{\partial f(\gamma_{j})}{\partial\gamma_{j}}&=\frac{\partial\mathcal{L}(\bm{y}^{\Omega_{m,j}},\bm{p}_{m}^{\Omega_{m,j}})}{\partial\gamma_{j}}=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+\frac{e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}}{1+e^{F_{m-1}(\bm{x}^{(k)})+\gamma_{j}}}\right)\\ &=\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}\right)+\sum_{k\in\Omega_{m,j}}\frac{1}{1+e^{-(F_{m-1}(\bm{x}^{(k)})+\gamma_{j})}}:=0\end{split}
\Rightarrow\sum_{k\in\Omega_{m,j}}\frac{1}{1+e^{-(F_{m-1}(\bm{x}^{(k)})+\gamma_{j})}}=\sum_{k\in\Omega_{m,j}}y^{(k)} (9)

Equation 9 involves γ_j inside a nonlinear logistic function, making it difficult to isolate γ_j and solve explicitly. This motivates an approximation technique, such as a second-order Taylor expansion.

3.2 Trial 2: Approximate the loss function by Taylor’s series

A Taylor expansion approximates a function f(x) around a fixed point x_0 as follows [Canuto and Tabacco, 2015].

f(x)\approx f(x_{0})+(x-x_{0})f^{\prime}(x_{0})+\frac{1}{2}(x-x_{0})^{2}f^{\prime\prime}(x_{0}) (10)

We approximate Equation 8, f(γ_j), at x_0 = 0 using the Taylor expansion:

\begin{split}f(\gamma_{j})&\approx f(0)+(\gamma_{j}-0)\frac{\partial f}{\partial\gamma_{j}}\bigg|_{\gamma_{j}=0}+\frac{1}{2}(\gamma_{j}-0)^{2}\frac{\partial^{2}f}{\partial\gamma_{j}^{2}}\bigg|_{\gamma_{j}=0}\\ &=f(0)+\gamma_{j}\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+\frac{e^{F_{m-1}(\bm{x}^{(k)})}}{1+e^{F_{m-1}(\bm{x}^{(k)})}}\right)+\frac{\gamma_{j}^{2}}{2}\sum_{k\in\Omega_{m,j}}\sigma(F_{m-1}(\bm{x}^{(k)}))\left(1-\sigma(F_{m-1}(\bm{x}^{(k)}))\right)\\ &=f(0)+\gamma_{j}\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+p_{m-1}^{(k)}\right)+\frac{\gamma_{j}^{2}}{2}\sum_{k\in\Omega_{m,j}}p^{(k)}_{m-1}\left(1-p^{(k)}_{m-1}\right)\end{split} (11)

Taking the derivative of Equation 11 with respect to γ_j and setting it to zero:

\frac{\partial f(\gamma_{j})}{\partial\gamma_{j}}\approx\sum_{k\in\Omega_{m,j}}\left(-y^{(k)}+p_{m-1}^{(k)}\right)+\gamma_{j}\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right):=0
\Rightarrow\gamma_{j}=\frac{\sum_{k\in\Omega_{m,j}}\left(y^{(k)}-p_{m-1}^{(k)}\right)}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right)}=\frac{\sum_{k\in\Omega_{m,j}}r^{(k)}_{m}}{\sum_{k\in\Omega_{m,j}}p_{m-1}^{(k)}\left(1-p_{m-1}^{(k)}\right)} (12)
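Since Equation 12 comes from a second-order approximation around 0, it does not hit the exact minimizer of Equation 8, but it lands close. The sketch below checks this numerically on a toy leaf with made-up labels and log odds (all numbers here are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy leaf (hypothetical numbers): labels and previous-iteration log odds
# for the three instances that fall in this leaf.
y = np.array([1.0, 0.0, 1.0])
F_prev = np.array([0.2, -0.1, 0.4])
p_prev = 1.0 / (1.0 + np.exp(-F_prev))

# Closed form from Equation 12 (Taylor approximation at gamma_j = 0)
gamma_closed = (y - p_prev).sum() / (p_prev * (1.0 - p_prev)).sum()

# Exact per-leaf loss, taken from the last line of Equation 8
def leaf_loss(g):
    F = F_prev + g
    return np.sum(-y * F + np.log1p(np.exp(F)))

# Brute-force minimizer of the exact loss
grid = np.linspace(-3.0, 3.0, 6001)
gamma_exact = grid[np.argmin([leaf_loss(g) for g in grid])]

# gamma_closed (~0.511) is close to gamma_exact (~0.533); the gap is the
# price of truncating the Taylor series at second order.
```

Running one Newton step per tree in this fashion is usually accurate enough in practice, because the learning rate η shrinks each step anyway.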

4 Summary

This document explores the inner workings of the Gradient Boosting Classifier (GBC), a special case of the Gradient Boosting Machine (GBM) designed for binary classification tasks. GBC builds a model by iteratively adding weak learners (decision trees) that predict the residual errors of previous iterations. Unlike GBM for regression, GBC predicts probabilities for binary outcomes; as such, each terminal node in a tree outputs a value determined by the residual errors and the predicted probabilities from earlier iterations.

We introduced the central equation that determines γ_j, the output value of a terminal node, and explored its derivation in detail, demonstrating how it minimizes the logistic loss function. We first attempted to solve for the optimal γ_j analytically by setting the derivative of the loss function to zero, but this turned out to be intractable. Consequently, we used a second-order Taylor approximation to derive a tractable expression for γ_j, revealing how it is connected to the residuals and the prior predictions of the model.

In conclusion, this document provides a comprehensive overview of the Gradient Boosting Classifier, covering both the theoretical aspects and the practical implementation steps, while highlighting the role of the terminal node values γ_j in optimizing the model's performance on binary classification tasks.

References

  • [Canuto and Tabacco, 2015] Canuto, C. and Tabacco, A. (2015). Mathematical Analysis I, volume 85. Springer.
  • [Friedman, 2001] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232.
  • [Han et al., 2012] Han, J., Kamber, M., and Pei, J. (2012). Data Mining: Concepts and Techniques, third edition. Morgan Kaufmann.
  • [Hastie et al., 2009] Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer.

Appendix A: a working example

Table 1: Training instances
index i   feature x^(i)   label y^(i)
1         1.3             1
2         1.5             0
3         3.0             1
4         4.0             0
5         6.5             1
6         8.4             0
Table 2: The residuals for iteration 1
index i   x^(i)   y^(i)   p_0^(i)   r_1^(i)
1         1.3     1       0.5        0.5
2         1.5     0       0.5       -0.5
3         3.0     1       0.5        0.5
4         4.0     0       0.5       -0.5
5         6.5     1       0.5        0.5
6         8.4     0       0.5       -0.5

We use a simple example to illustrate the training and prediction of GBC. The training dataset is given in Table 1. We train GBC for 3 iterations with learning rate η = 0.1 (the value used in all computations below).

A.1 Training iteration 0: initialization

GBC first initializes all F_0^{(i)} to zero, so the predicted probabilities p_0^{(i)} are all σ(F_0^{(i)}) = 0.5 (lines 1 through 3 in Algorithm 1).

A.2 Training iteration 1

Next, we enter iteration 1. GBC computes the residuals r_1^{(i)} for each instance (lines 5 and 6 of Algorithm 1). The result is shown in Table 2.

GBC builds a one-level decision tree (a.k.a. a decision stump) T_1 to fit the residuals. Assuming the optimal split is at x = 3.5, we obtain the tree shown in Figure 1. The numbers inside the terminal nodes indicate the instance indices. In this example, instances 1, 2, and 3 are located in the first child of T_1, since x^{(i)} ≤ 3.5 for i ∈ {1, 2, 3}, and instances 4, 5, and 6 are in the second child. We use Ω_{m,j} to denote the set of instances in T_m's jth terminal node; thus, Ω_{1,1} = {1, 2, 3} and Ω_{1,2} = {4, 5, 6}.

[Decision stump: x^(i) ≤ 3.5? Yes → leaf with instances 1, 2, 3; No → leaf with instances 4, 5, 6]
Figure 1: Tree T_1

For each terminal node j, the output value γ_j is calculated by Equation 1. In our example:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{1,1}}r_{1}^{(k)}\Big/\sum_{k\in\Omega_{1,1}}p_{0}^{(k)}\left(1-p_{0}^{(k)}\right)=2/3\\ \gamma_{2}=\sum_{k\in\Omega_{1,2}}r_{1}^{(k)}\Big/\sum_{k\in\Omega_{1,2}}p_{0}^{(k)}\left(1-p_{0}^{(k)}\right)=-2/3\end{split} (13)

The predicted log odds are updated by Equation 2:

\begin{split}F_{1}^{(1)}&=F_{0}^{(1)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(2)}&=F_{0}^{(2)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(3)}&=F_{0}^{(3)}+\eta\gamma_{1}=0+0.1\times\frac{2}{3}=\frac{2}{30}\\ F_{1}^{(4)}&=F_{0}^{(4)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\\ F_{1}^{(5)}&=F_{0}^{(5)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\\ F_{1}^{(6)}&=F_{0}^{(6)}+\eta\gamma_{2}=0+0.1\times\left(-\frac{2}{3}\right)=-\frac{2}{30}\end{split} (14)

Finally, we update the probabilities.

\begin{split}p_{1}^{(1)}&=\sigma\left(F_{1}^{(1)}\right)\approx 0.5167\\ p_{1}^{(2)}&=\sigma\left(F_{1}^{(2)}\right)\approx 0.5167\\ p_{1}^{(3)}&=\sigma\left(F_{1}^{(3)}\right)\approx 0.5167\\ p_{1}^{(4)}&=\sigma\left(F_{1}^{(4)}\right)\approx 0.4833\\ p_{1}^{(5)}&=\sigma\left(F_{1}^{(5)}\right)\approx 0.4833\\ p_{1}^{(6)}&=\sigma\left(F_{1}^{(6)}\right)\approx 0.4833\end{split} (15)
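The iteration-1 arithmetic can be replicated in a few lines. This is a sketch assuming the split at x = 3.5 from Figure 1 and the learning rate η = 0.1:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
p0 = np.full(6, 0.5)                       # iteration-0 probabilities
r1 = y - p0                                # residuals of Table 2
left, right = slice(0, 3), slice(3, 6)     # split at x = 3.5: {1,2,3} vs {4,5,6}

# Equation 1 for the two terminal nodes
gamma1 = r1[left].sum() / (p0[left] * (1 - p0[left])).sum()     #  2/3
gamma2 = r1[right].sum() / (p0[right] * (1 - p0[right])).sum()  # -2/3

# Equations 2 and 3 with eta = 0.1
eta = 0.1
F1 = np.where(np.arange(6) < 3, eta * gamma1, eta * gamma2)     # +-2/30
p1 = 1.0 / (1.0 + np.exp(-F1))
# p1 rounds to [0.5167, 0.5167, 0.5167, 0.4833, 0.4833, 0.4833]
```

The resulting p1 matches Equation 15 and feeds the residual column of Table 3.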

A.3 Training iteration 2

Table 3: The residuals for iteration 2
index i   x^(i)   y^(i)   p_1^(i)    r_2^(i)
1         1.3     1       0.5167     0.4833
2         1.5     0       0.5167    -0.5167
3         3.0     1       0.5167     0.4833
4         4.0     0       0.4833    -0.4833
5         6.5     1       0.4833     0.5167
6         8.4     0       0.4833    -0.4833

We enter iteration 2. GBC computes the residuals r_2^{(i)} = y^{(i)} - p_1^{(i)} for each instance. The result is shown in Table 3.

[Decision stump: x^(i) ≤ 2.25? Yes → leaf with instances 1, 2; No → leaf with instances 3, 4, 5, 6]
Figure 2: Tree T_2

GBC builds another decision stump T_2 to fit the residuals. Assuming the optimal split is at x = 2.25, we obtain the tree shown in Figure 2: instances 1 and 2 are located in the first child of T_2, and instances 3, 4, 5, and 6 are in the second child, i.e., Ω_{2,1} = {1, 2} and Ω_{2,2} = {3, 4, 5, 6}.

For each terminal node j, the output value γ_j is calculated by Equation 1:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{2,1}}r_{2}^{(k)}\Big/\sum_{k\in\Omega_{2,1}}p_{1}^{(k)}\left(1-p_{1}^{(k)}\right)\approx-0.0669\\ \gamma_{2}=\sum_{k\in\Omega_{2,2}}r_{2}^{(k)}\Big/\sum_{k\in\Omega_{2,2}}p_{1}^{(k)}\left(1-p_{1}^{(k)}\right)\approx 0.0334\end{split} (16)

Next, we update the predicted log odds F_2^{(i)} based on Equation 2:

\begin{split}F_{2}^{(1)}&=F_{1}^{(1)}+\eta\gamma_{1}=\frac{2}{30}+0.1\times(-0.0669)\approx 0.0600\\ F_{2}^{(2)}&=F_{1}^{(2)}+\eta\gamma_{1}=\frac{2}{30}+0.1\times(-0.0669)\approx 0.0600\\ F_{2}^{(3)}&=F_{1}^{(3)}+\eta\gamma_{2}=\frac{2}{30}+0.1\times 0.0334\approx 0.0700\\ F_{2}^{(4)}&=F_{1}^{(4)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\\ F_{2}^{(5)}&=F_{1}^{(5)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\\ F_{2}^{(6)}&=F_{1}^{(6)}+\eta\gamma_{2}=-\frac{2}{30}+0.1\times 0.0334\approx-0.0633\end{split} (17)

Finally, we update the probabilities.

\begin{split}p_{2}^{(1)}&=\sigma\left(F_{2}^{(1)}\right)\approx 0.5150\\ p_{2}^{(2)}&=\sigma\left(F_{2}^{(2)}\right)\approx 0.5150\\ p_{2}^{(3)}&=\sigma\left(F_{2}^{(3)}\right)\approx 0.5175\\ p_{2}^{(4)}&=\sigma\left(F_{2}^{(4)}\right)\approx 0.4842\\ p_{2}^{(5)}&=\sigma\left(F_{2}^{(5)}\right)\approx 0.4842\\ p_{2}^{(6)}&=\sigma\left(F_{2}^{(6)}\right)\approx 0.4842\end{split} (18)

A.4 Training iteration 3

Table 4: The residuals for iteration 3
index i   x^(i)   y^(i)   p_2^(i)    r_3^(i)
1         1.3     1       0.5150     0.4850
2         1.5     0       0.5150    -0.5150
3         3.0     1       0.5175     0.4825
4         4.0     0       0.4842    -0.4842
5         6.5     1       0.4842     0.5158
6         8.4     0       0.4842    -0.4842

We enter iteration 3. GBC computes the residuals r_3^{(i)} = y^{(i)} - p_2^{(i)} for each instance. The result is shown in Table 4.

[Decision stump: x^(i) ≤ 5.25? Yes → leaf with instances 1, 2, 3, 4; No → leaf with instances 5, 6]
Figure 3: Tree T_3

GBC builds yet another decision stump T_3 to fit the residuals. Assuming the optimal split is at x = 5.25, we obtain the tree shown in Figure 3: instances 1, 2, 3, and 4 are located in the first child of T_3, and instances 5 and 6 are in the second child, i.e., Ω_{3,1} = {1, 2, 3, 4} and Ω_{3,2} = {5, 6}.

For each terminal node j, the output value γ_j is calculated by Equation 1:

\begin{split}\gamma_{1}=\sum_{k\in\Omega_{3,1}}r_{3}^{(k)}\Big/\sum_{k\in\Omega_{3,1}}p_{2}^{(k)}\left(1-p_{2}^{(k)}\right)\approx-0.0317\\ \gamma_{2}=\sum_{k\in\Omega_{3,2}}r_{3}^{(k)}\Big/\sum_{k\in\Omega_{3,2}}p_{2}^{(k)}\left(1-p_{2}^{(k)}\right)\approx 0.0633\end{split} (19)

Next, we update the predicted log odds F_3^{(i)} based on Equation 2:

\begin{split}F_{3}^{(1)}&=F_{2}^{(1)}+\eta\gamma_{1}\approx 0.0600+0.1\times(-0.0317)\approx 0.0568\\ F_{3}^{(2)}&=F_{2}^{(2)}+\eta\gamma_{1}\approx 0.0600+0.1\times(-0.0317)\approx 0.0568\\ F_{3}^{(3)}&=F_{2}^{(3)}+\eta\gamma_{1}\approx 0.0700+0.1\times(-0.0317)\approx 0.0668\\ F_{3}^{(4)}&=F_{2}^{(4)}+\eta\gamma_{1}\approx-0.0633+0.1\times(-0.0317)\approx-0.0665\\ F_{3}^{(5)}&=F_{2}^{(5)}+\eta\gamma_{2}\approx-0.0633+0.1\times 0.0633\approx-0.0570\\ F_{3}^{(6)}&=F_{2}^{(6)}+\eta\gamma_{2}\approx-0.0633+0.1\times 0.0633\approx-0.0570\end{split} (20)

Finally, we update the probabilities.

\begin{split}p_{3}^{(1)}&=\sigma\left(F_{3}^{(1)}\right)\approx 0.5142\\ p_{3}^{(2)}&=\sigma\left(F_{3}^{(2)}\right)\approx 0.5142\\ p_{3}^{(3)}&=\sigma\left(F_{3}^{(3)}\right)\approx 0.5167\\ p_{3}^{(4)}&=\sigma\left(F_{3}^{(4)}\right)\approx 0.4834\\ p_{3}^{(5)}&=\sigma\left(F_{3}^{(5)}\right)\approx 0.4858\\ p_{3}^{(6)}&=\sigma\left(F_{3}^{(6)}\right)\approx 0.4858\end{split} (21)

A.5 Prediction

Eventually, the predicted log odds are given by Equation 22, where each tree's terminal nodes output that tree's contribution to the log odds.

\hat{y}=0+\eta T_{1}(\bm{x}^{(i)})+\eta T_{2}(\bm{x}^{(i)})+\eta T_{3}(\bm{x}^{(i)}), (22)

where T_1 outputs 2/3 if x^(i) ≤ 3.5 and -2/3 otherwise; T_2 outputs -0.0669 if x^(i) ≤ 2.25 and 0.0334 otherwise; and T_3 outputs -0.0317 if x^(i) ≤ 5.25 and 0.0633 otherwise.
}}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}} }+\eta\raisebox{-0.5pt}{ \leavevmode\hbox to103.98pt{\vbox to84.95pt{\pgfpicture\makeatletter\hbox{\hskip 52.78694pt\lower-59.90012pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ } {{}}{{{{}}}}{}{}\hbox{\hbox{\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{24.84619pt}{0.0pt}\pgfsys@curveto{24.84619pt}{13.72234pt}{13.72234pt}{24.84619pt}{0.0pt}{24.84619pt}\pgfsys@curveto{-13.72234pt}{24.84619pt}{-24.84619pt}{13.72234pt}{-24.84619pt}{0.0pt}\pgfsys@curveto{-24.84619pt}{-13.72234pt}{-13.72234pt}{-24.84619pt}{0.0pt}{-24.84619pt}\pgfsys@curveto{13.72234pt}{-24.84619pt}{24.84619pt}{-13.72234pt}{24.84619pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-23.36115pt}{-2.79236pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{$\bm{x}^{(i)}\leq 5.25$?}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}}\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{{}}{{}} {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ 
}{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{-18.54495pt}{-42.67912pt}\pgfsys@curveto{-18.54495pt}{-33.27858pt}{-26.1654pt}{-25.65813pt}{-35.56595pt}{-25.65813pt}\pgfsys@curveto{-44.96649pt}{-25.65813pt}{-52.58694pt}{-33.27858pt}{-52.58694pt}{-42.67912pt}\pgfsys@curveto{-52.58694pt}{-52.07967pt}{-44.96649pt}{-59.70012pt}{-35.56595pt}{-59.70012pt}\pgfsys@curveto{-26.1654pt}{-59.70012pt}{-18.54495pt}{-52.07967pt}{-18.54495pt}{-42.67912pt}\pgfsys@closepath\pgfsys@moveto{-35.56595pt}{-42.67912pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-51.12154pt}{-45.90134pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{-0.0317}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} { {{}}{}{{}} {{{{{}}{}{}{}{}{{}}}}}{}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{}{}{}{}{{{}{}}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@setlinewidth{0.8pt}\pgfsys@invoke{ }{}{}{}{}{{}}\pgfsys@moveto{-16.03412pt}{-19.24095pt}\pgfsys@lineto{-24.08685pt}{-28.9042pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{-0.64018}{-0.76822}{0.76822}{-0.64018}{-24.08684pt}{-28.90419pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{{}{}}}{{}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-29.18774pt}{-27.76195pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{Y}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope 
}\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}}\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{{}}{{}} {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{{}\pgfsys@moveto{50.99734pt}{-42.67912pt}\pgfsys@curveto{50.99734pt}{-34.1565pt}{44.08858pt}{-27.24773pt}{35.56595pt}{-27.24773pt}\pgfsys@curveto{27.04332pt}{-27.24773pt}{20.13455pt}{-34.1565pt}{20.13455pt}{-42.67912pt}\pgfsys@curveto{20.13455pt}{-51.20175pt}{27.04332pt}{-58.11052pt}{35.56595pt}{-58.11052pt}\pgfsys@curveto{44.08858pt}{-58.11052pt}{50.99734pt}{-51.20175pt}{50.99734pt}{-42.67912pt}\pgfsys@closepath\pgfsys@moveto{35.56595pt}{-42.67912pt}\pgfsys@stroke\pgfsys@invoke{ } }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{21.67702pt}{-45.90134pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{0.0633}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} { {{}}{}{{}} {{{{{}}{}{}{}{}{{}}}}}{}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{}{}{}{}{{{}{}}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@setlinewidth{0.8pt}\pgfsys@invoke{ }{}{}{}{}{{}}\pgfsys@moveto{16.03412pt}{-19.24095pt}\pgfsys@lineto{25.10448pt}{-30.12537pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{0.64018}{-0.76822}{0.76822}{0.64018}{25.10446pt}{-30.12535pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{22.19655pt}{-28.37253pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{N}} 
}}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}} } (22)

Given a new test instance with $\bm{x}=7$, the predicted log odds is

\hat{y}=0+0.1\times\left(-\frac{2}{3}\right)+0.1\times 0.0334+0.1\times 0.0633\approx-0.0570 (23)

The predicted probability that $y=1$ is

p=\sigma\left(-0.0570\right)\approx 0.4858 (24)

Since $0.4858<0.5$, if we set the classification threshold at $0.5$, the GBC predicts this instance as negative.
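The prediction steps above can be reproduced in a few lines of code. The following sketch hardcodes the example's learning rate ($\eta=0.1$), initial prediction ($F_0=0$), and the terminal-node values of the three trees as given in the example; the first tree's value is taken directly as the $-2/3$ leaf that $\bm{x}=7$ reaches, since only that branch matters here.

```python
from math import exp

ETA = 0.1   # learning rate from the example
F0 = 0.0    # initial log-odds prediction from the example

def predict_log_odds(x):
    """Sum the initial prediction and the scaled terminal values of the three trees."""
    t1 = -2.0 / 3.0                          # leaf of tree 1 reached by x = 7
    t2 = -0.0669 if x <= 2.25 else 0.0334    # tree 2: split on x <= 2.25
    t3 = -0.0317 if x <= 5.25 else 0.0633    # tree 3: split on x <= 5.25
    return F0 + ETA * (t1 + t2 + t3)

def sigmoid(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + exp(-z))

y_hat = predict_log_odds(7)      # approximately -0.0570, matching Eq. (23)
p = sigmoid(y_hat)               # approximately 0.4858, matching Eq. (24)
label = 1 if p >= 0.5 else 0     # 0 (negative) at a 0.5 threshold
```

Running this confirms the hand computation: the log odds come out to about $-0.0570$ and the probability to about $0.4858$, so the instance is classified as negative.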