Supplementary Materials for MST-compression: Compressing and Accelerating Binary Neural Networks
Appendix A Proof of Eq. (2)
According to the convolution definition, the output of channel $i$ is computed as

$$y_i = \sum_{k=1}^{N} w_{i,k} \odot a_k, \tag{8}$$

where $\odot$ denotes the XNOR-based binary multiplication. There are $N$ multiplications, and each multiplication outputs $-1$ or $+1$. Assuming that there are $N^{+}$ multiplications with output $+1$ and $N^{-}$ multiplications with output $-1$, we have $N^{+} + N^{-} = N$. Thus, $y_i$ can be derived as

$$y_i = N^{+} - N^{-} = N - 2N^{-}. \tag{A1}$$

In addition, because $w_i$ and $a$ are binarized, $N^{-}$ can be calculated with XOR and popcount as

$$N^{-} = \mathrm{popcount}(w_i \oplus a), \qquad y_i = N - 2\,\mathrm{popcount}(w_i \oplus a). \tag{A2}$$
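The identity in Eq. (A2) can be checked numerically. The sketch below is a minimal illustration with our own assumed names and the common convention that bit 1 encodes $+1$ and bit 0 encodes $-1$; it compares the direct $\pm 1$ dot product against the XOR/popcount form:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# Binary weights and activations in {-1, +1}; bit 1 encodes +1, bit 0
# encodes -1 (an assumed, common convention).
w = rng.choice([-1, 1], size=N)
a = rng.choice([-1, 1], size=N)
w_bits = (w == 1).astype(np.uint8)
a_bits = (a == 1).astype(np.uint8)

# Direct definition: sum of elementwise products (XNOR in the bit domain).
y_direct = int(np.dot(w, a))

# popcount of XOR counts the positions whose product is -1 (N^- in Eq. A1).
n_minus = int(np.sum(w_bits ^ a_bits))
y_popcount = N - 2 * n_minus

assert y_direct == y_popcount
```

The assertion holds for any choice of `w` and `a`, since flipping the encoding only exchanges the roles of $N^{+}$ and $N^{-}$.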
Appendix B Proof of Eq. (3)
Assume the $N$ weights of channel $j$ are split into a set $S$ of $N_s$ weights that are the same as the corresponding weights of channel $i$ (compared one-to-one) and a set $D$ of $N_d$ weights that are different from the corresponding weights of channel $i$ (compared one-to-one), with $N_s + N_d = N$. Let $a^{s}$ and $a^{d}$ denote the input activations multiplied with the weights in $S$ and $D$, respectively. Then $y_j$ can be written as

$$y_j = \sum_{k \in S} w_{j,k} \odot a_k + \sum_{k \in D} w_{j,k} \odot a_k. \tag{A3}$$

The input activation of channel $j$ is the same as that of channel $i$. For $k \in S$ we have $w_{i,k} = w_{j,k}$, while for $k \in D$ the weights of channel $i$ differ from those of channel $j$, i.e., $w_{i,k} = -w_{j,k}$. We can therefore write $y_i$ as

$$y_i = \sum_{k \in S} w_{j,k} \odot a_k + \sum_{k \in D} w_{i,k} \odot a_k. \tag{A4}$$

In consequence, $y_j$ can be calculated as

$$y_j = y_i + \sum_{k \in D} w_{j,k} \odot a_k - \sum_{k \in D} w_{i,k} \odot a_k. \tag{A5}$$

For the differing weights and their shared input activations, based on the characteristics of the XNOR operation, we have $w_{i,k} \odot a_k = -\,(w_{j,k} \odot a_k)$ for $k \in D$, so $y_j = y_i + 2\sum_{k \in D} w_{j,k} \odot a_k$. In Sec. 2, we have $y = N - 2\,\mathrm{popcount}(w \oplus a)$. Thus, we finally have the following equation:

$$y_j = y_i + 2\big(N_d - 2\,\mathrm{popcount}(w^{d}_{j} \oplus a^{d})\big). \tag{3}$$
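The incremental computation derived above can likewise be verified numerically. In this sketch (a minimal illustration with assumed variable names and sizes), channel $j$ is obtained from channel $i$ by flipping a few weights, and its output is recovered from $y_i$ using only the differing positions:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

a = rng.choice([-1, 1], size=N)              # shared input activations
w_i = rng.choice([-1, 1], size=N)            # reference channel i
w_j = w_i.copy()
diff = rng.choice(N, size=8, replace=False)  # positions where channel j differs
w_j[diff] = -w_j[diff]                       # "different" binary weight = sign flip

y_i = int(np.dot(w_i, a))
y_j_direct = int(np.dot(w_j, a))

# Incremental form of Eq. (A5): each flipped weight negates its product term,
# so only the differing positions contribute a correction of twice their sum.
y_j_incremental = y_i + 2 * int(np.dot(w_j[diff], a[diff]))
assert y_j_direct == y_j_incremental

# Bit-domain version of Eq. (3): the partial sum over D equals
# N_d - 2 * popcount(w_j^d XOR a^d).
N_d = len(diff)
w_j_bits = (w_j[diff] == 1).astype(np.uint8)
a_bits = (a[diff] == 1).astype(np.uint8)
y_j_bits = y_i + 2 * (N_d - 2 * int(np.sum(w_j_bits ^ a_bits)))
assert y_j_direct == y_j_bits
```

This is the mechanism that lets a child channel in the MST reuse its parent's output: the cost of computing $y_j$ drops from $N$ operations to $N_d$.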
Appendix C Additional results
Effect of the number of centers. In this section, we provide additional experimental results on the effect of the number of initial centers used for training. In particular, we train the VGG-small model on the CIFAR-10 dataset with different numbers of centers, while the loss coefficient is fixed at 4e-6. For each number of centers, we run the training three times and report the mean value.
Table A1 provides the MST depth, number of parameters, bit-ops, and accuracy w.r.t. different numbers of centers. Accordingly, the MST depth, number of parameters, and bit-ops tend to increase as the number of centers increases from 1 to 8, while accuracy barely changes. For each binary convolution layer, as shown in Figure A1, both the MST depth and the number of parameters also grow as the number of centers increases. These findings suggest that opting for a single center is the most effective strategy to minimize MST depth, parameters, and bit-ops while preserving accuracy.
Table A1: Effect of the number of initial centers on MST depth, number of parameters, bit-ops, and accuracy (VGG-small, CIFAR-10).

| #centers | MST depth | #Params | Bit-ops | Accuracy (%) |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 4 | | | | |
| 6 | | | | |
| 8 | | | | |
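To make the MST-depth metric concrete, the following sketch builds a minimum spanning tree over toy binary weight channels with Hamming-distance edges and measures its depth. It uses Prim's algorithm rooted at a single center (channel 0); this is an illustrative simplification with assumed names and toy sizes, not the paper's exact training-time construction:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)
C, N = 16, 32  # toy sizes: number of channels, weights per channel
W = rng.integers(0, 2, size=(C, N), dtype=np.uint8)  # binary weight bits

def hamming(u, v):
    """Hamming distance = number of differing weights N_d between channels."""
    return int(np.sum(u ^ v))

# Prim's algorithm: grow an MST over channels, rooted at one center.
center = 0
parent = {center: None}
best = {c: (hamming(W[center], W[c]), center) for c in range(C) if c != center}
while best:
    c = min(best, key=lambda k: best[k][0])  # closest channel not yet in tree
    _, p = best.pop(c)
    parent[c] = p
    for o in best:  # relax remaining channels against the new tree node
        nd = hamming(W[c], W[o])
        if nd < best[o][0]:
            best[o] = (nd, c)

# MST depth: longest root-to-leaf path (in edges), found by BFS from the center.
children = {c: [] for c in range(C)}
for c, p in parent.items():
    if p is not None:
        children[p].append(c)
depth = {center: 0}
q = deque([center])
while q:
    u = q.popleft()
    for v in children[u]:
        depth[v] = depth[u] + 1
        q.append(v)
mst_depth = max(depth.values())
```

A deeper tree means longer chains of incremental computations (Eq. (3)) before an output can be produced, which is why the experiments above favor configurations that keep the MST depth small.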