1 email: t.dam@student.adfa.edu.au, {s.anavatti,h.abbas}@adfa.edu.au
2 STEM, University of South Australia, Adelaide, Australia
  email: dhika.pratama@unisa.edu.au
3 ATMRI, Nanyang Technological University, Singapore
  email: mdmeftahul.ferdaus@ntu.edu.sg
Scalable Adversarial Online Continual Learning
This supplementary document comprises the following two sections:

• Section 1: properties of our proposed SCALE and the baselines with respect to the online continual learning setting.
• Section 2: hyperparameter configurations of all the methods.
1 Task Specifications
In this work, task orders are neither randomized nor optimized. For SCIFAR-10, SCIFAR-100, and SMINIIMAGENET, the same task order as in the original datasets is maintained, whereas it is random (by default) in pMNIST; a short sketch of this convention follows Table 1. The main features of our proposed SCALE and the baselines with respect to the online continual learning setting are presented in Table 1.
| Method | Episodic Memory | Task-Specific Parameters | Requires Task ID During Inference | Stores Historical Params/Grads/Logits (Training) | Stores Historical Params (Testing) |
| --- | --- | --- | --- | --- | --- |
| GEM |  |  |  |  |  |
| MER |  |  |  |  |  |
| MIR |  |  |  |  |  |
| ER |  |  |  |  |  |
| CTN |  |  |  |  |  |
| SCALE |  |  |  |  |  |
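The task-ordering convention described above can be illustrated with a short sketch. The snippet below is a hedged illustration only: the helper names, split sizes, and task counts are our own assumptions rather than the released SCALE code. Split benchmarks keep the dataset's original class order, while each pMNIST task is a random pixel permutation.

```python
# Illustrative sketch of the task-ordering convention; names and numbers
# below are assumptions, not taken from the released SCALE code.
import random

def split_task_order(num_classes, classes_per_task):
    """Split benchmarks (SCIFAR-10/100, SMINIIMAGENET): keep the original
    class order of the dataset, i.e. no shuffling of tasks."""
    classes = list(range(num_classes))
    return [classes[i:i + classes_per_task]
            for i in range(0, num_classes, classes_per_task)]

def pmnist_task_order(num_tasks, num_pixels=28 * 28, seed=None):
    """pMNIST: each task is a random pixel permutation (default behaviour)."""
    rng = random.Random(seed)
    return [rng.sample(range(num_pixels), num_pixels) for _ in range(num_tasks)]

if __name__ == "__main__":
    # e.g. SCIFAR-10 as 5 tasks of 2 classes each (illustrative split size)
    print(split_task_order(10, 2))
    # e.g. a handful of pMNIST permutation tasks (illustrative task count)
    print(len(pmnist_task_order(5)))
```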
2 Hyperparameter Specifications
The hyperparameter settings of the consolidated methods are presented as follows:
1. GEM
   1.1. Learning rate: 0.03 (PMNIST, SCIFAR-100, and SCIFAR-10) and 0.05 (SMINIIMAGENET)
   1.2. Number of gradient updates: 1 (all benchmarks)
   1.3. Margin for QP: 0.5 (all benchmarks)
2. MER
   2.1. Learning rate: 0.03 (PMNIST) and 0.05 (SCIFAR-100, SCIFAR-10, and SMINIIMAGENET)
   2.2. Replay batch size: 64 (all benchmarks)
   2.3. Reptile rate: 0.3 (all benchmarks)
   2.4. Number of gradient updates: 3
3. MIR
   3.1. Learning rate: 0.03 (all benchmarks)
   3.2. Replay batch size: 10 (all benchmarks)
   3.3. Number of gradient updates: 3
4. ER
   4.1. Learning rate: 0.03 (all benchmarks)
   4.2. Replay batch size: 10 (all benchmarks)
   4.3. Number of gradient updates: 3
5. CTN
   5.1. Inner learning rate: 0.03 (PMNIST) and 0.01 (SCIFAR-100, SCIFAR-10, and SMINIIMAGENET)
   5.2. Outer learning rate: 0.1 (PMNIST) and 0.05 (SCIFAR-100, SCIFAR-10, and SMINIIMAGENET)
   5.3. Number of inner and outer updates: 2 (all benchmarks)
   5.4. Temperature and weight for KL: 5 and 100 (all benchmarks)
   5.5. Replay batch size: 64 (all benchmarks)
   5.6. Semantic memory percentage: 20%
6. SCALE (a configuration sketch is given after this list)
   6.1. Inner learning rate: 0.1 (PMNIST) and 0.01 (SCIFAR-100, SCIFAR-10, and SMINIIMAGENET)
   6.2. Outer learning rate: 0.01 (PMNIST) and 0.1 (SCIFAR-100, SCIFAR-10, and SMINIIMAGENET)
   6.3. Adversarial learning rate: 0.001 (all benchmarks)
   6.4. Number of inner and outer updates: 1 (all benchmarks)
   6.5. Number of discriminator updates: 1 (all benchmarks)
   6.6. Weights of the loss terms (all benchmarks)
   6.7. Replay batch size: 64 (all benchmarks)
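For convenience, the SCALE settings listed above can be gathered into a single configuration object. The dictionary below is a minimal sketch under our own naming conventions (keys such as `inner_lr` are ours, not from the released code); the loss-term weights are left as a placeholder because their values are not repeated in this list.

```python
# Hedged sketch: SCALE hyperparameters from item 6 above, grouped per benchmark.
# Key names are our own convention, not the released code.
SCALE_HPARAMS = {
    "pmnist": {
        "inner_lr": 0.1,
        "outer_lr": 0.01,
        "adversarial_lr": 0.001,
        "inner_outer_updates": 1,
        "discriminator_updates": 1,
        "replay_batch_size": 64,
        "loss_weights": None,  # placeholder: values not repeated in this list
    },
    # SCIFAR-100, SCIFAR-10, and SMINIIMAGENET share the same settings.
    **{name: {
        "inner_lr": 0.01,
        "outer_lr": 0.1,
        "adversarial_lr": 0.001,
        "inner_outer_updates": 1,
        "discriminator_updates": 1,
        "replay_batch_size": 64,
        "loss_weights": None,  # placeholder: values not repeated in this list
    } for name in ("scifar100", "scifar10", "sminiimagenet")},
}
```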
When cross-validating on the three validation tasks, a grid search is performed so that each hyperparameter is kept consistent across all three tasks; these validation tasks are not seen during continual learning.
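A minimal sketch of this grid-search procedure is given below. Here `evaluate_on_validation_tasks` is a hypothetical helper that returns the average validation accuracy over the three held-out tasks for a given configuration, and the example grids are illustrative only.

```python
# Hedged sketch of the grid search over the three validation tasks described
# above; the helper name and grids are illustrative assumptions.
from itertools import product

def grid_search(grids, evaluate_on_validation_tasks):
    """Pick one hyperparameter setting shared by all three validation tasks
    (these tasks are excluded from continual learning)."""
    best_cfg, best_score = None, float("-inf")
    keys = list(grids)
    for values in product(*(grids[k] for k in keys)):
        cfg = dict(zip(keys, values))
        # Same cfg is applied to every validation task; score is the average.
        score = evaluate_on_validation_tasks(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

# Example (illustrative grids, not the ones used in the paper):
# grid_search({"inner_lr": [0.01, 0.1], "outer_lr": [0.01, 0.1]}, my_eval_fn)
```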