
Incremental Data-driven Optimization of Complex Systems in Nonstationary Environments

Cuie YANG, Jinliang DING (jlding@mail.neu.edu.cn), Yaochu JIN (yaochu.jin@surrey.ac.uk), Tianyou CHAI

State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China; Department of Computer Science, University of Surrey, Guildford, Surrey GU2 7XH, U.K.
\deareditor

In complex systems, many real-world optimization problems have no explicit objective or constraint functions; only data collected from the production process are available, owing to the complexity of the system [1]. Optimization in this scenario must therefore rely on historical datasets, and such problems are known as offline data-driven optimization problems [2]. In the last decade, work on offline data-driven optimization has typically first constructed a surrogate model from the collected data and then taken the optimum of the surrogate as the final decision [1]; little work has considered the errors inherent in surrogate models. In addition, production processes are often subject to changes, so the collected data are not independent and identically distributed. In this letter, we address offline data-driven optimization problems in which the data are generated by a nonstationary system in an incremental manner; specifically, we assume the data arrive chunk by chunk, as shown in Figure 1 in the supplementary section. This setting poses new challenges to current data-driven optimization algorithms [2], of which we focus on three. The first is how to build a high-quality surrogate model for each environment. The second concerns the optimization itself: how to quickly find the optimal solution in each new environment. The last is the creation of the final solution, because the discrepancies between the surrogate models and the corresponding real fitness functions (the unknown formulations of the real system) make solutions obtained purely from surrogates less reliable. To alleviate these difficulties, this letter proposes the following general method:
Step 1: WHILE a new data chunk $D_t$ arrives from the complex system at the $t$-th environment
Step 2: Update the surrogate model via knowledge transfer to adapt to the $t$-th environment;
Step 3: Initialize the population based on historical knowledge;
Step 4: Optimize the surrogate model using the DE algorithm;
Step 5: Produce the final solution for the complex system;
Step 6: END WHILE

\lettersection

Approach A general framework of the proposed approach is outlined above. Upon the arrival of each new data chunk from the complex system, the surrogate model is first updated to adapt to the new environment. A differential evolution (DE) algorithm [5], a population-based global optimization approach, is adopted as the optimizer; hence, the next step initializes the DE population. The DE algorithm with the rand/1 strategy is then applied to optimize the surrogate model so as to locate promising regions of the real fitness landscape. Lastly, the final solution is generated from the best candidates found. Details of the surrogate model update, the population initialization, and the final solution production are presented below; a minimal sketch of the overall loop follows this paragraph.
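The following Python sketch shows how the five steps fit together. It is an illustration under our own naming, not the authors' implementation: the helpers `build_ensemble_surrogate`, `initialize_population`, and `average_top_solutions` are hypothetical and are sketched in the subsections below, while `de_rand_1` is a minimal DE/rand/1/bin optimizer.

```python
import numpy as np

def de_rand_1(surrogate, population, bounds, gens=100, F=0.5, CR=0.9, rng=None):
    """Minimal DE/rand/1/bin loop that minimizes the surrogate."""
    rng = rng if rng is not None else np.random.default_rng()
    lb, ub = bounds
    pop, fit = population.copy(), surrogate(population)
    n, d = pop.shape
    for _ in range(gens):
        for i in range(n):
            r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lb, ub)
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True            # guarantee one mutated gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = surrogate(trial[None, :])[0]
            if f_trial <= fit[i]:                    # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    return pop

def incremental_ddo(data_stream, bounds, pop_size=50, gens=100):
    """Drive the incremental loop: one surrogate update, DE run, and
    final-solution extraction per incoming data chunk (environment)."""
    rng = np.random.default_rng(0)
    history, population, finals = [], None, []
    for X_t, y_t in data_stream:                                             # Step 1
        surrogate = build_ensemble_surrogate(X_t, y_t, history)              # Step 2
        population = initialize_population(population, bounds, pop_size, rng)  # Step 3
        population = de_rand_1(surrogate, population, bounds, gens, rng=rng)   # Step 4
        finals.append(average_top_solutions(population, surrogate))           # Step 5
        history.append((X_t, y_t))                   # keep the chunk for transfer
    return finals
```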

  • Knowledge transfer based surrogate model adaptation: Ensemble approaches are a popular way to handle incremental learning; they use models or training instances from historical environments to improve the model quality in the current environment [3]. However, most work on incremental learning focuses on classification tasks [3]; for the optimization problems in this letter, we introduce an ensemble approach for regression to build the surrogate model. Specifically, a base regression learner, denoted $h_t$, is first trained on the new data chunk $D_t$, i.e., $(\mathbf{x}_t, \mathbf{y}_t)$. A radial basis function network (RBFN) serves as the base learner owing to its universal approximation ability [4]. A set of base learners trained on the data chunks of the past environments is then constructed, one per environment. To improve the adaptability of past training instances, we first map the historical data chunks $D_1, D_2, \dots, D_{t-1}$ into the space of the current chunk $D_t$, which facilitates knowledge transfer between the historical and current data sets; the combination of each transferred historical chunk $D_{ni}$ with the current chunk $D_t$ is then used to build the corresponding historical base surrogate $h_i$. Note that we are interested in the dynamics of the system, which change the function values $\mathbf{y}_t$. We therefore transform $\mathbf{y}_i$, $i=1,2,\dots,t-1$, in each $D_i$ into the current objective space of $\mathbf{y}_t$ using Eq. (1), and denote the transferred $D_i$, which contains $(\mathbf{x}_i, \mathbf{y}_{ni})$, by $D_{ni}$, $i=1,2,\dots,t-1$.

    $$\mathbf{y}_{ni} = \frac{\mathbf{y}_i - y_i^{min}}{y_i^{max} - y_i^{min}} \times \left(y_t^{max} - y_t^{min}\right) + y_t^{min} \qquad (1)$$

    where $y_i^{max}$ and $y_i^{min}$ are the maximum and minimum values of $\mathbf{y}_i$, and $y_t^{max}$ and $y_t^{min}$ are the maximum and minimum values of $\mathbf{y}_t$, respectively.
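As a concrete illustration, the following sketch implements the min-max mapping of Eq. (1); the function name is ours, and the zero-span guard is an added safety check not discussed in the letter.

```python
import numpy as np

def transfer_objectives(y_i, y_t):
    """Map historical objective values y_i into the current objective
    space of y_t by min-max rescaling, as in Eq. (1)."""
    y_i, y_t = np.asarray(y_i, float), np.asarray(y_t, float)
    span_i = y_i.max() - y_i.min()
    if span_i == 0.0:                       # degenerate chunk: all values equal
        return np.full_like(y_i, y_t.min())
    scaled = (y_i - y_i.min()) / span_i     # normalize to [0, 1]
    return scaled * (y_t.max() - y_t.min()) + y_t.min()
```

For example, `transfer_objectives([2.0, 4.0], [10.0, 30.0])` returns `array([10., 30.])`: the historical range [2, 4] is stretched onto the current range [10, 30].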

In the next step, all base surrogate models $h_i$, $i=1,2,\dots,t$, are aggregated into the final ensemble surrogate $f$ using Eq. (2),

$$f = \frac{\sum_{i=1}^{t} w_i h_i}{\sum_{i=1}^{t} w_i} \qquad (2)$$

where $w_i = \frac{1}{RMSE_i + RMSE_t}$, $i=1,2,\dots,t-1$, $w_t = \frac{1}{RMSE_t}$, and $RMSE_i = \sqrt{\frac{1}{|D_i|}\sum_{y \in \mathbf{y}_i}(\hat{y} - y)^2}$, $i=1,2,\dots,t$.

In this ensemble, we assign $w_t$ a larger value than $w_i$, $i=1,2,\dots,t-1$, so that the base model of the current environment carries the highest weight: the data set of the current environment is the most reliable and should be exploited fully. A sketch of the whole transfer-and-ensemble construction is given below.
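The sketch below combines Eqs. (1) and (2), using scipy's `RBFInterpolator` as a stand-in for the letter's RBFN base learner. The smoothing term and the small epsilon guarding the RMSE-based weights are our additions (an exact interpolant would otherwise have near-zero training error), so treat this as an illustration under those assumptions rather than a reference implementation.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # stand-in for the RBFN base learner

def build_ensemble_surrogate(X_t, y_t, history, smoothing=1e-2):
    """Knowledge-transfer ensemble: one base model per environment,
    combined by the RMSE-based weights of Eq. (2)."""
    def rmse(model, X, y):
        return np.sqrt(np.mean((model(X) - y) ** 2)) + 1e-12  # avoid 1/0

    h_t = RBFInterpolator(X_t, y_t, smoothing=smoothing)
    rmse_t = rmse(h_t, X_t, y_t)
    models, weights = [h_t], [1.0 / rmse_t]           # w_t = 1 / RMSE_t
    for X_i, y_i in history:
        y_ni = transfer_objectives(y_i, y_t)          # Eq. (1): into current space
        X_c = np.vstack([X_i, X_t])                   # transferred chunk + current chunk
        y_c = np.concatenate([y_ni, y_t])
        h_i = RBFInterpolator(X_c, y_c, smoothing=smoothing)
        models.append(h_i)
        weights.append(1.0 / (rmse(h_i, X_c, y_c) + rmse_t))  # w_i
    w = np.asarray(weights)

    def f(X):                                         # Eq. (2): weighted average
        preds = np.stack([m(np.atleast_2d(X)) for m in models])
        return (w[:, None] * preds).sum(axis=0) / w.sum()
    return f
```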

  • A priori knowledge based population initialization: After the surrogate model of the current environment has been obtained, an initial population must be created before the surrogate optimization starts. In traditional DE algorithms, the initial population is generated at random in the decision space. In our setting, however, the surrogate models of different environments are not isolated, since their training instances come from the same system; reusing historical knowledge from past environments in the population initialization can therefore accelerate convergence in the current environment. For simplicity, the final candidates of the latest environment are used as the initial population in this work. Note that for the first environment the population is generated at random, because no historical information is available at the beginning; see the short sketch after this item.
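A minimal sketch of this warm-start rule (the function name and argument layout are ours):

```python
import numpy as np

def initialize_population(prev_population, bounds, pop_size, rng):
    """Warm start: reuse the previous environment's final population;
    fall back to uniform random sampling in the first environment."""
    if prev_population is not None:
        return prev_population.copy()
    lb, ub = bounds                        # arrays of per-variable bounds
    return rng.uniform(lb, ub, size=(pop_size, len(lb)))
```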

  • Best solution averaging based final solution production: As mentioned above, solutions obtained during the optimization cannot be evaluated on the real complex problem; they are evaluated only by surrogate models. Since no surrogate can be updated using the real fitness function, and the fitness of a solution evaluated by a surrogate may deviate considerably from its value under the real fitness function, producing a high-quality final solution for the real system is a key issue. This letter therefore proposes a best solution averaging technique: instead of directly returning the best candidate found, we take the average of the top 10 percent of individuals in the final population of each environment as the final solution. In this way, the surrogate-induced error of the final solution is smoothed by consulting a number of candidates; a sketch follows.
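The averaging step is straightforward; the sketch below assumes minimization and uses our own function name:

```python
import numpy as np

def average_top_solutions(population, surrogate, frac=0.10):
    """Return the mean of the top `frac` individuals of the final
    population, ranked by surrogate fitness (minimization)."""
    fitness = surrogate(population)
    k = max(1, int(np.ceil(frac * len(population))))
    best = population[np.argsort(fitness)[:k]]   # k best candidates
    return best.mean(axis=0)                     # averaged final solution
```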

\lettersection

Experimental results Six dynamic optimization benchmark problems [6] are used to examine the transferred surrogate model construction, the population initialization, and the final solution production strategies. The number of decision variables $D$ of each problem is set to 10, and the total number of environments in each test problem is set to 50. In each environment, $3D$ points generated by Latin hypercube sampling and evaluated by the real fitness function serve as the historical dataset. The experiments compare four variants of incremental data-driven optimization in nonstationary environments, each isolating one of the proposed techniques: SS (single dataset based surrogate model construction), KTS (knowledge transfer based surrogate model construction), KTSPI (KTS with a priori knowledge based population initialization), and KTSPI-TBA (KTSPI with the best solution averaging based final solution production technique).

Table 1: Average results over 50 environments and 20 independent runs

Name    SS          KTS         KTSPI       KTSPI-TBA
F1      3.7472      3.4762      3.0220      2.9770
F2      578.2938    546.8831    528.7354    528.9487
F3      1.121e+03   1.094e+03   1.077e+03   1.074e+03
F4      646.0489    610.8318    592.6871    589.7063
F5      2.021e+03   2.016e+03   1.999e+03   1.997e+03
F6      2.341e+03   1.399e+03   1.317e+03   1.314e+03

Table 1 presents the average results over 50 environments and 20 independent runs of the compared algorithms. The table shows that the values obtained by SS, KTS, KTSPI, and KTSPI-TBA generally decrease in that order, indicating the effectiveness of the knowledge transfer based surrogate model update, the historical knowledge based population initialization, and the averaging based final solution production techniques. The per-environment results over 20 independent runs on F1 and F5 are presented in Figure 2 in the supplementary section. The figure shows a clearly superior performance of KTSPI and KTSPI-TBA compared with SS and KTS in most environments on F1. On F5, KTS clearly outperforms SS in almost all environments; in addition, KTSPI-TBA achieves a more robust performance across environments than KTSPI.

\lettersection

Conclusion This letter proposes a general method for solving offline data-driven optimization problems in nonstationary environments, which comprises three techniques: knowledge transfer based surrogate model adaptation, historical knowledge based population initialization, and best solution averaging based final solution production. We systematically examine each technique by applying the four resulting incremental data-driven optimization approaches to six benchmark problems; the statistical results reveal the promising performance of each strategy in addressing incremental offline data-driven problems in dynamic environments.

\Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61525302 and 61590922, the Project of Ministry of Industry and Information Technology of China under Grant 20171122-6, the Projects of Shenyang under Grant Y17-0-004, the Fundamental Research Funds for the Central Universities under Grants N160801001 and N161608001, and the Outstanding Student Research Innovation Project of Northeastern University under Grant N170806003.

References

  • [1] Yang C, Ding J. Constrained dynamic multi-objective evolutionary optimization for operational indices of beneficiation process. Journal of Intelligent Manufacturing, 2017: 1-13.
  • [2] Wang H, Jin Y, Jansen J O. Data-driven surrogate-assisted multiobjective evolutionary optimization of a trauma system. IEEE Transactions on Evolutionary Computation, 2016, 20(6): 939-952.
  • [3] Ditzler G, Roveri M, Alippi C, et al. Learning in nonstationary environments: a survey. IEEE Computational Intelligence Magazine, 2015, 10(4): 12-25.
  • [4] Park J, Sandberg I W. Universal approximation using radial-basis-function networks. Neural Computation, 1991, 3(2): 246-257.
  • [5] Storn R, Price K. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 1997, 11(4): 341-359.
  • [6] Li C, Yang S, Nguyen T T, et al. Benchmark generator for CEC 2009 competition on dynamic optimization. Technical Report, 2008.