Capturing waste collection planning expert knowledge in a fitness function through preference learning

Laura Fernández Díaz laura.fernandezdiaz@arcelormittal.com Miriam Fernández Díaz miriam.fernandezdiaz@arcelormittal.com José Ramón Quevedo quevedo@uniovi.es Elena Montañés montaneselena@uniovi.es Artificial Intelligence Center. University of Oviedo at Gijón, 33204 Asturias, Spain http://www.aic.uniovi.es Global R&D, ArcelorMittal (Spain)

Abstract

This paper copes with the COGERSA waste collection process. Up to now, experts have been manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The drawback is that even experts are not able to propose one in a straightforward way due to the complexity of the process. Hence, the goal of this paper is to build a fitness function though a preference framework, taking advantage of the available expert knowledge and expertise. Several key performance indicators together with preference judgments are carefully established according to the experts for learning a promising fitness function. Particularly, the additivity property of them makes the task be much more affordable, since it allows to work with routes rather than with route plannings. Besides, a feature selection analysis is performed over such indicators, since the experts suspect of a potential existing (but unknown) redundancy among them. The experiment results confirm this hypothesis, since the best $C-$ index ( $98\%$ against around $94\%$ ) is reached when 6 or 8 out of 21 indicators are taken. Particularly, truck load seems to be a highly promising key performance indicator, together to the travelled distance along non-main roads. A comparison with other existing approaches shows that the proposed method clearly outperforms them, since the $C-$ index goes from $72\%$ or $90\%$ to $98\%$ .

keywords:

Machine learning , KPI , classification , preferences , route

^†^†journal: Engineering Applications of Artificial Intelligence

1 Introduction

Until now, each local council has been requested waste collection services to the company Consorcio para la Gestión de los Residuos Sólidos de Asturias (COGERSA). The different requests have been appearing at different moments in time and every time a request has arisen, the experts of this company have just focused on a particular area to design specific routes, keeping unaltered the already planned routes. Hence, the actual routes have been locally optimized in each area object of request and one isolated from the rest. But now, the company requires unifying the waste collection planning in order to optimize their human and material resources, the collection time, fuel consumption and so on. The task is not as easy as it could seem because several issues must be taken into account. On one hand, there are several planning constraints imposed by the governments of each council. For instance, a container located at the entrance of the schools must not be picked up at the same time that the classes start or end. Also, the waste collection cannot be carried out when there is a street market planned. In addition, governments impose the collection frequencies and the number of containers. Also, the working time regulations must be considered, since, for instance, a same worker or truck cannot exceed a certain number of work hours. On the other hand, there are some other constraints about available resources. For instance, in regard to the laterality¹¹1Laterality is the property that has a container in order to indicate the side of the road to pick it up., the containers can be located to the right or to the left of the road, then, in case of two way roads, trucks must turn around in order to be able to pick up the container depending on the direction of traffic. Another issue to take into account is the distance that a truck travels with its cargo box loaded, since it might be better for the truck to pick up the containers on the way back to travel less distance with the cargo box full.
This problem is typically solved using planning optimization algorithms. The application of these techniques requires a fitness function able to evaluate the quality of a route planning. The main drawback is there exists a lack of a suitable function of this kind, and, even more, it is neither obvious nor easy to obtain in the waste collection. Hence, and firstly we need to face to the hurdle of designing a fitness function to evaluate a route planning, which will be the main focus and the main contribution of this paper. The plan in the future is to include this fitness function into an optimization algorithm scheme, but now this task is out of the scope of this paper. Despite the experts are not able to design a straightforward fitness function, they have a lot of expertise knowledge susceptible of being mined. Within artificial intelligence field, there exist two main ways of carrying out this practice. On one hand, if the experts are able to describe their knowledge, then this knowledge can be developed using rules leading to an expert system, for instance, for refrigerant flow air conditioning systems [13]. On the other hand, if only the actions of the experts are available but they are not able to explain them, then, it is possible to use machine learning techniques to extract their knowledge from those actions, for instance, for automated cyber security data triage [34]. The latter is the case of the COGERSA Company, since experts perform actions from their expertise, but they are not able to display the reasons about their decisions. There exist several ways of taking into account the expert knowledge to feed a machine learning approach. A straightforward possibility could be to ask the experts for providing a score for a route planning and then perform a learning process through a regression system. However, this kind of approaches are not adequate in general, since experts ratings cannot be interpreted as absolute assessments. The main reason is that experts tend to rate in a relative way, hence, the same expert might provide a different score to the same route planning depending on if this route planning is evaluated in a context with much more better route plannings or if it is evaluated in a context with much more worse route plannings. Thus, the route planning presented in a batch surrounded by worse route plannings will probably get a higher score than if this route planning were showed together with better route plannings. This situation is known as batch effect and it often biases the ratings. Hence, feeding a regression method with this unstable data can mislead the learning process [9]. However, it is possible to state that the expert assessments are consistent in regard to the order, that is, experts may provide the same order to the same route plannings set despite the score granted may differ from one expert to another. This evidence justifies that feeding a machine learning system with expert information in form of preference judgments is quite more promising than doing so in form of ratings, since avoiding the assumption that certain rating means the same for every expert avoids to mislead the learning process [5, 7]. This is the reason why the proposal of this paper advocates to fit the problem of finding an adequate fitness function to assess a route planning into a preference learning framework [3]. This environment not only requires to capture expert knowledge and expertise in form of preference judgements, otherwise the route plannings have to be described through indicators experts must establish in order to machine learning takes place. Those indicators will be the features of the machine learning process and they must capture the key issues experts implicitly pay attention to when they establish their preference judgements. The reality is that both kind of information were not available, hence, experts were asked for both preference judgments and key performance indicators. In fact, one of the contributions of this paper is an exhaustive design of key performance indicators to establish the route planning descriptions. Another contribution is to design route planning pairs from the actual route planning to make experts show their preferences, for which a greedy algorithm will be designed. The third contribution is the preference learning itself. Finally, experts reported a highly existing redundancy among the key indicators. At this respect, they manually proposed to take several sets to explore their impact. However, this paper goes further and the fourth contribution consists of carrying out an automatic feature selection procedure to establish the relevance of the key performance indicators according to a machine learning procedure. There are other approaches in the literature that provide a fitness function from key indicators. For instance, different key performance indicators from mobile networks [23] are combined using two approaches. The first one consists of a regression procedure, for which absolute ratings are required, a task almost unachievable by the experts. Besides, and, as commented before, this approach suffers from batch effect that can mislead the learning process. Even more, a regression procedure is not able to optimize the ranking [23]. The second approach optimizes the Spearman’s rank correlation using a particle swarm optimization, which is a population-based metaheuristic for solving continuous and discrete optimization problems. Our approach will be compared with this second approach in the experiments. Other works [29, 35] respectively proposes and uses in transport field a method called SWARA that combine key performance indicators. However, experts must sort them according to their significance. Also, relative significance must be stated beforehand, that is, it must be determined how much an indicator is more important than the next indicator in the sorting. At this respect, experts in the waste collection are not able to sort them, and must less to establish beforehand a relative significance, then, this method has been discarded. Also, another work [8] optimizes urban traffic flow using a genetic algorithm with Petri net analysis as fitness function. Their fitness function consists just in one key performance indicator, namely, the time, for whose estimation the Petri net is built. In our context, the time will also be a key performance indicator that will require to be estimated, but our proposal will involve performing a regression procedure. A Petri net cannot be applied in our context, since experts are required to build a Petri net. This task would involve establishing relationships among waste containers as the same way this work does among the roads using transitions. At this respect, experts do not report an existing relationship among containers. In fact, the original goal is to provide an optimized route planning, which means to establish the relationship among containers consisting in which container is collected next. They compare their approach with the lower distance. Hence, our approach will be compared with just consider the distance as fitness function. Another study [26] proposes a route planning and scheduling of waste collection and transport. The main drawback of this work is that it only uses one key performance indicator, namely, the time, which is estimated through a regression procedure from traveling distance. Our work goes further in the sense that experts reveal quite more issues involved in the route planning that just the time. For instance, they reveal the high influence of truck load and road kind as relevant aspects, so, this solution was discarded. All the same, our approach will be compared with the case of using the time key performance procedure as fitness function.

The rest of the paper is organized as follows. Section 2 describes some related work to this proposal. Section 3 details the whole process of building a promising fitness function for capturing the waste collection planning expert knowledge, including the design. The description of the experiments and the discussion of the results are exposed in Section 4. Finally, Section 5 draws some conclusions and proposes some lines of research for future work.

2 Related Work

There are a lot of works that take the expert and human information to generate promising approaches that capture the expert knowledge. For example, a work [22] develops a model for detecting oil spill from expert information. Experts had to view images to detect suspicious regions and to classify them as positive or negative examples. Also, the experts provide the features, but the model obtained disregards many of them obtaining a good performance. Also, experts assess beef cattle as meat producers comparing animals with other partners [2]. Some body dimensions obtaining by a 3D photo describe the beefs. The expert knowledge was captured in a ranking function that unifies the preferences of different experts. Similarly, activity rubric is adjusted from the ranking that lecturers made from student answers [28]. The goal is to determine the weight of each activity item using partial rankings made by lectures from several student answers to the activity. Other study related to education [12] proposes a machine learning approach to obtain the most relevant factors that affect the employability and employment. To do that, some academics from different fields had to define the competencies that could be important for the employment using Spanish university data and student questionnaires. The aim of this study was to predict if a person will get employed or not and to extract the most relevant factors in order to know the best way for preparing the students for the labour market. Another work [19] presents a strategy to optimize the search engine retrieval quality. The user knowledge about the document ranking related to a query is given to the search engine through the click through data. The document the user clicks on is preferred with regard to the rest of documents. Support vector machines for ranking is fed with that information in order to generate a model able to rank the documents according to user preferences.

3 A fitness function for capturing waste collection planning expert knowledge

Before going in depth of obtaining a promising fitness function that captures the expert knowledge in waste collection planning, let us state some issues in order to better understand the current waste collection process. Actually, as commented before, there exists a set of routes locally planned by the experts from the successive council requests. The company has several garages in order to park the trucks. Each route begins in a garage and ends in the same garage. The company also has transfer stations located in different places of the region where the trucks can make one or more stopovers in order to unload the waste if the cargo box is too full during a route. Also, each route has a group of collection geographic points that can include more than one container (of the same or of different capacity). If there is more than one container at the same collection point, the capacity at that collection point will be considered as the accumulated capacity of all the containers at that point. A truck and a human team are assigned to each route. The human teams are formed by collection workers and by drivers. The number of workers of a team depends on the route and none worker can exceed 38 work hours per week. Also, workers must not work more than 8 hours per day and the trucks almost have a limit time of being driven.
A map of the container location together with the local designed routes allows concluding that the routes take the form of bouquets, since the majority have a central area with a high collection frequency and some branches with a low collection frequency. This is so because, the central area, which coincides with the settled area, is collected daily, but only one or two branches are included in the daily route. The experts have designed the routes per council (some routes cover more than one council if the areas are small) just using their experience and intuition causing the routes to have different duration. This is one of the pitfalls of the current waste collection planning, since there are some trucks and staff that overwork against others that underwork.
Some other issues include limitations in the container dimension in villages, the existence of vehicles that exceed the village weight and height limits or the allowed collection hour range at rush hours. Also, the possible delays in taking out the waste of some private companies because they have their container in a private enclosure or the adverse weather effects are other factors that make difficult the waste collection process. Besides, drivers might prefer to climb a road in worse conditions and to get down another road in better conditions than the opposite. Furthermore, the collection workers might prefer to collect the containers downhill than uphill. Another issue to keep in mind is the unloading time spent in transfer stations, which takes approximately two minutes. In addition to this, depending on the kind of truck, the running board, that is, the time to collect the waste can change. Moreover, the local governments determinate the amount of containers per surface and per population and decide the collection frequency of the container areas in terms of number of days. Owing to that, each container could have different frequency depending on the area. Furthermore, the collection frequencies of the containers may vary depending on the season of the year (Christmas time, summer, weekends…). It also happens that, for instance, in rural areas there are not time constraints to collect the waste and in some places container laterality is not an essential feature since sometimes the containers can be moved from one side of the road to the other. Other factor to take into account is the time that the driver takes from the garage to the first container on the route and from the last container to the garage.

Given the complexity and casuistry of the waste collection process taking into account all the above-mentioned issues, defining a fitness function for evaluating a whole waste collection planning in order to be used in the future in a planning optimization algorithm to produce a global waste collection planning is clearly a non trivial task. Our proposal is to capture the expertise and knowledge of the experts fitting the problem in a preference learning framework. In this context, one expects to have the disposal of waste collection planning pairs for which the experts must express their preferences. This means that a preference learning algorithm is fed by what is called preference judgments. A preference judgement is a pair $(x,y)$ of route plannings such that the route planning $x$ is preferred to the route planning $y$ because the quality of $x$ , $f(x)$ , is higher than the quality of $y$ , $f(y)$ , that is, $f(x)\geq f(y)$ , but the values of $f(x)$ and $f(y)$ are unknown. Therefore, a preference learning algorithm is fed by the set

S=\{(x,y):f(x)\geq f(y)\}

(1)

Then, the preference learning statement consists in obtaining a preference or ranking function $f$ defined in the route planning space such that $f(x)\geq f(y)$ whenever $x$ is preferred to $y$ , that is, when $(x,y)\in S$ . This means that $f$ establishes an order or rank for the route plannings involved in $S$ , that is, applying this function to the route plannings in $S$ one can order them. However, this learning goes further, since $f$ is a promising discovery because it is able to yield a rating for whatever route planning $x$ out of $S$ .

Now, it is necessary to describe a route planning $x$ through features and show to the experts pairs of route plannings from which they decide which member of the pair is preferable taking into account their expertise and knowledge. The idea is to obtain expertise and knowledge from the experts from two different points of view. The first one is to discover the key indicators experts consider relevant in order to obtain a route planning description in form of features. The second one is to find out the preference judgment of the experts looking at the route plannings themselves, without taking into account explicitly the key indicators. Then, the ranking or preference function $f$ is induced from these information sources in a attempt to reproduce the expert decision knowledge.

One of the difficulties that have arisen at this point is that the experts do not have the intellectual capacity of working in their minds with complete route plannings. They report serious difficulties in defining indicators for describing a whole route planning and also in deciding within a pair of route plannings which member of the pair is a better route planning. Hence, our proposal also includes making affordable the work of the experts in this sense. The idea at this respect consists of reducing the initial problem from the route planning space to the route space. In fact, up to now experts get use of designing routes in local areas, one isolated from the rest, so this simplification clearly makes the task more affordable for them. Then, pairs of routes rather than pairs of route plannings are presented to the experts, so they must decide which route is better in each pair of routes rather than decide which planning is better in each pair of route plannings. Also, our proposal establishes several key performance indicators to describe a route rather than a route planning taking into account the above-mentioned issues and factors about the waste collection process and the information coming from the experts. The indicators will be strategically designed in order to satisfy the additivity property. This feature makes them be suitable to further extend the solution from the route space to the route planning space.

The discussion of the key performance indicators established is detailed in Section 3.1. One drawback arisen in designing these indicators is that travel time is not an available data, despite the experts state that it is a key issue to take into account. This is one of the obstacles that have been arisen in the design of key performance indicators. This work overcomes this drawback proposing a regression procedure in order to estimate the travel time distance from other available information. Up to now, only current local routes designed by the experts are available, but including the problem in a preference learning environment requires to have the disposal of other routes to built pairs from which experts must decide the member of each pair they prefer. In this sense, it arises the necessity of creating alternative routes able to design adequate route pairs to successfully allow acquiring the expert knowledge. This is another obstacle that has been arisen in the research of this paper. Our proposal copes with this drawback designing a greedy algorithm to create alternative routes for this purpose. The details are explained in Section 3.2. Finally, building a fitness function for evaluating a route through a preference learning process is exposed in Section 3.3.

3.1 Designing key performance indicators (KPIs) for a route

The KPIs for building a fitness function to assess the quality of a route are designed and exposed in this section. In some way, they are established in an attempt of gathering the most relevant factors that the collection experts consider as relevant when they need to design a waste collection route.
In the following, the KPIs are discussed and designed according to the requirements of the experts. Let us notice that most of the factors the experts consider are related to geographic properties, then, a geographic information system will be required to compute the values of most of the KPIs.

3.1.1 Travel distance

The travel distance is one of the basic and straightforward KPI chosen by the experts. This measure is directly related to the fuel consumption and truck wear. In order to get its value, let consider a route whose initial and final garage is $g$ (let us remind that a requirement of the company is that each truck starts and ends in the same garage) and let be $C$ a set of waste collection points $\{p_{1},p_{2},...,p_{n}\}$ predefined by councils. Then, the travel distance ( $D$ ) is obtained from several other smaller travel distances in the map, namely:

i)

Distance from the garage $g$ of the current route to the first waste collection point $p_{1}$ on the route, $D(g,p_{1})$ , and from the last waste collection point $p_{n}$ to the garage $g$ , $D(g,p_{n})$ .
ii)

Distance between waste collection points that is computed as the sum of each distance between the current waste collection point $p_{i}$ and the next waste collection point $p_{i+1}$ .

$D(p_{1},p_{n})=\sum_{i=1}^{n-1}D(p_{i},p_{i+1})$ (2)

Hence, the travel distance of a route is the sum of all these distances.

D(g,g)=D(g,p_{1})+D(p_{1},p_{n})+D(p_{n},g)

(3)

3.1.2 Travel time

The travel time is one of the crucial measure to take into account according to the experts and to other works [8, 26]. Besides, experts justify it due to the existing timetable collection constraints and the maximum collection worker and driver shift duration. However, this measure is not easy to compute, as it seems. Even more, it is impossible to compute from the actual available information. In fact, only the time employed for the routes locally designed by the experts is at our disposal, so a mechanism to compute the travel time for a generic route must be established. The proposal to overcome this hurdle is to induce a regression function from the local information available through a learning process, for instance, as in case of [26]. Regression has been widely used in real applications as in case of prediction solar radiation over a meteorological station [21]. In this process and given that a route is split into three stretches (see the discussion of travel distance), the route time computation is also split into the same parts in order to facilitate the learning process:

i)

Two time periods concern to the time spent in driving from the garage to the first waste collection point and from the last waste collection point to the garage. Each travel of both kinds is taken as a point for the regression process and both the travel distance and the travel time of the existing local routes are annotated. Then, a regression function $T_{g}$ is deduced for estimating the time spent in the two route stretches in which the garage is involved.
ii)

The third time period regards to the time spent between collection points one each other. In this case, the distance between the first and the last collection point, the number of collection points and the travel time of the existing local routes are annotated to feed the regression system. Hence, another regression function $T_{p}$ is induced for predicting the travel time spent between collection points.

Once both regression functions $T_{g}$ and $T_{p}$ are available, the prediction time spent in a certain route will be the sum of three parts: i) the one from the garage $g$ to the first collection point $p_{1}$ , for which $T_{g}$ is applied, ii) the one from the first collection point $p_{1}$ to the last collection point $p_{n}$ for which $T_{p}$ is applied and finally iii) the one from the last collection point $p_{n}$ to the garage $g$ , for which $T_{g}$ is applied again. Therefore, the total travel time of a route is the sum of all these time periods:

T(g,g)=T_{g}(g,p_{1})+T_{p}(p_{1},p_{n})+T_{g}(p_{n},g)

(4)

Notice that it is possible that some of the predicted time periods may be greater than the actual ones. Hence, a correction factor will be applied for any route for which the regression functions are applied. This correction factor can only be computed from the available routes designed locally by the expert. For this purpose, the differences between real and predicted values are evaluated and the resulting sample is supposed to follow a normal distribution. The correction value will be the value for which the difference is positive at a certain significant level.

3.1.3 Accumulated altitude

Another relevant issue is the orography since a steep terrain complicates the access of the trucks to the collection points. Besides, it also affects the collection order, because it is obviously more costly for the trucks going uphill with a fuller load than going downhill with the same load, not only due to keeping the good condition of the trucks, otherwise also due to not rocket fuel consumption. One straightforward KPI in order to take into account this issue may be the accumulated altitude. One can also think of considering the slope percentage, but the experts reject this option, since it does not distinguish the best option between going uphill a long stretch with lower slope and going uphill a short stretch with higher slope. Then, and from a set of points of the route, $q_{1},...,q_{m}$ ²²2These points can be collection points or not. In fact, they include the collection points and in addition quite more tracking points in order to get a better altitude estimation., their respective altitudes $A(q_{1}),...,A(q_{m})$ are obtained and the accumulative altitude KPI for a route that begins at a garage $g$ and ends at the same garage is evaluated as follows

A(g,g)=\sum_{i=1}^{m-1}\max(0;A(q_{i+1})-A(q_{i}))

(5)

3.1.4 Road type

The type of road is also a quite relevant factor, since driving along a motorway highly differs from driving along local roads. This fact is not only because of fuel consumption, otherwise it is also because of taking into account the driver comfort and the travel time spent. According to the approval of the experts, the road types are classified according to their relevance (from major to minor) as motorway, dual carriageway, regional roads, local roads and others (square, street, path, avenue….). Then, and in order to consider the road type factor, three different KPIs are considered, namely, i) secondary road distance ( $D_{s}$ ), which is the percentage of kilometres the truck travels along secondary roads (square, street, path, avenue and local roads) and ii) secondary and regional road distance ( $D_{r}$ ), which is the percentage of kilometres the truck travels along secondary and regional roads. Hence, $D_{s}(g,g)$ and $D_{r}(g,g)$ are computed for a route (notice that the travel distance along main roads is the difference between the whole travel distance $D(g,g)$ and the sum of $D_{s}(g,g)$ and $D_{r}(g,g)$ and is not considered as a KPI, since it is a linear combination of the other KPIs, and hence it is redundant).

3.1.5 Truck load

Another aspect that experts reveal great interest on is the truck load. They statistically prove that driving excessive kilometres with a heavy load highly affects the fuel consumption and truck wear. In this direction, it is preferable to firstly drive to the farthest collection point of the route with the truck unload and to secondly return collecting the waste than taking the waste on the outbound journey, which means to carry with the whole waste loaded in the truck during all the return journey at least.
Both fuel consumption and truck wear are almost impossible to estimate, then the truck load effect is not easy to summarize in a straightforward KPI. In fact, the KPI taken and able to compute that can represent the load truck will be a measure just related to fuel consumption (not even the fuel consumption itself) rather than the truck load itself. The proposal to establish a KPI for estimating this issue $E$ consists in measuring, in one sense, the effort of the truck to get the garage carrying the waste collected. For this purpose, let consider the distance from the current collection point $p_{i}$ to the garage $g$ , that is, $D(p_{i},g)$ , and let be $L(p_{i})$ the amount of waste the truck collects at the current point. Then, the product of them

E(p_{i})=L(p_{i})\cdot D(p_{i},g)

(6)

can be an estimation of the effort of the truck to get the garage from the current waste collection point carrying the waste collected at this point. Hence, computing the sum of all $E(p_{i})$ for all the collection points leads to a measure proportionally related to the fuel consumption and will define the KPI

E(g,g)=\sum_{i=1}^{n}E(p_{i})=\sum_{i=1}^{n}L(p_{i})\cdot D(p_{i},g)

(7)

A setting in order to effectively compare routes is to compute the average rather than just the sum, since the routes have, in general, different number of collection points. Hence, the KPI will be

E(g,g)=\frac{\sum_{i=1}^{n}E(p_{i})}{n}=\frac{\sum_{i=1}^{n}L(p_{i})\cdot D(p_{i},g)}{n}

(8)

Furthermore, and unfortunately, neither the collected waste volume nor the collected waste weight is available data. Hence, one can think of estimating it, but the experts discarded this option, since they argued that it is highly probable that the noise that may be added in the estimation may not compensate the possible improvement obtained in exchange of taking a more accurate KPI. Consequently, the expert recommendation was to establish $L(p_{i})=1$ for all the collection points and then to simplify the computation of the KPI for the current collection point as the remaining distance from the current point to the garage, that is,

E(g,g)=\frac{\sum_{i=1}^{n}E(p_{i})}{n}=\frac{\sum_{i=1}^{n}D(p_{i},g)}{n}

(9)

As a final remark, we mention that the computation of this KPI assumes that the truck carries the waste until the garage, but this actually does not happen because the truck really empties the waste collected in a transfer station before getting the garage. Including this issue in the KPI highly complicates its computation, since, depending on the real last collection point, the transfer station may differ.

3.1.6 Final considerations

As mentioned before, each route was split into three parts in order to compare the route pairs:

i)

First stretch: from the garage to the first collection point
ii)

Second stretch: between collection points
iii)

Third stretch: from the last collection point to the garage

Travel distance ( $D$ ), travel time ( $T$ ), accumulated altitude ( $A$ ) and road type ( $D_{s}$ and $D_{r}$ ) are computed not only for the whole route, otherwise, also for these three route stretches. However, only the KPI of the truck load $E$ is computed for the whole route. This means that a total of 21 KPIs are computed for each route. Table 1 summarizes all the KPIs computed.

Table 1: The 21 KPIs computed for a route

	Travel	Travel	Accumulated	Secondary	Secondary and	Truck
Stretch\KPI	distance	time	altitude	road distance	regional road distance	load
1st stretch	$D_{1}$	$T_{1}$	$A_{1}$	$D_{s,1}$	$D_{r,1}$	-
2nd stretch	$D_{2}$	$T_{2}$	$A_{2}$	$D_{s,2}$	$D_{r,2}$	-
3rd stretch	$D_{3}$	$T_{3}$	$A_{3}$	$D_{s,3}$	$D_{r,3}$	-
Complete route	$D$	$T$	$A$	$D_{s}$	$D_{r}$	$E$

3.2 Providing alternative routes

This section deals with the building of alternative routes for obtaining route pairs needed for being used in a preference learning framework that will provide a route fitness function. Our proposal consists of designing a greedy algorithm that builds adequate alternative routes. For each of the 64 existing routes, the coordinates of the garage and containers and the time limit for the route due to the workers and truck limit time is the information needed to feed the algorithm. According to the experts, the map distance $D$ is one of the key factors that makes great influence on the waste collection process cost and it is clue for the planning optimization. Since it is not available another fitness function that joins and combines the main factors exposed before (in fact, it is the goal of the paper), the map distance $D$ will be chosen as fitness measure to drive the greedy algorithm.
For each existing route, the pseudocode of this process is shown in Algorithm 1. This algorithm starts with the truck assigned to this route leaving the garage to the nearest container of this route that has not been picked up. If adding this new container to the alternative route involves that the time of this alternative route is smaller than the workers and truck limit time, then, this container is added to that alternative route. In contrast, if it is higher, the algorithm finished and it is supposed that the truck goes back to its garage and the alternative route ends.

Algorithm 1 Algorithm for alternative waste collection routes

garage

P

garage

is the garage where the route starts and ends.

P

is the set of all waste collection points.

2:Route

R

with all points

list\_points\leftarrow P

\triangleright

Init list with all the points of the route.

R.add(garage)

\triangleright

Add initial

garage

p_{0}\leftarrow select\_next\_point(garage,P)

\triangleright

Search next point from

garage

R.add(p_{0})

list\_points.pop(p_{0})

\triangleright

Point

p_{0}

is already selected

i=0

9:while

len(list\_points)>0

10:

p_{i+1}\leftarrow select\_next\_point(p_{i},list\_points)

11:

R.add(p_{i+1})

\triangleright

Add current point to

R

12:

list\_points.pop(p_{i+1})

13:

i=i+1

14:

R.add(garage)

\triangleright

Add final

garage

15:return

R

Let us now detail the way the closest container is taken each time. In order to find the truck with the smallest map distance to some container that has not been picked up yet, it is required to iterate for each pair of truck and container, and in addition, the external map service must be called at each iteration. This task is quite costly since calling the external map service would mean waiting a long time to get the answer. In addition, a very high number of requests would have to be made, which could cause the server to saturate. So it is quite interesting to find out a way of optimizing it. The proposal for performing an optimization for this process consists of considering the euclidean distance together with the map distance. Hence, the pseudocode of this optimization is shown in Algorithm 2.

Algorithm 2 Algorithm for obtaining the nearest map distance waste collection point

p_{i}

P

p_{i}

is the current waste collection point.

P

is the set of all waste collection points.

2:Next point

p_{i+1}

from

p_{i}

list\_closest\_points\leftarrow[]

\triangleright

Init empty list to add next possible points.

p_{ednearest}\leftarrow min\_euclidean\_distance(p_{i},P)

\triangleright

Nearest point of

P_{i}

using euclidean distance (

p_{ednearest}

)

mapdistance\leftarrow map\_distance(p_{i},p_{ednearest})

\triangleright

Get maximum limit of map distance

6:for each point in

P

7: if

euclidean\_distance(p_{i},point)<=mapdistance

then

list\_closest\_points.add(point)

\triangleright

Add it to closest points.

9:return

p_{i+1}\leftarrow min\_map\_distance(list\_closest\_points)

This way of obtaining the closest collection point allows searching among a limited number of neighbour points instead of performing a costly exhaustive search over the whole number of points that have not been collected yet. Besides, this process also guarantees to obtain the closest collection point because the Euclidean distance is always lower than or equal to the map distance. In this way, the optimum point is always obtained which leads to a notably reduction in the number of requests to be made to the external map service.

3.3 Obtaining a model for ranking routes

This section deals with the process of building a fitness function to assess the quality of a route through a preference learning framework.

Once alternative routes are built, the KPIs exposed in Section 3.1 are computed for both the existing routes locally optimized by the expert and the alternative routes built from them in Section 3.2. The KPI values will conform the route description $x=(D,T,A,D_{s},D_{r},E)$ . Let us notice that all those KPIs are better as lower, since it is preferable a route $x$ with lower travel distance $D$ , lower travel time $T$ , lower accumulated altitude $A$ , lower travel distance through secondary roads $D_{s}$ , lower travel distance through secondary and regional roads $D_{r}$ and lower truck load $E$ than another route $y$ with higher KPI values. Then, the existing 64 routes can be split into three groups:

i)

The routes of the expert $x$ whose KPI values are all better or equal than the counterpart alternative route $y$ .
ii)

The alternative routes $y$ whose KPI values are all better or equal than the counterpart of the expert $x$ .
iii)

The routes $x$ with some KPI values better for the expert and the rest KPI values better for the alternative routes $y$ .

Among those 64 routes, just one route falls into the one that satisfies i) and just another one satisfies ii), whereas the rest of the routes belong to the iii) group. The pairs satisfying i) and ii) are directly considered to feed the preference learning algorithm, since the preference of the route is clear (the expert route in case of i) ( $x\succ y$ ) and the alternative route in case of ii) ( $y\succ x$ ). However, the preference for the routes of iii) is undefined and it is neither trivial nor easy to automatically provide a good decision. This is the reason why the experts were asked for deciding the best route in the case of this kind of pairs. The information provided to the experts to establish the preference judgments were just the routes placed on a map and not the KPIs that conform the route description. They analysed the routes on the map and using their knowledge and expertise they revealed the preference judgment between the routes of each pair.

Once the pairs of routes are available with both the route description using the KPIs and the preference member established, the learning process takes place. As stated in Section 3, the preference learning consists in obtaining a preference or ranking function $f$ defined in the route planning space such that $f(x)$ is greater than $f(y)$ whenever $x$ is preferred to $y$ . Let remember that $S=\{(x,y):f(x)\geq f(y)\}$ is the set of preference judgments, hence,

x\succ y\Rightarrow f(x)>f(y)\Rightarrow f(x)-f(y)>0

(10)

and if $f$ is linear, then:

f(x)-f(y)=f(x-y)

(11)

and therefore

x\succ y\Rightarrow f(x-y)>0

(12)

Hence, the ranking function $f$ can be learned using a binary classification algorithm able to separate the class according to the sign returned. Consequently, once the KPIs are calculated, each pair of routes is transformed into a pair of examples, where the classes are +1 and -1. If $x\succ y$ , the pair of examples are

\displaystyle T=\{(x-y;+1),(y-x;-1):(x,y)\in S\}

(13)

The linear function induced by the learning process $f$ passes through the origin of coordinates and is thus defined by

f(z)=<w,z>=\sum_{i=1}^{21}w_{i}z_{j}

(14)

where $w$ is the weight vector and $<w,z>$ is the scalar product of $w$ and $z$ . The score $f(z)$ will be the assessment of $z$ in the sense that $f(z)$ will be taken to predict preferences between the route $z$ and other routes. Also, $w$ will be the director vector of the assessment hyperplane.

Let us notice that it is a good choice to assume that $f$ is linear, since, as commented before, the values of the KPIs are better (preferable) as lower, that is, they are monotone. Under this paradigm, the problem can be solved using a binary classification algorithm [16]. However, not all binary classification algorithms are suitable in this context. Firstly, the algorithm must obtain a linear model. Secondly, the algorithm must be one among those that produces a numerical value. This numerical value is commonly taken afterwards for performing the classification into $+1$ or $-1$ classes. In fact, this value is typically higher as the algorithm is more reliable of an instance to belong to class $+1$ and it is this property what makes this value be suitable for providing a ranking. Hence, the afterwards classification will be ignored in this preference learning framework. Two promising options have been found that fit these criteria, namely, Support Vector Machines (SVM) [27, 32] with linear kernel and Logistic Regression (LR) [10, 17, 24]. Both algorithms have been widely and recently used in real applications, for instance, for financial time series forecasting [1] or for predicting proportions of complex blends in food products [6] in case of SVM and for aircraft engine degradation prognostics [25] or for outdoor thermal sensation comfort range [33] in case of LR. Although there exist much more algorithms for binary classification, as $k$ -nearest neighbour, neural networks or decision trees, they do not provide a linear model or they do not produce a numerical value related to the classification.

3.4 Assessing KPIs through feature selection

This section performs a study about the quality of the KPIs in the performance of route ranking. Although the KPIs have been carefully and exhaustively designed under the criteria of the experts, let us go in depth and analyse if effectively all of them are necessary and suitable to obtain a promising route ranking. In fact, the experts have serious doubts about their influence. For this purpose, we have checked several ways of selecting KPIs. The experts proposed the first one. They were unsure about if the KPIs must be taking globally for the whole route or by stretches depending if the stretch involves collection points or not. Also, they had special interest in checking the load truck influence. In this sense, they have proffered the following groups of KPI features to analyse:

i)

Whole route KPIs: $D$ , $T$ , $A$ , $D_{s}$ and $D_{r}$ .
ii)

Whole route KPIs and truck load: $D$ , $T$ , $A$ , $D_{s}$ , $D_{r}$ and $E$ .
iii)

KPIs by stretches: $D_{i}$ , $T_{i}$ , $A_{i}$ , $D_{s,i}$ and $D_{r,i}$ for $i=1,2,3$ .
iv)

KPIs by stretches and truck load: $D_{i}$ , $T_{i}$ , $A_{i}$ , $D_{s,i}$ , $D_{r,i}$ for $i=1,2,3$ and $E$ .
v)

KPIs by stretches and whole route KPIs: $D$ , $T$ , $A$ , $D_{s}$ , $D_{r}$ and $D_{i}$ , $T_{i}$ , $A_{i}$ , $D_{s,i}$ and $D_{r,i}$ for $i=1,2,3$ .
vi)

KPIs by stretches, whole route KPIs and truck load: $D$ , $T$ , $A$ , $D_{s}$ , $D_{r}$ and $D_{i}$ , $T_{i}$ , $A_{i}$ , $D_{s,i}$ , $D_{r,i}$ for $i=1,2,3$ and $E$ .

Automatic feature selection was also adopted in order to compare with the groups of features taken by the experts. Feature selection is a widely practice [20] to improve the performance of the machine learning techniques, since it removes redundancy and noise included in the data. The idea here is to take an adequate feature selection technique suitable for algorithms that produce linear models and that get good performance in case of few features. A promising technique for this purpose is Recursive Feature Elimination (RFE) [14]. This method is a greedy algorithm that repeatedly builds a model removing the worst feature according to assign weights to features, commonly the coefficients of a linear model. It is considered a wrapper, since a machine learning algorithm is used in the core of the method to build the model. Then, it successively repeats the process with the remaining features until the desired number of features to select is eventually reached, although the algorithm is typically tuned for wording until no more features are available. Finally, the features are ranked in the same order they have been removed. The pseudocode of the method is shown in Algorithm 3.

Algorithm 3 Recursive Feature Elimination (RFE) algorithm

Data

: Dataset T (see Equation (13)) with preference judgments that contains

v^{*}

2:Ranked list of indicators

(z_{1},...,z_{v^{*}})

order by their relevance.

tuning\_parameters(model)

train(model)

v\leftarrow v^{*}

6:while

v\geq 2

model_{v}\leftarrow model

with the optimized tuning parameters for

v

and

Data

w_{v}\leftarrow calculate\_weight\_vector(model_{v})

\triangleright

(w_{v1},...,w_{vv})

rank\_criteria\leftarrow(w_{v1}^{2},...,w_{vv}^{2})

10:

min\_rank\_criteria\leftarrow min(rank\_criteria)

\triangleright

Lowest value

11:

Remove(min\_rank\_criteria,Data)

\triangleright

Remove from

Data

12:

z_{v}\leftarrow min\_rank\_criteria

13:

v\leftarrow v-1

14:

z_{1}\leftarrow

variable in

Data\notin(z_{2},...,z_{v^{*}})

15:return

(z_{1},...,z_{v^{*}})

The strategy that follows this algorithm makes it be of linear order with regard to the number of features, in contrast to other algorithms, as Backwards Elimination (BE) or Forward Selection (FS), which are of quadratic order with regard to the number of features. The RFE method requires to be used together with algorithms that produce linear models, which is a condition above-established in the preference learning framework and the main reason why SVM and LR are chosen. Besides, RFE has been successfully and recently applied to other real applications, as for instance, for Alzheimer’s disease diagnosis [30], for credit card fraud detection [31] or for heat-resistant steel running state evaluation [18].

4 Experiments

This section deals with the experiments carried out in this work, but before establishing the parameter settings and displaying and discussing the experiment results, let us state the geographic information systems for obtaining the KPIs (see Section 4.1). Concerning the experiments, two different types are exposed, namely, i) experiments to estimate the travel time KPI through a regression process (see Section 4.2) and ii) experiments through a preference learning process to provide the fitness function to evaluate the quality of a waste collection route that, as it was previously mentioned, it will be also taken to evaluate the quality of a waste collection planning (see Section 4.3); this last kind of experiments also includes the feature selection process.

4.1 Geographic information systems for obtaining the KPIs

The KPIs are computed using a geographic information system called Open Source Routing Machine (OSRM)³³3https://map.project-osrm.org/, which is based on OpenStreetMap (OSM)⁴⁴4http://www.openstreetmap.org. There are a variety of applications or projects that have been developed using OSM⁵⁵5Thunderforest project (http://www.thunderforest.com/), the Open Topo Map (https://opentopomap.org/), the MTBmap (https://openmtbmap.org/es/) or the Waymarked Trails project (http://www.waymarkedtrails.org/) and OSRM⁶⁶6Cycle.Travel (http://cycle.travel/), I Bike Cph (https://www.ibikecph.dk/en) and MAPS.ME (https://maps.me/).
OSMR does not include altitude data and the information may be quite incomplete in low population density areas, since it depends on the response of users. Despite that, there are several advantages of OSRM that makes it be a good choice, namely, it is free, it is possible to find very extensive documentation, it is optimized for the transport mode used to move between locations and it allows to choose the type of truck to carry out the routes.
Other alternatives to OSMR are Google Maps⁷⁷7(https://www.google.com/maps) that incorporates elevation data and very complete information, GraphHopper⁸⁸8(https://www.graphhopper.com/) that has a route optimization API and elevation data or Mapbox⁹⁹9(https://www.mapbox.com/) that is equipped with a traffic-oriented routing module or Mapzen Valhalla¹⁰¹⁰10(https://mapzen.com/) that admits trips by car, on foot, by bicycle and by public transport, calculates the most efficient way to visit multiple destinations and has elevation data even for bicycle routes. However, all of them present the disadvantage of being not free for huge amount of request to their servers, something that is crucial for us.
The alternative to provide altitude data needed to compute the accumulative altitude $A$ that we propose to use in this paper is the global elevation model called GMTED2010¹¹¹¹11https://lta.cr.usgs.gov/GMTED2010, since the number of requests is not limited and it is able to cope with waste collection vehicles. This software was developed by the U.S. Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA). In this sense, OSRM provides a set of points of a stretch and, then, the GMTED returns the altitude of these points.

Table 2: RMSE and RMSE SD for

T_{g}

and

T_{p}

regression models

	$T_{g}$ model		$T_{p}$ model
Algorithm	RMSE	RMSE SD	RMSE	RMSE SD
LiR	489.65	82.18	525.72	163.41
BC	530.91	83.69	693.10	285.93
BT	498.67	75.34	665.98	273.68
RF	542.87	78.55	521.35	156.71
BRR	489.50	83.00	520.70	167.14

4.2 Travel time estimation

Two models have been created to estimate the time spent by a truck during the route. The first model will estimate the time employed by the truck in driving from a garage to the first waste collection point that will be the same that will estimate the time employed from the last waste collection point to the garage. The second model will estimate the time employed between containers.
Both models were created using the library Caret4 (short for classification and regression training) in R. This library includes several algorithms for the regression and classification tasks. The use of one algorithm or another might influence the purpose of this project, changing the routes and their times. Consequently, we perform an exhaustive study in order to the get good travel time estimation. Particularly, we carried out experiments using linear regression (LR), bagged CART (BC), Boosted Tree (BT), Random Forest (RF) and Bayesian Ridge Regression (BRR) methods.
For all these methods, a cross validation was performed with 3, 5 or 10 folds with 3 repetitions. Only BT and RF require parameters that were tuned and decided to be $maxdepth=2$ and $mstop=50$ for BT and $mtry=1$ for RF.
Table 2 displays the root mean square error (RMSE) and its standard deviation (RMSE SD) for the above-mentioned methods for estimating the travel time $T_{g}$ (from garage to waste collection point and vice versa) and $T_{p}$ (between waste collection points). The method that offers the best performance for both $T_{g}$ and $T_{p}$ is BRR, and it will be the method we will choose to estimate the travel time. However, LiR might be also a good alternative for estimating both $T_{g}$ and $T_{p}$ taking into account that i) the RMSE of LiR is slightly lower than for BRR, ii) the RMSE SD values of LiR are quite similar to that of BRR and iii) the magnitude of RMSE SD with regard to the RMSE values. The same happens for BT, but only in case of estimating $T_{g}$ , and for RF, but only in case of estimating $T_{p}$ . It is quite remarkable that the RMSE SD values in case of estimating $T_{g}$ are quite lower than those values in case of estimating $T_{p}$ , maybe because the casuistry is clearly more complex in case of the travels between collection points than in case of the garage to the first collection point and of the last collection point to the garage.

Refer to caption — Figure 1: $C-$ index for expert feature selection, parametric RFE and automatic RFE when SVM is taken as an algorithm that produces linear models

4.3 Fitness function for ranking routes

Before presenting and discussing the results of the fitness function design, let us discuss the experimental settings, the methods for both providing a ranking and feature selection and the performance measure employed.

4.3.1 Experimental settings

SVM [27, 32] with linear kernel and and LR [10, 17, 24] were chosen to provide a fitness function for ranking routes, since, as mentioned in Section 3.3, they are promising algorithms that produce linear models and that provide a numerical value to make a decision about the class value that are suitable in the preference learning context.

A comparison with other approaches discussed in Section 1 was performed. On one hand, experiments were carried out just considering a unique KPI [8, 26]. But, our study goes further and not only travel time ( $T$ ) and travel distance ( $D$ ) are considered isolated from the rest, otherwise experiments were also performed for the accumulated altitude ( $A$ ), road type ( $D_{s}$ and $D_{r}$ ) and truck load ( $E$ ). On the other hand, and following the work [23] that combines key performance indicators from mobile networks, experiments with the optimization algorithm called particle swarm optimization [4] were also carried out. Each particle is represented by $21$ values, which correspond to the weights of the $21$ KPIs. The fitness function considered was the mean $C-$ index of the ranks of each preference judgment. The parameter values were set according to [23]. Particularly, the number of iterations was fixed to $1000$ and the number of particles $10$ , $50$ , $100$ and $1000$ .

Concerning feature selection, in addition to the manual feature selection provided by the experts, the proposal was to use RFE [14] (see Section 3.4). This method has been checked using two different manners. The first modality (that we will call parametric RFE) consists of setting beforehand the desired number of features to select, which ranges from 1 to 21 whereas the second modality (that we will call automatic RFE) involves letting the method to provide the best set of features.

Table 3:

C-

index when just a unique KPI is taken for fitness function

$D$	$T$	$A$	$D_{s}$	$D_{r}$	$E$
57.81	60.94	50.00	53.13	54.69	71.88

Table 4:

C-

index for particle swarm optimization for different number of particles

Particles	10	50	100	1000
$C-$ index	81.51	86.57	87.11	89.65

Table 5: Features selected by the parametric and automatic RFE when SVM and LR is taken as an algorithm that produces linear models

Parametric RFE
SVM ( $6$ features)	LR ( $7$ features)
truck load ( $E$ )	truck load ( $E$ )
altitude ( $A$ )	1st stretch distance ( $D_{1}$ )
1st stretch distance ( $D_{1}$ )	2nd stretch distance ( $D_{2}$ )
2nd stretch distance ( $D_{2}$ )	3rd stretch distance ( $D_{3}$ )
2nd stretch secondary and regional road distance ( $D_{r,2}$ )	3rd stretch time ( $T_{3}$ )
3rd stretch altitude ( $A_{3}$ )	3rd stretch altitude ( $A_{3}$ )
	3rd stretch secondary and regional road distance ( $D_{r,3}$ )
Automatic RFE
SVM ( $6$ features)	LR ( $8$ features)
truck load ( $E$ )	truck load( $E$ )
1st stretch distance ( $D_{1}$ )	2nd stretch secondary and regional road distance ( $D_{r,2}$ )
1st stretch secondary and regional road distance ( $D_{r,1}$ )	altitude ( $A$ )
1st stretch time ( $T_{1}$ )	secondary and regional road distance ( $D_{r}$ )
2nd stretch secondary and regional road distance ( $D_{r,2}$ )	3rd stretch time ( $T_{3}$ )
3rd stretch secondary road distance ( $D_{s,3}$ )	3rd stretch secondary road distance ( $D_{s,3}$ )
	2nd stretch distance ( $D_{2}$ )
	1st stretch distance ( $D_{1}$ )

The $C-$ index is the evaluation measure taken. It is an estimator of the concordance probability [15] typically used in statistics and equivalent to the pairwise ranking error for ordinal regression [16] that has been commonly used in multipartite ranking as an estimation of the probability that a randomly chosen pair of instances from different classes is ranked correctly [11] as a straight-forward generalization of AUC measure. The $C-$ index was estimated though a cross-validation of 10 folds and 10 repetitions.

4.3.2 Experimental results

This section deals with the results reached in the experiments. Firstly, the results from the existing approaches are discussed. Then, the results provided by the proposed approach are exhaustively analysed from different points of view.

Tables 3 and 4 display the results of the existing approaches. Particularly, Table 3 shows the $C-$ index when just a unique KPI is taken as fitness function, whereas Table 4 displays the $C-$ index for the particle swarm optimization algorithm with different number of particles. The conclusions are the following:

i)

Between distance $D$ and time $T$ , one can observe that the fitness function consisting just of the time indicator produces the best performance. This results agrees with the conclusions reached in other approaches [8, 26]. However, the truck load $E$ yields the best performance. So far, and to best of our knowledge, none work has considered the truck load as an indicator, in spite of experts in the field highly expected to make great influence on the route planning assessment.
ii)

The particle swarm optimization provides quite best performance with regard to the method that just takes a unique KPI. This fact confirms the hypothesis that combining several issues for providing a fitness function improve the performance of the route planning assessment. The performances improve as the number of particles increases, but in any case it does not reach the $90\%$ of $C-$ index.

Figures 1 and 2 respectively display the $C-$ index when SVM and LR is taken for providing a linear model together with expert feature selection, parametric RFE and automatic RFE. Clearly, fitting the problem into a preference framework overtakes the other existing approaches in the literature.

Focusing now exclusively on the method proposed in this paper, one can extract the following conclusions from those figures:

i)

The load truck is a crucial KPI to rank the routes (see the diamonds when $6$ versus $5$ features are selected and the same when $15$ versus $16$ features are selected). This is so, since the performance clearly improves both in the case of taking the whole route KPIs and in case of taking the KPIs split in stretches, especially in the former case.
ii)

It seems that there are hardly differences between taking the KPIs for the whole route and taking the KPIs by stretches (compare the diamonds at $6$ features versus $16$ features).
iii)

Automatic RFE gives the overall best performance (see the cross in both figures), whose performance is around $98\%$ .
iv)

There clearly exists a high redundancy among the KPIs, as experts expected. This is so because both parametric and automatic RFE reaches the best performance when few features are selected. Particularly, the parametric RFE offers the best results when it takes $6$ (for SMV) or $7$ (for LR) out of $21$ features and automatic RFE does it taking around (an average along folds and repetitions) $4$ of the $21$ features.

Let now analyse the more promising KPIs, that is, the KPIs that yield the best performance. Table 5 shows the features selected by the parametric RFE when they reach the best performance and also by the automatic RFE. The main conclusions of this analysis are:

i)

In regard to parametric RFE, the selected features differ if one uses SVM or LR. They select $4$ common features (bolded in Table 5) out of $6$ for SVM and out of $7$ for LR, which is respectively about the $66\%$ and $57\%$ of the features selected. These common features are truck load ( $E$ ), the first and second stretch distances ( $D_{1}$ and $D_{2}$ ) and the third stretch altitude ( $A_{3}$ ).
ii)

Concerning automatic RFE, which reaches the overall best performance, it selects $6$ out of $21$ features using SVM to get the linear model, whereas it selects $8$ out of $21$ when it uses LR. They select $4$ features in common (bolded in Table 5), namely truck load ( $E$ ), the first stretch distance ( $D_{1}$ ), the second stretch secondary and regional road distance ( $D_{r,2}$ ) and the third stretch secondary distance ( $D_{s,3}$ ). These common features respectively are about the $66\%$ for SVM and $50\%$ for LR of the features selected.
iii)

As seen, the truck load ( $E$ ) is a clue feature in this process, as it was expected. Also, the distance in the three stretches are highly relevant, although depending on the stretch, this distance is taken for the whole stretch (in case of the first stretch), only for secondary and regional part of the stretch (in case of the second stretch) and for the secondary road stretch (in case of the third stretch).
iv)

Another conclusion is that splitting the route in stretches seems to be a good choice, since the method does not tend to select features of the whole route. In fact, none whole route feature is taken in case of SVM.

Figures 3 and 4 respectively display a heat map that represents the frequency whit which each feature is selected by parametric (in the middle) and automatic (at the bottom) RFE in the cross-validation folds (taking also into account the repetitions) of the cross validation. They also include the manual selection proposed by the experts as a baseline reference. The following issues summarizes the conclusions:

i)

At sight, the most frequently feature is clearly the truck load ( $E$ ). Other features of the whole router that use to appear are distance ( $D$ ), altitude ( $A$ ) and secondary and regional road distance ( $D_{r}$ ).
ii)

Focusing on stretches, distance ( $D_{1}$ ), time ( $T_{1}$ ) and secondary and regional road distance ( $D_{s,1}$ ) predominate in the first stretch, distance ( $D_{2}$ ) and secondary and regional road distance ( $D_{r,2}$ ) do so in the second stretch and time ( $T_{3}$ ) and secondary road distance ( $D_{s,3}$ ) do so in the third stretch.
iii)
Viewing the map horizontally, one can find out redundancy in the features. Let see, for instance, two cases as an illustration:
1. (a)
  
  Case of parametric RFE when it selects $2$ features. In this case, second stretch distance ( $D_{2}$ ) and second stretch secondary and regional road distance ( $D_{r,2}$ ) seems to appear as alternative one from another together with the truck load. This fact is often repeated when more than $2$ features are selected.
2. (b)
  
  Case of $7$ features. In this case, about $11$ features used to be selected. Truck load ( $E$ ), second stretch distance ( $D_{2}$ ) and second stretch secondary and regional road distance ( $D_{r,2}$ ) seem to be almost always selected, whereas the $4$ remaining alternate one from each other. Besides, this behaviour is common both for SVM and LR.
Although this redundancy seemed to be latent and now it is visually displayed, it does not have caused surprise among the experts. They always were aware of existing relationships among the entire feature they proposed, although they recognize they are unknown.

5 Conclusions and future work

This paper takes the first step in establishing an automatic and global optimization on the waste collection process of the company called Consorcio para la Gestión de los Residuos Sólidos de Asturias (COGERSA). This step consists of designing a fitness function to assess a candidate route planning in order to use it in planning optimization algorithm for providing a global optimal route planning. So far, the experts manually designed the route planning focusing on particular local areas with a relative reduced number of waste containers whenever and under council demand. The route planning complexity due to the existing resources and constraints even makes the task of defining a fitness function not be affordable by the experts. This paper provides a fitness function from the expert knowledge and stating its design in a preference learning framework. The process is simplified to design a fitness function that assesses a route rather than a route planning again due to complexity. Fortunately, this simplification was easy to carry out, since expert did not mention specific indicators of a planning route, maybe because so far they planned routes one isolated from the rest. The idea under preference learning is to obtain expertise and knowledge from the experts from two different points of view. The first one is to discover the key indicators experts consider relevant in order to obtain a route description in form of features. The second one is to find out the preference judgment of the experts looking at the routes themselves, without taking into account explicitly the key indicators. Then, a ranking or preference function is induced from these information sources in an attempt to reproduce the expert decision knowledge, which will be the fitness function. For this purpose, this paper has exhaustively and carefully designed several key performance indicators according to the experts. These indicators were established for describing a route, but all of them have the additivity property, hence, they are straightforward extended to a route planning. Also, experts expect redundancy among such indicators, and for this reason, a feature selection analysis was included in this paper in order to check in what extent this hypothesis holds. The conclusion of the study confirms the existence of high redundancy among the indicators, since the best performance is reached when $6$ or $8$ out of $21$ indicators are taken. Truck load seems to be a clue indicator together with the distance travelled, especially if such distance belongs to no main roads. The proposed approach has been compared to other existing methods in the literature, that just consider a unique indicator as fitness function or combine them through a particle swarm optimization. The conclusion is that the method proposed in this paper clearly outperforms the existing approaches, since the performance goes from $72\%$ for a unique indicator fitness function or $90\%$ for particle swarm optimization to $98\%$ of the proposed preference learning approach.
As future work, the planning consists of including the fitness function in planning optimization algorithms. Besides, and given the existing redundancy among the key performance indicators, we plan to study more in depth those that have been shown to be clue in this study. For instance, truck load is one of the most relevant indicators. However, the way experts have found for computing it is quite gross. Hence, obtaining an alternative way of estimating it in a more accurate way may lead to an improvement. Also, the distance travelled along different kinds of roads seems to determine the quality of the route. In this sense, it may be promising to polish the split of the roads in different kinds, taking into account not only the official type of the route, otherwise the state, traffic and other environment issues. Finally, and since the original goal was to optimize a route planning it would be the great interest to consider key performance indicators of route planning, for instance, the number of routes, highly related with the human and material resources.

Acknowledgments

This research has been partially supported by the Spanish Ministerio de Ciencia e Innovación through the grant PID2019-110742RB-I00) and by the RDI project Smart Waste Collection (SWC), developed by the consortium COGERSA, SADIM S.A., S.M.E - grupo HUNOSA and ABAMobile and partially supported by the Instituto de Desarrollo Económico del Principado de Asturias (IDEPA) through the grants IDE/2015/000863, IDE/2015/000864 and IDE/2015/000865.

References

Altan and Karasu [2019] Altan, A., Karasu, S., 2019. The effect of kernel values in support vector machine to forecasting performance of financial time series and cognitıve decision making .
Bahamonde et al. [2004] Bahamonde, A., Bayón, G.F., Díez, J., Quevedo, J.R., Luaces, O., Del Coz, J.J., Alonso, J., Goyache, F., 2004. Feature subset selection for learning preferences: a case study, in: Proceedings of the twenty-first international conference on Machine learning, ACM. p. 7.
Bahamonde et al. [2007] Bahamonde, A., Díez, J., Quevedo, J.R., Luaces, O., del Coz, J.J., 2007. How to learn consumer preferences from the analysis of sensory data by means of support vector machines (svm). Trends in Food Science and Technology 18, 20 – 28. URL: http://www.sciencedirect.com/science/article/pii/S0924224406002433, doi:https://doi.org/10.1016/j.tifs.2006.07.014.
Blackwell et al. [2007] Blackwell, T.M., Kennedy, J., Poli, R., 2007. Particle swarm optimization. Swarm Intelligence 1, 33–57.
Buck et al. [2001] Buck, D., Wakeling, I., Greenhoff, K., Hasted, A., 2001. Predicting paired preferences from sensory data. Food Quality and Preference 12, 481–487. doi:10.1016/S0950-3293(01)00041-6.
Carrasco et al. [2018] Carrasco, B., Gil, A., Quevedo Pérez, J.R., Montañés Roces, E., Shen, G., Baeten, V., Fernández Pierna, J.A., 2018. Local vs. svm-based stacking approach for predicting proportions of complex blends in food products. XVII Chemometrics in Analytical Chemistry, Junio, 2018. Halifax, Nova Scotia,(Canadá) .
Cohen et al. [2011] Cohen, W., Schapire, R., Singer, Y., 2011. Learning to order things. Journal of Artificial Intelligence Research 10. doi:10.1613/jair.587.
Dezani et al. [2014] Dezani, H., Bassi, R.D., Marranghello, N., Gomes, L., Damiani, F., Da Silva, I.N., 2014. Optimizing urban traffic flow using genetic algorithm with petri net analysis as fitness function. Neurocomputing 124, 162–167.
Díez et al. [2004] Díez, J., Fernandez Bayon, G., Quevedo, J., del Coz, J., Luaces, O., Alonso, J., Bahamonde, A., 2004. Discovering relevancies in very difficult regression problems: Applications to sensory data analysis., pp. 993–994.
Fan et al. [2008] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J., 2008. Liblinear: A library for large linear classification. Journal of Machine Learning Research 9.
Fürnkranz et al. [2009] Fürnkranz, J., Hüllermeier, E., Vanderlooy, S., 2009. Binary decomposition methods for multipartite ranking, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer. pp. 359–374.
Garc´ıa-Peñalvo et al. [2018] García-Peñalvo, F.J., Cruz-Benito, J., Martín-González, M., Vázquez-Ingelmo, A., Sánchez-Prieto, J.C., Therón, R., 2018. Proposing a machine learning approach to analyze and predict employment and its factors. International Journal of Interactive Multimedia and Artificial Intelligence 5, 39–45.
Guo et al. [2019] Guo, Y., Wang, J., Chen, H., Li, G., Huang, R., Yuan, Y., Ahmad, T., Sun, S., 2019. An expert rule-based fault diagnosis strategy for variable refrigerant flow air conditioning systems. Applied Thermal Engineering 149, 1223–1235.
Guyon et al. [2002] Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Machine learning 46, 389–422.
Gönen and Heller [2005] Gönen, M., Heller, G., 2005. Concordance probability and discriminatory power in proportional hazards regression. Biometrika 92, 965–970.
Herbrich et al. [2000] Herbrich, R., Graepel, T., Obermayer, K., 2000. Large margin rank boundaries for ordinal regression. advances in large margin classifiers .
Hsia et al. [2018] Hsia, C., Chiang, W., Lin, C., 2018. Preconditioned conjugate gradient methods in truncated newton frameworks for large-scale linear classification, in: Proceedings of The 10th Asian Conference on Machine Learning, ACML 2018, Beijing, China, November 14-16, 2018, pp. 312–326. URL: http://proceedings.mlr.press/v95/hsia18a.html.
Huang et al. [2019] Huang, J., Dong, M., Lu, S., Yu, Y., Liu, C., Yoo, J.H., Lu, J., 2019. A hybrid model combining wavelet transform and recursive feature elimination for running state evaluation of heat-resistant steel using laser-induced breakdown spectroscopy. Analyst 144, 3736–3745.
Joachims [2002] Joachims, T., 2002. Optimizing search engines using clickthrough data, in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 133–142.
Karasu and Altan [2019] Karasu, S., Altan, A., 2019. Recognition model for solar radiation time series based on random forest with feature selection approach, in: 2019 11th International Conference on Electrical and Electronics Engineering (ELECO), pp. 8–11.
Karasu et al. [2018] Karasu, S., Altan, A., Saraç, Z., Hacioglu, R., 2018. Prediction of solar radiation based on machine learning methods, in: Conference: International Engineering Research Symposium (UMAS’17).
Kubat et al. [1998] Kubat, M., Holte, R.C., Matwin, S., 1998. Machine learning for the detection of oil spills in satellite radar images. Machine learning 30, 195–215.
Leontiadis et al. [2017] Leontiadis, I., Serra, J., Finamore, A., Dimopoulos, G., Papagiannaki, K., 2017. The good, the bad, and the kpis: how to combine performance metrics to better capture underperforming sectors in mobile networks, in: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE. pp. 297–308.
Lin et al. [2007] Lin, C., Weng, R.C., Keerthi, S.S., 2007. Trust region newton methods for large-scale logistic regression, in: Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007, pp. 561–568.
Lu et al. [2019] Lu, F., Wu, J., Huang, J., Qiu, X., 2019. Aircraft engine degradation prognostics based on logistic regression and novel os-elm algorithm. Aerospace Science and Technology 84, 661–671.
Nuortio et al. [2006] Nuortio, T., Kytöjoki, J., Niska, H., Bräysy, O., 2006. Improved route planning and scheduling of waste collection and transport. Expert systems with applications 30, 223–232.
Platt [2000] Platt, J., 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, in: Advances in Large Margin Classifiers.
Quevedo and Montañés [2009] Quevedo, J., Montañés, E., 2009. Obtaining rubric weights for assessments by more than one lecturer using a pairwise learning model. International Working Group on Educational Data Mining .
Radović and Stević [2018] Radović, D., Stević, Ž., 2018. Evaluation and selection of kpi in transport using swara method. Transport & Logistics: The International Journal 8, 60–68.
Richhariya et al. [2020] Richhariya, B., Tanveer, M., Rashid, A., Initiative, A.D.N., et al., 2020. Diagnosis of alzheimer’s disease using universum support vector machine based recursive feature elimination (usvm-rfe). Biomedical Signal Processing and Control 59, 101903.
Rtayli and Enneya [2020] Rtayli, N., Enneya, N., 2020. Enhanced credit card fraud detection based on svm-recursive feature elimination and hyper-parameters optimization. Journal of Information Security and Applications 55, 102596.
Wu et al. [2004] Wu, T.f., Lin, C.J., Weng, R., 2004. Probability estimates for multi-class classification by pairwise coupling 5.
Xie et al. [2019] Xie, Y., Liu, J., Huang, T., Li, J., Niu, J., Mak, C.M., Lee, T.c., 2019. Outdoor thermal sensation and logistic regression analysis of comfort range of meteorological parameters in hong kong. Building and Environment 155, 175–186.
Zhong et al. [2018] Zhong, C., Yen, J., Liu, P., Erbacher, R.F., 2018. Learning from experts’ experience: toward automated cyber security data triage. IEEE Systems Journal 13, 603–614.
Zolfani et al. [2018] Zolfani, S.H., Yazdani, M., Zavadskas, E.K., 2018. An extended stepwise weight assessment ratio analysis (swara) method for improving criteria prioritization process. Soft Computing 22, 7399–7405.