AUTO-DISCERN: Autonomous Driving Using Common Sense Reasoning
General response
We thank the reviewers for their insightful comments. After reading all the reviews, we have made the following major changes to make the paper easier to understand: 1) we have rephrased the summary of the FOLD-R algorithm and added brief comments. 2) To better explain how we use the prefix sum during heuristic calculation, we provide examples of the comparison assumptions and a brief explanation of the parameters of the information gain function. 3) We have replaced the “Titanic Survival” dataset with the more complex “Adult Census Income” dataset to better illustrate the explainability of the FOLD-R++ algorithm. We also give the output of the RIPPER algorithm to show the different rule formats of the two algorithms. A GitHub link to the source code is also included.
Response to Reviewer 1
We have rephrased the summary of Algorithm 1 with brief comments for better readability. The ADD_BEST_LITERAL function calculates the information gain of all candidate literals and adds the one with the highest score to the current clause. In the FOLD algorithm, after the data-encoding preparation step, this function only needs to handle categorical values. In the FOLD-R algorithm, the function additionally calculates the heuristic for all possible numerical splits of numerical features; this is FOLD-R's extension to numeric data. The other functions of FOLD-R are inherited from FOLD.
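As a rough illustration of what ADD_BEST_LITERAL does, the sketch below scores candidate literals with a FOIL-style information gain and selects the highest-scoring one. The literal names, the example counts, and the exact gain formula here are illustrative assumptions for exposition; FOLD-R++ defines its own information gain function in the paper.

```python
import math

def foil_gain(tp, fp, p0, n0):
    # FOIL-style information gain: (tp, fp) are the positives/negatives still
    # covered after adding the candidate literal; (p0, n0) are the counts
    # covered by the clause before adding it.
    if tp == 0:
        return 0.0
    return tp * (math.log2(tp / (tp + fp)) - math.log2(p0 / (p0 + n0)))

def add_best_literal(candidates, p0, n0):
    # candidates maps a literal to the (tp, fp) counts it would yield;
    # return the literal with the highest information gain.
    return max(candidates, key=lambda lit: foil_gain(*candidates[lit], p0, n0))

# Hypothetical candidate literals for an income-style dataset.
candidates = {
    "age <= 30": (40, 10),
    "sex = male": (30, 25),
    "hours > 45": (20, 2),
}
best = add_best_literal(candidates, p0=60, n0=40)
```

With these counts, "age <= 30" wins: it keeps many positives while sharply reducing the negatives covered.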
Response to Reviewer 2
The mentioned grammatical mistakes have been fixed. We have rephrased the summary of the FOLD-R algorithm with brief comments for better readability.
The output of the RIPPER algorithm is shown in the updated version; many repeated literals appear in its formulas. The exception learning procedure focuses on a smaller hypothesis space rather than the whole hypothesis space, so the literals that locate that hypothesis space are implied by the default part and are therefore unnecessary. For the “Adult Census Income” example in the paper, the RIPPER algorithm generates 53 rules with 235 literals, while the FOLD-R++ algorithm generates 13 rules with 36 literals by explicitly learning exceptions.
To better explain how the prefix sum technique is used in heuristic calculation, new examples (Table 1) are given that illustrate the comparison assumptions in FOLD-R++ and how the information gain function is called.
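To make the prefix-sum idea concrete, here is a minimal sketch (our illustration, not the paper's implementation): after sorting a numeric feature once, running prefix sums of positive and negative counts let the number of examples satisfying `value <= v` for any candidate split `v` be read off in O(1), so every split can be scored in a single pass instead of rescanning the data per split.

```python
from itertools import accumulate

def split_counts(values, labels):
    # values: numeric feature values; labels: True for positive examples.
    # Sort once, then prefix sums give, for each position i, the number of
    # positive / negative examples with value <= sorted_vals[i].
    order = sorted(range(len(values)), key=lambda i: values[i])
    pos = list(accumulate(1 if labels[i] else 0 for i in order))
    neg = list(accumulate(0 if labels[i] else 1 for i in order))
    sorted_vals = [values[i] for i in order]
    return sorted_vals, pos, neg

# Toy data: six examples with a single numeric feature.
vals = [5, 3, 8, 3, 9, 1]
labs = [True, False, True, True, False, True]
sv, pos, neg = split_counts(vals, labs)
# Sorted values: [1, 3, 3, 5, 8, 9]; for the split "value <= 5" (index 3),
# pos[3] positives and neg[3] negatives are covered, in O(1) per lookup.
```

Each candidate split's information gain can then be computed from these O(1) counts, which is what turns a days-long training run into seconds on larger datasets.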
The average execution time of FOLD-R on the “Adult Census Income” and “Credit Card Approval” datasets is estimated with polynomial regression over several rounds of experiments. The numbers of examples used for the estimation experiments are 100, 200, 300, 500, 800, 1300, 2100, 3400, and 5500. The minimum estimate for each of the two datasets is reported in Table 4.
Response to Reviewer 3
Among the three major differences we mentioned in the paper, the first two significantly improve the performance:
1) The introduction of prefix sums for heuristic calculation improves the algorithm's time complexity. Without the prefix sum technique, FOLD-R++ would have taken days to finish training on the Adult Census Income dataset; with it, training finishes in around 10 seconds.
2) The introduction of the exception ratio improves the algorithm at a higher level. FOLD and FOLD-R start to learn exceptions only when they fail to find a literal that avoids falsely covering negative examples. However, avoiding false coverage of negative examples by adding literals to the default part reduces the number of positive examples the rule can imply. Explicitly activating the exception learning procedure with a proportional threshold increases the number of positive examples a rule can cover while reducing the total number of rules generated. As a result, interpretability improves because fewer rules and literals are generated. For the Adult Census Income dataset, without the exception ratio hyper-parameter (i.e., with the ratio set to 0), FOLD-R++ takes around 30 minutes to finish training and generates more than 500 rules.
3) FOLD and FOLD-R disabled negated literals in the default theories to make the generated rules look more elegant (only exceptions are negated literals). However, a negated literal is sometimes the optimal literal (the one with the highest information gain). FOLD-R++ therefore enables negated literals in the default part of the generated rules. We cannot guarantee that FOLD-R++ generates the optimal combination of literals, because it is a greedy algorithm; still, choosing the optimal literal at each literal-selection iteration, rather than a sub-optimal one, should be an improvement.
Setting the ratio to 0.5 means that the number of exception examples a rule covers should be less than half of the examples covered by its default part. We introduce this hyper-parameter as a proportional value in order to make the algorithm scalable. In the experiments, we intentionally show only a simple setting. This parameter can affect both the accuracy and the number of rules generated; in practice, multiple values can be tried during training.
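Under that reading of the ratio, the activation test can be sketched as follows. The function name and the exact inequality are our assumptions for illustration, not the paper's code.

```python
def should_learn_exceptions(tp, fp, ratio=0.5):
    """Decide whether to learn exceptions for the current rule.

    tp: positive examples covered by the rule's default part.
    fp: negative examples (candidate exceptions) the default part still covers.
    With ratio=0.5, exceptions are learned only while the falsely covered
    negatives stay below half the positives covered by the default part.
    """
    return 0 < fp < ratio * tp

# A rule covering 100 positives may keep up to 49 negatives as exceptions:
ok = should_learn_exceptions(100, 40)        # within the threshold
too_many = should_learn_exceptions(100, 60)  # too many; refine default instead
```

Because the threshold is proportional rather than an absolute count, the same setting transfers across datasets of different sizes, which is the scalability point made above.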