The current version of the scoring function `compute_scores` seems to weigh non-rare states higher than rare ones. Instead of selecting states that explored rare value combinations, it prefers the ones where the value/address count is higher. Consider the following example input:
```c
uint64_t main() {
  uint64_t i;
  uint64_t j;
  uint64_t* x;

  i = 0;
  j = 0;

  x = malloc(8);
  *x = 0;

  read(0, x, 1);

  if (*x < 25) {
    *x = 0;
    while (i < 150) {
      i = i + 1;
    }
    return 1;
  } else {
    *x = 0;
    while (j < 150) {
      j = j + 1;
    }
    return 0;
  }
}
```
In this example I expect the simulation to fall into the true branch in about 10% of all runs, because the single byte read from stdin is below 25 for 25 of its 256 possible values. Hence incrementing the memory location of `i` will, with high probability, be rarer than incrementing `j`. I'd expect rarity simulation to select the states that are looping in the true branch. Note that the loop bound of 150 was chosen so that a state needs to survive and be selected for exactly one iteration to reach an exit path (assuming the default step size of 1000).
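The 10% figure can be sanity-checked by exhaustively enumerating all 256 values of the single input byte (a toy check, independent of monster itself):

```rust
fn main() {
    // read(0, x, 1) reads exactly one byte, so *x is one of 256 values.
    let true_branch = (0u32..=255).filter(|&b| b < 25).count();
    let probability = true_branch as f64 / 256.0;

    // 25 of 256 inputs take the true branch, i.e. roughly 9.8%.
    assert_eq!(true_branch, 25);
    println!("{true_branch} of 256 inputs ({:.1}%)", probability * 100.0);
}
```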
This, however, does not happen when using the default `harmonic` mean; see the following run. I also dump the scores vector for debugging purposes and to show that at least one state did fall into the true branch (two in this case, the ones with the lower score):
```
$ time ./target/debug/monster rarity --states 10 --selection 3 --iterations 2 ./examples/select-rare.m
Running rarity simulation round 1...
Creating 10 new states (0 copied, 10 new)
Running engines (took 23.706977ms)
Remove 0 successfully exited states from selection
Scoring states (took 402.021139ms)
scores: [0.14409221902017258, 0.14727540500736333, 0.14727540500736333, 0.14409221902017258, 0.14727540500736333, 0.14727540500736333, 0.14727540500736333, 0.14727540500736333, 0.14727540500736333, 0.14727540500736333]
selecting rarest 3 states
Running rarity simulation round 2...
Creating 7 new states (4 copied, 3 new)
Running engines (took 14.214393ms)
Remove 7 successfully exited states from selection
Scoring states (took 126.964981ms)
scores: [0.013235294117646955, 0.013235294117646955, 0.01311953352769669]
selecting rarest 3 states
no bug found in binary

real 0m1,181s
user 0m1,091s
sys 0m0,084s
```
When running the same example with the non-default `arithmetic` mean, the rarer state is preferred, because `score_with_mean` returns `Ordering::Less` in this case. See the following run as an example. Again, notice the single state with a lower score (which incidentally does get selected this time):
```
$ time ./target/debug/monster rarity --states 10 --selection 3 --iterations 2 --mean arithmetic ./examples/select-rare.m
Running rarity simulation round 1...
Creating 10 new states (0 copied, 10 new)
Running engines (took 24.153033ms)
Remove 0 successfully exited states from selection
Scoring states (took 403.984164ms)
scores: [1065.6, 1065.6, 1065.6, 1062.4, 1065.6, 1065.6, 1065.6, 1065.6, 1065.6, 1065.6]
selecting rarest 3 states
Running rarity simulation round 2...
Creating 7 new states (4 copied, 3 new)
bug found:
exit code greater than zero
concrete inputs read: [21]
pc: 0x1005c

real 0m1,050s
user 0m0,971s
sys 0m0,076s
```
Even though the arithmetic mean works out in this example, I still believe both variants are somewhat faulty. Here is my understanding of the scoring functions that are currently implemented:
- `harmonic`: The harmonic mean over all (non-inverse) counters of a state, but with n being the number of states instead of the number of counters in the state.
- `arithmetic`: The arithmetic mean over all (non-inverse) counters of a state, again with n being the number of states instead of the number of counters in the state. This time `Ordering::Less` is used, which somewhat hides the issue.
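To make the problem with the current `harmonic` variant concrete, here is a toy illustration with made-up counter values (two states with two counters each; this is not monster's actual data layout):

```rust
// Harmonic mean of a state's raw (non-inverse) counters.
fn harmonic_mean(counters: &[f64]) -> f64 {
    counters.len() as f64 / counters.iter().map(|c| 1.0 / c).sum::<f64>()
}

fn main() {
    // Invented counters: state A explored combinations seen only once each
    // (rare), state B explored combinations seen ten times each (common).
    let rare = [1.0, 1.0];
    let common = [10.0, 10.0];

    // The harmonic mean of raw counters gives the RARE state the LOWER
    // score (1.0 vs 10.0), so any selection that keeps the highest-scoring
    // states ends up keeping the common ones.
    assert!(harmonic_mean(&rare) < harmonic_mean(&common));
    println!("rare: {}, common: {}", harmonic_mean(&rare), harmonic_mean(&common));
}
```

Dividing by the number of states instead of the number of counters only rescales both scores by the same constant, so it cannot repair this ordering.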
Here are three alternatives that I think would be better and that we could try:
- `sum`: The sum of inverse(!) counter values, with no mean calculation. I believe this is the scoring function originally described in the paper; we should at least add it to our portfolio, even if we decide not to make it the default.
- `arithmetic*`: The arithmetic mean of the inverse(!) counter values, but importantly with n being the number of counters in the given state. Otherwise we are just dividing by a constant and not normalizing with respect to the number of counters (i.e. the number of memory locations with a concrete value) in each individual state.
- `harmonic*`: The harmonic mean of the inverse(!) counter values, again with n set as above, for the same reason.
Note that all three proposed scores use the intuitive `Ordering::Greater`, because higher scores represent rarer states in all three cases. The inverse is taken on the individual counter values before the mean calculation, not by flipping the interpretation of the mean; I believe this makes a difference in semantics. It also makes interpretation of the score uniform: higher is rarer, higher is better.
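The three proposals above can be sketched as follows. The function names and the representation of a state's counters as a `u64` slice are my own assumptions for illustration, not monster's actual types:

```rust
// `sum`: sum of inverse counter values, no mean.
fn sum_of_inverses(counters: &[u64]) -> f64 {
    counters.iter().map(|&c| 1.0 / c as f64).sum()
}

// `arithmetic*`: arithmetic mean of inverse counter values, with n being
// the number of counters in THIS state (not the number of states).
fn arithmetic_mean_of_inverses(counters: &[u64]) -> f64 {
    sum_of_inverses(counters) / counters.len() as f64
}

// `harmonic*`: harmonic mean of inverse counter values. Since 1/(1/c) = c,
// this reduces to n divided by the sum of the raw counters.
fn harmonic_mean_of_inverses(counters: &[u64]) -> f64 {
    counters.len() as f64 / counters.iter().map(|&c| c as f64).sum::<f64>()
}

fn main() {
    // Invented counters: each combination of the rare state was seen once,
    // each combination of the common state ten times.
    let rare = [1u64, 1];
    let common = [10u64, 10];

    // All three scores rank the rare state higher, so Ordering::Greater
    // ("highest score is selected first") is correct for all of them.
    assert!(sum_of_inverses(&rare) > sum_of_inverses(&common));
    assert!(arithmetic_mean_of_inverses(&rare) > arithmetic_mean_of_inverses(&common));
    assert!(harmonic_mean_of_inverses(&rare) > harmonic_mean_of_inverses(&common));
}
```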
WDYT about this alternative? Or am I completely off base here? All comments welcome!