Study the Alarms

Understand and Sort the Alarms

The list of all the existing alarms is given in Value Analysis Alarms.

Most of the time understanding the meaning of alarms is relatively easy since the generated assertions, messages, and warnings tend to be quite clear. The matter that requires more effort is understanding whether a given alarm is relevant: can the problem that it indicates actually happen or not. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.

False alarms are often a result of a too high level of approximation in the analysis. It is recommended to treat the alarms starting from the first one in order to detect an imprecision as soon as possible.

The list of the generated assertions can easily be extracted from the properties.csv file (see Information about the Properties). Then, for instance, these assertions can be counted in order to track the evolution of their total number during the working process. (Note, however, that this particular measure is not necessarily very pertinent, because the relation between problems and emitted alarms is not really one-to-one. Losing precision at one point of the analysis can lead to several alarms which have the same origin. Moreover, solving one problem may cause many unrelated new alarms, as several problems might have been hiding behind the solved one.)

The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties of the GUI, and there is a button which allows to select the alarms among all the properties. Start investigating with the first emitted alarm by sorting them by their emission rank.

About the Value Analysis

Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.

Value analysis uses abstract interpretation techniques that propagate the information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between the variables and sets of values (keep in mind though that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.

See Tune the Precision for explanations concerning tuning the precision level with the -slevel option. The precision level is related with the number of states that can be stored for each program point. The smaller this number is, the more coarse the approximation is, as more computed states are merged together.

Example:

  //@ assert 0 < x < 10;
  if (x < 5)
     y = 5;
  else
     y = 10;
L:

Computing a unique state at label L only gives that x [1..9] and y [5..10]. But if the slevel is larger, then two states can be stored at L giving exactly that either y == 5 when x [1..4] or y == 10 when x [5..9].

Notice that the assertion assert 0 < x < 10 above reduces the possible values for x. It is important to remember that this works in the same way for the assertions automatically generated from alarms. For instance, if a statement a = b / c; is reached with c [0..100], an alarm is emitted in form of an appropriate assertion:

/*@ assert Value: division_by_zero: c ≢ 0; */

Then, the analysis continues with context where c [1..100].

Propagate values separately

Beside conditions in the code that automatically split the states as above (when there is enough slevel), some builtins are available to generate more than one state.

The builtin tis_variable_split (a, sz, n); splits the state on the data which address is a and the size sz if it holds less than n values. For instance, if f returns x [1..5] in the following code, the call to tis_variable_split generates five states, one for each value of x:

int x = f();
tis_variable_split (&x, sizeof(x), 10);

Moreover, the builtin tis_interval_split(l, u) does the same thing that tis_interval(l, u) does, but it automatically causes the individual values between l and u inclusive to be propagated separately. The slevel option must then be set high enough to keep the precision of the analysis.

In the following example, since all values of n are propagated separately, the analyzer is able to guarantee that the x_pos assertion holds.

#include <tis_builtin.h>

int G;

void foo(int n)
{
    int x = G;
    /*@ assert x_pos: 1 <= x <= 10; */
}

int main(void)
{
    int n = tis_interval_split(-10, 10);
    G = n;
    if (n > 0)
        foo(n + 5);
    return 0;
}

Remove Alarms by Tuning the Analysis

The analysis runs with a precision setting called the slevel limit which indicates the maximum number of individual states the analyzer is allowed to keep separated at each analyzed statement. When this limit is reached at one particular statement, the analyzer merges states together. Capping precision in this way and merging states prevents a combinatorial explosion, while the merge itself is designed not to miss any undefined behaviors that follow.

However, in particular cases, the loss of precision caused by merging states may result in the appearance of false alarms. In such cases, it is instrumental to tune the analyzer to improve precision at critical points of the program. The basic technique for doing so is to allow the analyzer to keep states separate by increasing the slevel limit that applies to those statements. The slevel limit can be increased for the entire program, but for the purposes of tuning, it is most beneficial to tune the limit locally, typically with a per-function granularity.

Apart from manipulating the slevel limit, there are advanced techniques that provide control over how the analyzer handles states. Doing so can limit the number of produced separate states by injecting strategically-placed merges or ensuring that specific interesting states are kept separate, to ensure an improvement in precision elsewhere. Some of these techniques are also described further in this section as well.

Crucially, maintaining precision of every variable at every statement should not be a goal in itself. Analyzing the behavior of the target code in one pass for the millions of variable values that can be grouped together is how the analyzer manages to provide guarantees “for all possible input vectors” while using reasonable time and space for the analysis. Therefore, using the tuning techniques described in this section should only be applied when imprecision leads to false alarms.

Detecting precision loss

The analyzer GUI refers to the number of states generated by the analysis at a given statement as the slevel counter, which is compared against the slevel limit that applies to that statement. Note that different statements may be set to have different slevel limits (see Tuning the slevel limit).

The slevel limit for a given statement is displayed in the current function information widget. In the example below, the slevel limit of the currently selected statement is set to 200.

Current function widget

Additionally, the slevel counters and limits can be listed (and sorted) for all statements in the currently viewed function by opening the in the statements tab (see Statements).

However, more conveniently, in the GUI, statements whose slevel counter exceeded their slevel limit are indicated by a red bar in the margin of the interactive code view (see Interactive Code Panel). Similarly, statements whose slevel limit has not been exceeded, but whose slevel counter reached above 50% of their allotted limit are marked with a yellow margin bar.

The example below shows a snippet of normalized C code with margins marked in red and yellow. The yellow margin bar shows that the analyzer propagated enough states through statement j = ...; generated to reach at least 50% of its slevel limit without exceeding it. The red margin bar shows that analyzer exceeded the allowed slevel limit at the statement k = .... The statement i = ... is not marked in the margin, so its slevel counter was than 50% of its slevel.

Normalized code view with six lines, 3 declarations of variables i, j, k and 3 assignments from a call to tis_interval_split(0, 2) to i, j, k. There is a yellow bar next to the assignment to j, and a red bar next to the assignment to k.

Hovering over the margin shows the number of states at this statement—its slevel counter—and the statement’s slevel limit. Here, the statement j = ... used 3 out of its slevel limit of 5.

Normalized code view with a mouse pointer over a yellow bar in the margin showing the hover hint "Slevel counter 3/5"

Other reasons for imprecision

The analysis can also get imprecise for other reasons than reaching the defined precision level. This is especially the case when the log trace includes messages about garbled mix values. It is very likely that if such messages appear, the analysis will not produce any interesting results.

Tip

The analysis can be stopped automatically from the command line option -val-stop-at-nth-garbled.

This option can also be set from the GUI.

The analysis can also be stopped when it reaches the maximum memory consumption set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM. If TIS_KERNEL_MAX_MEM is set, the analyzer becomes more conservative in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2 GiB of memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM GiB of memory. On a single-user TIS Box with 64GiB of memory, a good value for this variable is 62.

Tuning the slevel limit

There are two basic controls for tuning the slevel limits of the analyzer, both of which can be set via commandline options:

  • -slevel n: use n as the default global slevel limit;
  • -slevel-function f:n: use n as the slevel limit for function f (this overrides the global limit);

In the following example, the analyzer will execute with a global slevel limit of 100, which will apply to all statements in all functions except main, whose slevel limit will be set to 10.

$ tis-analyzer -val -slevel 100 -slevel-function main:10 example.c

Equivalent settings are also configurable via JSON configuration files (see Loading an analysis configuration).

Function-specific slevel values can be set via the GUI in the current function information widget:

Current function widget with a mouse pointer over a button showing the hover hint "Change slevel for this function"

Note

Since the default global slevel limit is 0, some tuning will be necessary for almost all analyzed code bases.

Tuning splitting, merging and widening behavior

There are several other options that can be used to fine tune the value analysis.

Some of them control whether states can be kept separated at function returns or loops:

  • -val-split-return-function f:n: split the return states of function f according to \result == n and \result != n;

  • -val-split-return auto: automatically split the states at the end of each function according to the function return code;

  • -val-split-return-function f:full: keeps all the computed states at the end of function f in the callers;

  • -val-slevel-merge-after-loop <f | @all> when set, the different execution paths that originate from the body of a loop are merged before entering the next execution.

    The default behavior is set to all functions (-val-slevel-merge-after-loop=@all). It can be removed for some functions (-val-slevel-merge-after-loop=@all,-f), and deactivated for all functions (-val-slevel-merge-after-loop=-@all), and activated only for some (-val-slevel-merge-after-loop=-@all,+f).

  • -wlevel n: do n loop iterations before widening.

Merging states by variable value

The analyzer can merge states on demand before analyzing specific statements based on the values of specific variables.

To induce the analyzer to merge states at some expression, add a comment to the source code that specifies how the merge should be performed:

  • //@ slevel merge merges all the states before analyzing the first statement following the comment (note there is no underscore between slevel and merge);
  • //@ slevel_merge x selectively merges all such states where variable x has the same value before analyzing the first statement following the comment (note there is an underscore between slevel and merge);
  • //@ slevel_merge x, y, selectively merges all the states that have the same value for x, and that have the same value for y, and so on.

For example, consider the following program:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    tis_show_each("i j k", i, j, k);
}

When the program is analyzed (with tis-analyzer -val -slevel 100) each of the assignments to variables i, j, and k creates two separate states. In effect the analyzer constructs eight separate states at the statement tis_show_each:

[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {1})

On the other hand, it is possible to limit the number of states by merging them according to the value of variable i by adding a comment to the tis_show_each statement:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i;
    tis_show_each("i j k", i, j, k);
}

In this case, the analyzer only produces two states at that statement, one where the value of i is 0 and another where it is 1. The values of each of the two remaining variables are merged into sets containing both the values of 0 and 1.

[value] Called tis_show_each({{ "i j k" }}, {0}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0; 1}, {0; 1})

It is also possible to limit the number of states by merging them according to the values of multiple variables. To do this, add a comment to the tis_show_each statement that merges the states based on the values of i and j taken together:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i, j;
    tis_show_each("i j k", i, j, k);
}

This then produces four states, one for each of the four permutations of i and j, with the values of k being merged for each of those states:

[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0; 1})

Finally, all the states can be merged into one as follows (note that there is no underscore between slevel and merge):

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel merge;
    tis_show_each("i j k", i, j, k);
}

This reduces all the states at tis_show_each into a single state:

[value] Called tis_show_each({{ "i j k" }}, {0; 1}, {0; 1}, {0; 1})

Merging states in loops

This technique can be particularly useful when dealing with complex loops. Consider the following example:

#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1 , 1);
    
    int i = 0;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }

    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}

This code snippet calculates the value of integer division between the variables num and denom, which are calculated over 10 iterations of a loop. When denom enters the loop, it has some integer value between 0 and 9 and is increased in each iteration by the value of the variable step. The value of step is indeterminate to the analyzer, but is either -1 or 1 and is constant throughout the execution of the entire loop (as if it were a parameter passed in from outside the function). In the case of num, it starts as 0 and is either increased or decreased by the value of the iterator i, depending on direction. This direction is represented by a tis_interval from 0 to 1, signifying that the direction is determined in such a way that the analyzer cannot definitely predict it. So, in each step the range of values that num can take grows in both directions. The direction is decided separately for each iteration, as if it were a result of a function call executed over and over inside the loop.

The user should be able to quickly determine that denom can never be 0, and so, computing result should not trigger a division by zero. Specifically, denom is expected to be either a value between -10 and -1 if step is negative or a value between 10 and 19 if step is positive.

However, the analyzer will nevertheless report potential undefined behavior when run. (The command includes -val-slevel-merge-after-loop="-main" to prevent it from merging all the states at each iteration.)

$ tis-analyzer -val -slevel 200 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..19], {-1; 1})
tests/tis-user-guide/slevel_merge_loop_1.c:25:[kernel] warning: division by zero: assert denom ≢ 0;

Analyzing the output of the analyzer reveals that the reason behind the detected undefined behavior is that the abstracted value of denom spans from -10 to 19. This is the case because the indeterminacy inside the loop causes the analyzer to maintain a lot of distinct states, leading it to run out its slevel limit and to start merging states. The analyzer specifically merges states where step is both -1 and 1. Since denom can take more values in the merged state than the analyzer can represent by enumeration, it approximated the value of denom as the span [-10..19].

This false alarm can be removed by increasing the precision of the analyzer at that point. One way to do that is to increase the slevel limit:

$ tis-analyzer -val -slevel 520 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"

This works, but since the number of propagated states is very large, the slevel limit must be set to at least the very large number of 520. Using slevel_merge can help keep the slevel limit significantly lower. The following modified snippet inserts an slevel_merge statement just before the loop, directing the analyzer to merge states at the beginning of each loop iteration (because it is inserted before the condition) so that step is kept a singular value in each resulting state.

#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1 , 1);
    
    int i = 0;
    //@ slevel_merge step;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}

This additional guidance prevents the analyzer from merging the negative and positive possible value sets for denom while allowing it to merge states to account for the indeterminacy inside the loop. So, the analysis van be performed with a much lower slevel limit:

$ tis-analyzer -val -slevel 40 slevel_merge_loop_2.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..-1], {-1})
[value] Called tis_show_each({{ "denom step" }}, [10..19], {1})

Tuning value representation

Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:

  • -val-ilevel n: Sets the precision level for integer representation to n: each integer value is represented by a set of enumerated values up to n elements and above this number intervals (with congruence information) are used.
  • -plevel n: Sets the precision level for array accesses to n: array accesses are precise as long as the interval for the index contains less than n values. See Tuning the precision for array accesses for more information about this option.

Disabling alarms

There are also options which allow to enable / disable certain alarms. Most of these options are enabled by default, and it is usually safer to leave it this way (unless you really know what you are doing).

Of course not all existing options have been enumerated here. The full list of the available options is given by -value-help.

Tuning the precision for array accesses

As explained in Tune the Precision, temporarily reducing the size of the arrays may be a first step during the interactive phase to make the analysis time shorter. But when analyzing large arrays, the -plevel option can be used to increase the precision level for array accesses. This option sets how hard the analyzer tries to be precise for memory accesses that, considering the imprecision on the indexes involved, can be at any of many offsets of a memory block. The default is 200 and may not be sufficient for an access at unknown indices inside a large or nested array to produce a precise result. This is illustrated by the example below:

#include <tis_builtin.h>

char c[20];
struct s { int m[50]; void *p; };
struct s t[60];

void init(void) {
  for (int i = 0; i < 60; i++)
    {
      for (int j = 0; j < 50; j++)
        t[i].m[j] = (i + j) % 10;
      t[i].p = c + i % 20;
    }
}

int main(void) {
  init();
  int x = tis_interval(5, 45);
  int y = tis_interval(2, 56);
  t[y].m[x] = -1;

  x = tis_interval(5, 45);
  y = tis_interval(2, 56);
  int result = t[y].m[x];

  int *p = &t[y].m[x];
  int result2 = *p;
}

With the default value of -plevel (200):

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

With higher plevel:

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ [-1..9]
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

Note that result2 is not precise even with the higher plevel. Handling the lvalue t[y].m[x] directly allows the analyzer to be optimal as long as the value of the plevel option allows it to, but forcing the analyzer to represent the address as the value of the variable p produces this set of offsets:

{{ &t + [428..11604],0%4 }}

This set of offsets contains pretty much the addresses of everything in t, including the p pointer members, so it appears when dereferencing p that the result can be an address.

Trading memory for analysis speed

As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.

The environment variable TIS_KERNEL_MEMORY_FOOTPRINT can be used to set the size of caches used during the value analysis, speeding up some slow analyses in which the caches were getting thrashed. The default is 2 and each incrementation doubles the size of caches. Only use this variable if you are not already low on memory.

Another useful option which helps in reducing the computation time is -memexec-all. If this option is set, when analyzing a function, the tool tries to reuse results from the analysis of previous calls when possible.

Control the Analyzer

Enhance the Trace

It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.

First of all, the standard printf function can be used to output constant messages or messages involving only precise values. However, printing the computed values when they are imprecise is not possible using printf, and instead the tis_show_each function should be used. The name of this function can be extended with any string, so that is is easier to make the difference between different calls (as the full function name is printed each time it is called). For instance:

tis_show_each_iteration (i, t[i]);

The statement above will output messages like this (one for each analyzed call):

[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])

Another useful function, which properly handles imprecise values, is tis_print_subexps. When given any number of expressions as arguments, it will print the values of all the sub-expressions of each provided expression. The first argument of this function must be always a string literal, which will be printed out in order to help distinguish between different calls. For instance:

tis_print_subexps ("simple sums", x + y, y + z, z + x);

Such a statement will output messages like this:

[value] Values of all the sub-expressions of simple sums (expr 1):
    int x + y ∈ {3}
    int x ∈ {1}
    int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
    int y + z ∈ {5}
    int y ∈ {2}
    int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
    int z + x ∈ {4}
    int x ∈ {1}
    int z ∈ {3}

Moreover, tis_print_abstract_each allows the contents of structured variables to be observed. For instance:

tis_print_abstract_each(&i, &t);

This statement will output messages like this (one for each analyzed call):

[value] Called tis_print_abstract_each:
     i ∈ {0; 1; 2; 3; 4}
     t[0..4] ∈ [0..10]
      [5..99] ∈ UNINITIALIZED

Note that, tis_print_abstract_each in general takes addresses of variables as parameters. This applies as well for an array, such as t in the example above. Contrary to popular belief, when t is an array, &t is not the same as t, and the user should call tis_print_abstract_each(&t); to see the whole array (pointer decay only shows the first element).

To get even more information, the tis_dump_each function can be used to print the whole state at the program point where it is called. But it may be easier to call the tis_dump_each_file function to print the state in a file. The name of the file is computed from the first argument of the call (which must be a string literal), an incremented number, and an optional directory given by the -val-dump-directory option. The -val-dump-destination option allows to choose which kind of output is expected among txt or json (all for both, none for no output). For instance, when calling tis_dump_each_file ("state", *(p+i)) in a test.c file, and analyzing it with the command:

$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp

these messages are shown in the trace:

test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'

The two generated files hold both the whole state computed the first time the program point is reached and the possible values for *(p+i).

For instance, the JSON file may look like:

{
  "file": "test.c",
  "line": 11,
  "args": "([0..10])",
  "state": [
    {
      "base": "t",
      "values": [
        { "offset": "[0..4]", "value": "[0..10]" },
        { "offset": "[5..9]", "value": "UNINITIALIZED" }
      ]
    },
    { "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
  ]
}

To better understand the results, see the Value Analysis Data Representation section.

Stop the Analyzer when Alarms Occur

In order to avoid wasting time analyzing the application in a wrong context, the analysis can be stopped as soon as some alarms are generated thanks to the -val-stop-at-nth-alarm option. With the argument equal to 1, it aborts the analysis at the first alarm. To ignore a certain number of alarms, the argument can be increased. Although there is no strict relation between the given argument and the number of alarms generated before the analysis stops (i.e. these two values are not necessarily equal), one thing is guaranteed: providing a larger number will lead to skipping more alarms.

Stop the Analyzer When Taking Too Long

The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:

$ kill -USR1 12345

The analyzer can also be stopped through the GUI (see Disconnect/Kill server).

When the analyzer receives this signal, it stops the Value analysis, but still continues with the other tasks. So for instance, it still save the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there is no more task to do, it can still take some time to properly stop the analysis.

The –timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:

$ tis-analyzer --timeout 5m -val ... -save project.state

Watchpoints

Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:

  • tis_watch_cardinal: stop the analysis when the number of different values that a given memory location may possibly contain (because of imprecision) is greater than a certain maximal amount.
  • tis_watch_value: stop the analysis when a given memory location may possibly contain a value of the provided set of forbidden values.
  • tis_watch_address: stop the analysis when a given memory location may possibly contain an address.
  • tis_watch_garbled: stop the analysis when a given memory location may possibly contain a garbled mix value.
  • tis_detect_imprecise_pointer: stop the analysis when any expression is evaluated to an imprecise pointer which contains a given base address.

These functions are available using #include <tis_builtin.h>. The arguments of the four tis_watch_* functions follow the same logic:

  • The two first arguments define the memory location to watch: its address p and its size s.
  • The last argument n is the number of statements during which the condition may remain true before the analysis is stopped:
    • if n == -1, the analysis never stops, but messages are printed each time the condition is reached;
    • if n == 0, the analysis stops as soon as the condition is reached;
    • if n > 0, the analysis continues for the n-th first occurrences where the condition is reached (and prints messages for each of them) and stops at the next occurrence.
  • The meaning of the (potential) arguments in between varies depending on the particular function.

The function tis_detect_imprecise_pointer only takes a pointer as argument.

Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:

 int x = 0; /* the memory location to watch */
 void *p = (void *)&x; /* its address */
 size_t s = sizeof(x); /* its size */
 int y[10];

 /* The analysis stops when x is not exact (i.e. not a singleton value). */
 int maximal_cardinal_allowed = 1;
 tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);

 /* The analysis stops the fourth time when x may be negative. */
 int forbidden_values = tis_interval(INT_MIN, -1);
 int exceptions = 3;
 tis_watch_value(p, s, forbidden_values, exceptions);

 /* The analysis stops when x may be an address. */
 tis_watch_address(p, s, 0);

/* The analysis stops when x may be a garbled mix value. */
 tis_watch_garbled(p, s, 0);

 p = y;

 /* The analysis starts to detect if an expression is evaluated to an
 imprecise pointer starting at base address &y. */
tis_detect_imprecise_pointer(p);

/* The analysis stops because the expression p+tis_interval_split(0,3)
is evaluated to an imprecise pointer &y + [0..3]. */
*(p+tis_interval(0,3)) = 3;

Remove Alarms by Adding Annotations

Tuning the precision using various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is by adding assertions to the program. Other kinds of properties also can be introduced, but assertions are by far the most frequently used for this purpose.

If you do not know how to add ACSL properties to your project, first read ACSL Properties.

Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When it is not possible, they should be verified manually.

Warning

Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.

Some examples are given below.

If source code modifications are needed, and the source code is in a git repository, the tis-modifications tool may be helpful to track them in order to check that they are correctly guarded. See tis-modifications Manual.

Assertion to Provide Missing Information

As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately some relevant information concerning variables just cannot be represented in this form and thus cannot be taken into account by the analyzer when it would make a difference. However, thanks to introducing well placed assertions it is possible to compensate for this disadvantage by explicitly providing the missing information.

Example:

     int T[10];
     ...
L1:  if (0 <= x && x < y) {
       ...
L2:    if (y == 10) {
L3:      ... T[x] ...
         }
       ...
       }

When the T[x] expression is encountered at the label L3, the analyzer tries to check if the T array is accessed correctly (i.e. inside the array bounds), namely, is the (0 <= x < 10) condition true. It already knows that the (0 <= x) part holds, due to the conditional statement at label L1 (assuming that x has not been modified since then). Whatever values x might have had before, at L1 they have been restrained to only non-negative ones and this fact has been stored in the internal state (the interval of values for x was modified) and thus is visible at L3. For example, if before L1 the value of x was [--..--] (i.e. nothing is known about x except that it is initialized), then after L1 it would be [0..--] (i.e. the interval spanning from zero to positive infinity).

Now, the analyzer still needs to verify if (x < 10) also holds. This is obvious for anybody who reads the code: the condition at L1 assures that (x < y) and the condition at L2 assures that (y == 10), therefore (x < 10) must be true. Unfortunately, because of the limitations of the adopted value representation method, the analyzer cannot deduce this by itself. The fact that (x < y) holds just cannot be expressed in the internal state (nor actually any abstract relation between variables). And, supposing that the value of y at L1 is [--..--], the (x < y) condition does not help to restrain the values of x, its upper bound remains thus as before. Hence, this important piece of information is lost and the analyzer simply cannot connect it with (y == 10) at L2 in order to correctly restrain the value of x to [0..9]. So at L3 it will consider that the value of x is [0..--] and it will emit an alarm about a potential out of bound access to the array T.

To help the analyzer, the appropriate assertion can be added explicitly:

at L3: assert ax: x < 10;

Then the alarm will disappear. Of course the ax assertion still needs to be verified by other means. For example, this particular assertion can be easily proven using WP (see Prove Annotations with WP).

State Splitting

State splitting is another assertion-based technique that permits to guide the value analysis. It can be used to obtain the same results in the above example by splitting the internal state at L1 into two states by introducing the following assertion:

at L1: assert ay: y <= 10 || 10 < y;

As explained before, the analyzer can store multiple memory states at each program point (as explained in About the Value Analysis) and the maximal number of states that can be stored per program point in the internal representation is related to the precision level (i.e. slevel). So, provided that the -slevel option has set the precision level high enough in order to permit splitting the state here, the assertion above will lead to a case analysis:

  • The y <= 10 case: As y <= 10 is assumed, together with the x < y condition at L1, it leads to deducing x < 10, and this time this can be represented in the internal state.
  • The 10 < y case: If 10 < y is assumed, then the condition at L2 is false, and therefore the execution branch where the array T is accessed at L3 is not reached at all.

Thanks to introducing this assertion, the alarm will thus disappear. Moreover, the analyzer is able to check on its own that ay is always true, so there is nothing more to verify.

It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.

Tip

Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).

Store Relational Information

In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program in order to make it easier to analyze. This method should be usually avoided if possible, since it is intrusive to the application’s code. Thus it should be used only when other solutions are not good enough or when you do not mind modifying the code.

For example, if the program contains a test (0 <= i+j < 10) and then several uses of T[i+j] follow, it may be convenient to add a temporary variable representing the i+j sum:

int tmp = i + j;
if (tmp < 10) {
  ..
  //@ assert a_tmp: tmp == i + j;
  ... T[tmp] ...
  }

If neither i nor j are modified in the meantime, the assertion that validates the code substitution should be trivial to verify, and the value analysis is now able to know that (tmp < 10).

Loop Invariant

Beside assert, requires and ensures, the loop invariant properties are also useful to enhance the analysis precision.

For instance, this function generates an alarm:

int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;

This is because the value of i is too imprecise when leaving the loop, so the analysis doesn’t know if the access to T[i] in the last assignment is valid or not.

Adding this loop invariant remove the alarm:

/*@ loop invariant i <= 50; */

Moreover, the value analysis is able to check that this property is always valid.

Bounded Buffers

A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.

buffer problem

Buffer problem: there is a relation between cur and len, but it cannot be represented.

A typical function to handle this buffer is:

void process (char * cur, size_t len) {
  char * p = cur;
  for (size_t i = 0 ; i < len ; i++, p++) {
    *p = ...
  }
}

The validity of the pointer p has to be checked to avoid an alarm on p access, but also to get precise results later on. It is especially important when the pointer points to an array that is part of a larger structure. For instance:

struct data {
  char buffer[BUFFER_LEN];
  unsigned current_index;
  unsigned state;
};

//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
  char * current = data->buffer + data->current_index;
  size_t length = BUFFER_LEN - data->current_index;
  if (n > length) n = length;
  process (current, n);
  ...
  }

If the analysis is not able to know that p does not go beyond the end of the buffer field, the value of the other fields current_index and state might be modified as well and might be too imprecise for the analysis to give interesting results later on.

So the process function needs a precondition to give the constraint on cur and len to ensure the validity of the pointer. This precondition could simply be:

//@ requires \valid (cur + 0 .. len-1);

Unfortunately, the Value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable equation when one of the data is precise enough to reduce the other:

  • if the data to reduce is the pointer:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
  • if the data to reduce is the length:
//@ requires length <= (\block_length (data) - \offset (data)) / sizeof (*data);

Notice that the ACSL functions \base_addr, \block_length and \offset only provide the expected information when cur is a pointer to an array allocated on its own. If the array is a field in a structure, \base_addr(cur) returns the base address of the structure.

base of a pointer in a struture

Structure with a .buf field: ACSL functions are related to the allocated block, not the internal array.

Anyway in some cases, even if the analyzer computes the optimal information, cur and len both have an unknown value from intervals and the relation between the two variables has been lost. So the memory access to (*p) raises an alarm when we cannot check that adding the upper bound of both intervals is smaller than (buf + BUFFER_LEN). Moreover, if buf is in a structure as explained above, buf and BUFFER_LEN may be unknown in the function.

A trick can be to modify the original function by adding a parameter that gives a pointer in the object beyond which the function is not expected to access:

/*@ requires f_r_buf: val: cur <= bound;
    requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
  char * p = cur;
  //@ loop invariant f_l_p: val: p <= bound;
  for (size_t i = 0 ; i < len ; i++, p++) {
    if (p >= bound) break;
    *p = ...
  }
}

In the previous example, the call to process would have to be changed to:

process_bounded (current, n, data->buffer + BUFFER_LEN);

As long as the preconditions are true, this modified function is equivalent to the original one. This first precondition is often checked by the Value analysis, when it is not, the value analysis reduces the range of cur. The value analysis can use the second precondition to reduce the length.

Two annotated functions with such bounds, tis_memset_bounded and tis_memcpy_bounded, are provided to be used instead of memset and memcpy when this problem occurs with these libc functions.