The list of all the existing alarms is given in Value Analysis Alarms.
Most of the time, understanding the meaning of alarms is relatively easy, since the generated assertions, messages, and warnings tend to be quite clear. What requires more effort is determining whether a given alarm is relevant: whether the problem it indicates can actually happen. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.
False alarms are often the result of too coarse an approximation in the analysis. It is recommended to treat the alarms starting from the first one emitted, in order to detect an imprecision as early as possible.
The list of the generated assertions can easily be extracted from the
properties.csv
file (see Information about the Properties). Then, for instance,
these assertions can be counted in order to track the evolution of their total
number as the work progresses. (Note, however, that this particular measure
is not necessarily very pertinent, because the relation between problems and
emitted alarms is not really one-to-one. Losing precision at one point of the
analysis can lead to several alarms which have the same origin. Moreover,
solving one problem may cause many unrelated new alarms, as several problems
might have been hiding behind the solved one.)
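For instance, a rough count can be obtained directly from the shell. This is a minimal sketch that assumes the consolidated status of each property appears as a column of properties.csv; inspect the header line first and adapt the filter to the actual layout of your version:

$ head -1 properties.csv                # check the column layout first
$ grep -c 'Unknown' properties.csv      # rough count of unproven properties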
The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties of the GUI, and a button allows selecting only the alarms among all the properties. Start investigating with the first emitted alarm by sorting the alarms by their emission rank.
Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.
Value analysis uses abstract interpretation techniques that propagate information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between variables and sets of values (keep in mind, though, that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.
See Tune the Precision for explanations concerning tuning the precision level
with the -slevel option. The precision level is related to the number of
states that can be stored for each program point. The smaller this number is,
the coarser the approximation is, as more computed states are merged
together.
Example:
//@ assert 0 < x < 10;
if (x < 5)
    y = 5;
else
    y = 10;
L:
Computing a unique state at label L only gives that x ∈ [1..9] and
y ∈ [5..10]. But if the slevel is larger, then two states can be stored
at L, giving exactly that either y == 5 when x ∈ [1..4], or y == 10
when x ∈ [5..9].
Notice that the assertion assert 0 < x < 10 above reduces the possible
values for x. It is important to remember that this works in the same way
for the assertions automatically generated from alarms. For instance, if a
statement a = b / c; is reached with c ∈ [0..100], an alarm is emitted
in the form of an appropriate assertion:
/*@ assert Value: division_by_zero: c ≢ 0; */
Then, the analysis continues in a context where c ∈ [1..100].
Besides conditions in the code that automatically split the states as above
(when there is enough slevel), some builtins are available to generate more
than one state. The builtin tis_variable_split (a, sz, n); splits the state
on the datum whose address is a and size is sz, if it holds fewer than
n values.
For instance, if f
returns x ∈ [1..5]
in the following code,
the call to tis_variable_split
generates five states,
one for each value of x
:
int x = f();
tis_variable_split (&x, sizeof(x), 10);
Moreover, the builtin tis_interval_split(l, u)
does the same thing that
tis_interval(l, u)
does, but it automatically causes the
individual values between l
and u
inclusive to be propagated
separately. The slevel
option must then be set high enough to keep
the precision of the analysis.
In the following example, since all values of n
are propagated
separately, the analyzer is able to guarantee that the x_pos
assertion
holds.
#include <tis_builtin.h>

int G;

void foo(int n)
{
    int x = G;
    /*@ assert x_pos: 1 <= x <= 10; */
}

int main(void)
{
    int n = tis_interval_split(-10, 10);
    G = n;
    if (n > 0)
        foo(n + 5);
    return 0;
}
The analysis runs with a precision setting called the slevel
limit which
indicates the maximum number of individual states the analyzer is allowed to
keep separated at each analyzed statement. When this limit is reached at one
particular statement, the analyzer merges states together. Capping precision in
this way and merging states prevents a combinatorial explosion, while the merge
itself is designed not to miss any undefined behaviors that follow.
However, in particular cases, the loss of precision caused by merging states may
result in the appearance of false alarms. In such cases, it is useful to
tune the analyzer to improve precision at critical points of the program. The
basic technique for doing so is to allow the analyzer to keep states separate by
increasing the slevel limit that applies to those statements. The slevel
limit can be increased for the entire program, but for the purposes of tuning,
it is most beneficial to tune the limit locally, typically with a per-function
granularity.
Apart from manipulating the slevel limit, there are advanced techniques that
provide control over how the analyzer handles states. They can limit the
number of produced separate states by injecting strategically-placed merges, or
ensure that specific interesting states are kept separate, so as to
improve precision elsewhere. Some of these techniques are described
further in this section as well.
Crucially, maintaining precision of every variable at every statement should not be a goal in itself. Analyzing the behavior of the target code in one pass for the millions of variable values that can be grouped together is how the analyzer manages to provide guarantees “for all possible input vectors” while using reasonable time and space for the analysis. Therefore, the tuning techniques described in this section should only be applied when imprecision leads to false alarms.
The analyzer GUI refers to the number of states generated by the analysis at a
given statement as the slevel
counter, which is compared against the
slevel
limit that applies to that statement. Note that different statements
may be set to have different slevel
limits (see
Tuning the slevel limit).
The slevel
limit for a given statement is displayed in the current function
information widget. In the example below, the slevel
limit of the currently
selected statement is set to 200.
Additionally, the slevel counters and limits can be listed (and sorted) for
all statements in the currently viewed function by opening the statements
tab (see Statements).
However, more conveniently, in the GUI, statements whose slevel
counter
exceeded their slevel
limit are indicated by a red bar in the margin of the
interactive code view (see Interactive Code Panel). Similarly, statements
whose slevel
limit has not been exceeded, but whose slevel
counter
reached above 50% of their allotted limit are marked with a yellow margin bar.
The example below shows a snippet of normalized C code with margins marked in
red and yellow. The yellow margin bar shows that the analyzer propagated enough
states through the statement j = ...; to reach at least 50% of its
slevel limit without exceeding it. The red margin bar shows that the analyzer
exceeded the allowed slevel limit at the statement k = .... The
statement i = ... is not marked in the margin, so its slevel counter was
below 50% of its slevel limit.
Hovering over the margin shows the number of states at this statement—its
slevel
counter—and the statement’s slevel
limit. Here, the statement
j = ...
used 3 out of its slevel
limit of 5.
The analysis can also become imprecise for reasons other than reaching the defined
precision level. This is especially the case when the log trace includes messages
about garbled mix values. It is very likely that if such messages appear,
the analysis will not produce any interesting results.
Tip
The analysis can be stopped automatically with the command-line
option -val-stop-at-nth-garbled.
This option can also be set from the GUI.
The analysis can also be stopped when it reaches the maximum memory consumption
set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM
. If
TIS_KERNEL_MAX_MEM
is set, the analyzer becomes more conservative
in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2
GiB of
memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM
GiB of memory. On a single-user TIS Box with 64GiB of memory, a good value
for this variable is 62
.
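For instance, on such a machine the variable can be exported before launching the analysis:

$ export TIS_KERNEL_MAX_MEM=62
$ tis-analyzer -val ...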
Tuning the slevel limit

There are two basic controls for tuning the slevel limits of the analyzer,
both of which can be set via command-line options:
-slevel n
: use n
as the default global slevel
limit;
-slevel-function f:n
: use n
as the slevel
limit for function f
(this overrides the global limit);
In the following example, the analyzer will execute with a global slevel
limit of 100, which will apply to all statements in all functions except main,
whose slevel
limit will be set to 10
.
$ tis-analyzer -val -slevel 100 -slevel-function main:10 example.c
Equivalent settings are also configurable via JSON configuration files (see Loading an analysis configuration).
Function-specific slevel
values can be set via the GUI in the current function
information widget:
Note
Since the default global slevel
limit is 0, some tuning will be necessary
for almost all analyzed code bases.
There are several other options that can be used to fine tune the value analysis.
Some of them control whether states can be kept separated at function returns or loops:
-val-split-return-function f:n
: split the return states of function f
according to \result == n
and \result != n
;
-val-split-return auto
: automatically split the states at the end of each
function according to the function return code;
-val-split-return-function f:full
: keeps all the computed states at the
end of function f
in the callers;
-val-slevel-merge-after-loop <f | @all>: when set, the different
execution paths that originate from the body of a loop are merged
before entering the next iteration.
The default behavior applies to all functions
(-val-slevel-merge-after-loop=@all). It can be removed for some
functions (-val-slevel-merge-after-loop=@all,-f),
deactivated for all functions
(-val-slevel-merge-after-loop=-@all), or activated only for some
(-val-slevel-merge-after-loop=-@all,+f).
-wlevel n
: do n
loop iterations before widening.
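As an illustration of -val-split-return-function, consider the following sketch (the file and function names are illustrative): the status code returned by get_value tells the caller whether *out was written, and splitting the return states on that code lets the analyzer keep this correlation instead of losing it in a merged return state.

#include <tis_builtin.h>

int get_value(int *out)
{
    if (tis_interval(0, 1)) {   /* nondeterministic success/failure */
        *out = 42;
        return 0;               /* success: *out has been written */
    }
    return -1;                  /* failure: *out is left untouched */
}

int main(void)
{
    int v;
    if (get_value(&v) == 0)
        return v;               /* v is only read when the call succeeded */
    return 0;
}

Analyzed with, for example, -val -slevel 10 -val-split-return-function get_value:0, the return state where the result is 0 (and v is initialized) is propagated to the caller separately from the failure state, so reading v above does not raise a spurious initialization alarm.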
The analyzer can merge states on demand before analyzing specific statements based on the values of specific variables.
To induce the analyzer to merge states at some expression, add a comment to the source code that specifies how the merge should be performed:
//@ slevel merge
merges all the states before analyzing the first
statement following the comment (note there is no underscore between slevel
and
merge
);
//@ slevel_merge x
selectively merges all such states where variable x
has the same value before analyzing the first statement following the comment
(note there is an underscore between slevel
and merge
);
//@ slevel_merge x, y, …
selectively merges all the states that have the
same value for x, and that have the same value for y, and so on.
For example, consider the following program:
#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    tis_show_each("i j k", i, j, k);
}
When the program is analyzed (with tis-analyzer -val -slevel 100
) each of
the assignments to variables i
, j
, and k
creates two separate states.
In effect the analyzer constructs eight separate states at the statement
tis_show_each:
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {1})
On the other hand, it is possible to limit the number of states by merging
them according to the value of variable i
by adding a comment to the
tis_show_each statement:
#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i;
    tis_show_each("i j k", i, j, k);
}
In this case, the analyzer only produces two states at that statement, one where the value of i is 0 and another where it is 1. The values of each of the two remaining variables are merged into sets containing both the values of 0 and 1.
[value] Called tis_show_each({{ "i j k" }}, {0}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0; 1}, {0; 1})
It is also possible to limit the number of states by merging them according to
the values of multiple variables. To do this, add a comment to the
tis_show_each statement that merges the states based on the values of i
and j
taken together:
#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i, j;
    tis_show_each("i j k", i, j, k);
}
This then produces four states, one for each of the four combinations of i and j,
with the values of k being merged for each of those states:
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0; 1})
Finally, all the states can be merged into one as follows (note that there is no
underscore between slevel
and merge
):
#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel merge;
    tis_show_each("i j k", i, j, k);
}
This reduces all the states at tis_show_each
into a single state:
[value] Called tis_show_each({{ "i j k" }}, {0; 1}, {0; 1}, {0; 1})
This technique can be particularly useful when dealing with complex loops. Consider the following example:
#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1, 1);
    int i = 0;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}
This code snippet computes the integer division of num by denom,
both of which are computed over 10 iterations of a loop. When
denom
enters the loop, it has some integer value between 0 and 9 and is
increased in each iteration by the value of the variable step
. The value of
step
is indeterminate to the analyzer, but is either -1 or 1 and is
constant throughout the execution of the entire loop (as if it were a parameter
passed in from outside the function). In the case of num
, it starts as 0
and is either increased or decreased by the value of the iterator i
,
depending on direction
. This direction
is represented by a
tis_interval
from 0
to 1
, signifying that the direction is
determined in such a way that the analyzer cannot definitely predict it. So, in
each step the range of values that num
can take grows in both directions.
The direction is decided separately for each iteration, as if it were a result
of a function call executed over and over inside the loop.
The user should be able to quickly determine that denom
can never be 0
,
and so, computing result
should not trigger a division by zero.
Specifically, denom
is expected to be either a value between -10
and
-1
if step
is negative or a value between 10
and 19
if step
is positive.
However, the analyzer will nevertheless report potential undefined behavior when
run. (The command includes -val-slevel-merge-after-loop="-main"
to prevent it
from merging all the states at each iteration.)
$ tis-analyzer -val -slevel 200 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..19], {-1; 1})
tests/tis-user-guide/slevel_merge_loop_1.c:25:[kernel] warning: division by zero: assert denom ≢ 0;
Analyzing the output of the analyzer reveals that the reason behind the detected
undefined behavior is that the abstracted value of denom spans from -10
to 19. This is the case because the indeterminacy inside the loop causes the
analyzer to maintain a lot of distinct states, leading it to exhaust its
slevel limit and to start merging states. In particular, the analyzer merges
states in which step is -1 with states in which it is 1. Since denom
can take more values in the merged state than the analyzer can represent by
enumeration, it approximates the value of denom as the span [-10..19].
This false alarm can be removed by increasing the precision of the analyzer at
that point. One way to do that is to increase the slevel
limit:
$ tis-analyzer -val -slevel 520 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
This works, but since the number of propagated states is very large, the
slevel limit must be set to at least the very large number of 520. Using
slevel_merge can help keep the slevel limit significantly lower. The
following modified snippet inserts an slevel_merge directive just before the
loop, directing the analyzer to merge states at the beginning of each loop
iteration (because it is inserted before the condition) so that step is kept
a single value in each resulting state.
#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1, 1);
    int i = 0;
    //@ slevel_merge step;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}
This additional guidance prevents the analyzer from merging the negative and
positive possible value sets for denom while allowing it to merge states to
account for the indeterminacy inside the loop. So, the analysis can be performed
with a much lower slevel limit:
$ tis-analyzer -val -slevel 40 slevel_merge_loop_2.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..-1], {-1})
[value] Called tis_show_each({{ "denom step" }}, [10..19], {1})
Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:
-val-ilevel n: Sets the precision level for integer representation to
n: each integer value is represented by a set of enumerated values of up to
n elements; above this number, intervals (with congruence information)
are used. A small sketch follows this list.
-plevel n
: Sets the precision level for array accesses to n
: array
accesses are precise as long as the interval for the index contains less than
n
values. See Tuning the precision for array accesses for more information about this
option.
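To see the effect of -val-ilevel, here is a minimal sketch (the exact printed form may vary between versions):

#include <tis_builtin.h>

int main(void)
{
    int x = tis_interval(0, 20);  /* 21 possible values */
    tis_show_each("x", x);
    return 0;
}

With a small -val-ilevel such as 8, the 21 values exceed the enumeration threshold, so x is shown as the interval [0..20]; with -val-ilevel 32 or more, the individual values can be kept as an enumerated set.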
There are also options for enabling and disabling certain alarms. Most of these options are enabled by default, and it is usually safer to leave them this way (unless you really know what you are doing).
Of course not all existing options have been enumerated here.
The full list of the available options is given by -value-help
.
As explained in Tune the Precision, temporarily reducing the size of the
arrays may be a first step during the interactive phase
to make the analysis time shorter.
But when analyzing large arrays, the -plevel
option can be used
to increase the precision level for array accesses.
This option sets how hard the analyzer tries to be precise for memory accesses
that, considering the imprecision on the indexes involved, can be at any of many
offsets of a memory block. The default is 200 and may not be sufficient for an
access at unknown indices inside a large or nested array to produce a precise
result. This is illustrated by the example below:
#include <tis_builtin.h>

char c[20];
struct s { int m[50]; void *p; };
struct s t[60];

void init(void) {
    for (int i = 0; i < 60; i++) {
        for (int j = 0; j < 50; j++)
            t[i].m[j] = (i + j) % 10;
        t[i].p = c + i % 20;
    }
}

int main(void) {
    init();
    int x = tis_interval(5, 45);
    int y = tis_interval(2, 56);
    t[y].m[x] = -1;
    x = tis_interval(5, 45);
    y = tis_interval(2, 56);
    int result = t[y].m[x];
    int *p = &t[y].m[x];
    int result2 = *p;
}
With the default value of -plevel
(200):
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
x ∈ [5..45]
y ∈ [2..56]
result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
With a higher plevel:
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
x ∈ [5..45]
y ∈ [2..56]
result ∈ [-1..9]
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
Note that result2 is not precise even with the higher plevel.
Handling the lvalue t[y].m[x] directly allows the analyzer to be optimally
precise, as far as the value of the plevel option allows it,
but forcing the analyzer to represent the address
as the value of the variable p produces this set of offsets:
{{ &t + [428..11604],0%4 }}
This set of offsets contains the addresses of pretty much everything in t,
including the p pointer members,
so it appears when dereferencing p that the result can be an address.
As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.
The environment variable TIS_KERNEL_MEMORY_FOOTPRINT
can be used to set
the size of caches used during the value analysis,
speeding up some slow analyses in which the caches were getting thrashed.
The default is 2 and each increment doubles the size of caches.
Only use this variable if you are not already low on memory.
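For example, to double the cache sizes relative to the default for a single run:

$ TIS_KERNEL_MEMORY_FOOTPRINT=3 tis-analyzer -val ...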
Another useful option which helps in reducing the computation time is
-memexec-all
. If this option is set, when analyzing a function, the tool
tries to reuse results from the analysis of previous calls when possible.
It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.
First of all, the standard printf function can be used to output constant
messages or messages involving only precise values. However, printing the
computed values when they are imprecise is not possible using printf, and
the tis_show_each function should be used instead. The name of this
function can be extended with any string, so that it is easier to distinguish
between different calls (as the full function name is printed each
time it is called). For instance:
tis_show_each_iteration (i, t[i]);
The statement above will output messages like this (one for each analyzed call):
[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])
Another useful function, which properly handles imprecise values, is
tis_print_subexps. When given any number of expressions as arguments, it
prints the values of all the sub-expressions of each provided expression.
The first argument of this function must always be a string literal, which is
printed out in order to help distinguish between different calls. For
instance:
tis_print_subexps ("simple sums", x + y, y + z, z + x);
Such a statement will output messages like this:
[value] Values of all the sub-expressions of simple sums (expr 1):
int x + y ∈ {3}
int x ∈ {1}
int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
int y + z ∈ {5}
int y ∈ {2}
int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
int z + x ∈ {4}
int x ∈ {1}
int z ∈ {3}
Moreover, tis_print_abstract_each
allows the contents of structured variables
to be observed. For instance:
tis_print_abstract_each(&i, &t);
This statement will output messages like this (one for each analyzed call):
[value] Called tis_print_abstract_each:
i ∈ {0; 1; 2; 3; 4}
t[0..4] ∈ [0..10]
[5..99] ∈ UNINITIALIZED
Note that tis_print_abstract_each in general takes addresses of variables
as parameters. This applies to arrays as well, such as t in the example
above. Contrary to popular belief, when t is an array, &t is not the
same as t, and the user should call tis_print_abstract_each(&t); to
see the whole array (pointer decay only shows the first element).
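A minimal sketch contrasting the two calls:

#include <tis_builtin.h>

int t[5] = {0, 1, 2, 3, 4};

int main(void)
{
    tis_print_abstract_each(&t);  /* observes the whole array t[0..4] */
    tis_print_abstract_each(t);   /* decays to &t[0]: shows only the first element */
    return 0;
}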
To get even more information, the tis_dump_each
function can be used
to print the whole state at the program point where it is called.
But it may be easier to call the tis_dump_each_file
function
to print the state in a file.
The name of the file is computed from the first argument of the call
(which must be a string literal), an incremented number,
and an optional directory given by the -val-dump-directory
option.
The -val-dump-destination option allows choosing
which kind of output is expected among txt and json (all
for both, none for no output).
For instance, when calling tis_dump_each_file ("state", *(p+i))
in a test.c
file, and analyzing it with the command:
$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp
these messages are shown in the trace:
test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'
The two generated files hold both the whole state computed the first time the
program point is reached and the possible values for *(p+i)
.
For instance, the JSON file may look like:
{
  "file": "test.c",
  "line": 11,
  "args": "([0..10])",
  "state": [
    {
      "base": "t",
      "values": [
        { "offset": "[0..4]", "value": "[0..10]" },
        { "offset": "[5..9]", "value": "UNINITIALIZED" }
      ]
    },
    { "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
  ]
}
To better understand the results, see the Value Analysis Data Representation section.
In order to avoid wasting time analyzing the application in a wrong context, the
analysis can be stopped as soon as some alarms are generated thanks to the
-val-stop-at-nth-alarm
option. With the argument equal to 1
, it aborts
the analysis at the first alarm. To ignore a certain number of alarms, the
argument can be increased. Although there is no strict relation between the
given argument and the number of alarms generated before the analysis stops
(i.e. these two values are not necessarily equal), one thing is guaranteed:
providing a larger number will lead to skipping more alarms.
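For instance, the following run (the file name is illustrative) aborts the analysis as soon as the first alarm is emitted:

$ tis-analyzer -val -val-stop-at-nth-alarm 1 test.c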
The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:
$ kill -USR1 12345
The analyzer can also be stopped through the GUI (see Disconnect/Kill server).
When the analyzer receives this signal, it stops the Value analysis, but still continues with the other tasks. So, for instance, it still saves the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there is no more task to do, it can still take some time to properly stop the analysis.
The --timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:
$ tis-analyzer --timeout 5m -val ... -save project.state
Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:
tis_watch_cardinal
: stop the analysis when the number of different values
that a given memory location may possibly contain (because of imprecision) is
greater than a certain maximal amount.
tis_watch_value
: stop the analysis when a given memory location may
possibly contain a value of the provided set of forbidden values.
tis_watch_address
: stop the analysis when a given memory location may
possibly contain an address.
tis_watch_garbled
: stop the analysis when a given memory location may
possibly contain a garbled mix value.
tis_detect_imprecise_pointer
: stop the analysis when any expression is
evaluated to an imprecise pointer which contains a given base address.
These functions are available using #include <tis_builtin.h>
.
The arguments of the four tis_watch_* functions follow the same logic:
The first two arguments define the memory location to watch: its address p
and its size s.
The last argument n is the number of statements during which the condition
may remain true before the analysis is stopped:
if n == -1, the analysis never stops, but messages are printed
each time the condition is reached;
if n == 0, the analysis stops as soon as the condition is reached;
if n > 0, the analysis continues through the first n occurrences
where the condition is reached (and prints messages for each of them)
and stops at the next occurrence.
The meaning of the (potential) arguments in between varies depending on the particular function.
The function tis_detect_imprecise_pointer
only takes a pointer as argument.
Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:
#include <limits.h>
#include <stddef.h>
#include <tis_builtin.h>

int main(void)
{
    int x = 0;             /* the memory location to watch */
    void *p = (void *)&x;  /* its address */
    size_t s = sizeof(x);  /* its size */
    int y[10];

    /* The analysis stops when x is not exact (i.e. not a singleton value). */
    int maximal_cardinal_allowed = 1;
    tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);

    /* The analysis stops the fourth time when x may be negative. */
    int forbidden_values = tis_interval(INT_MIN, -1);
    int exceptions = 3;
    tis_watch_value(p, s, forbidden_values, exceptions);

    /* The analysis stops when x may be an address. */
    tis_watch_address(p, s, 0);

    /* The analysis stops when x may be a garbled mix value. */
    tis_watch_garbled(p, s, 0);

    p = y;
    /* The analysis starts to detect if an expression is evaluated to an
       imprecise pointer starting at base address &y. */
    tis_detect_imprecise_pointer(p);

    /* The analysis stops because the expression (char *)p + tis_interval(0,3)
       is evaluated to an imprecise pointer &y + [0..3]. */
    *((char *)p + tis_interval(0, 3)) = 3;

    return 0;
}
Tuning the precision using various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is to add assertions to the program. Other kinds of properties can also be introduced, but assertions are by far the most frequently used for this purpose.
If you do not know how to add ACSL properties to your project, first read ACSL Properties.
Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When it is not possible, they should be verified manually.
Warning
Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.
Some examples are given below.
If source code modifications are needed, and the source code is in a git
repository, the tis-modifications
tool may be helpful to track them
in order to check that they are correctly guarded. See
tis-modifications Manual.
As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately, some relevant information concerning variables just cannot be represented in this form and thus cannot be taken into account by the analyzer when it would make a difference. However, well-placed assertions make it possible to compensate for this disadvantage by explicitly providing the missing information.
Example:
int T[10];
...
L1: if (0 <= x && x < y) {
        ...
L2:     if (y == 10) {
L3:         ... T[x] ...
        }
        ...
    }
When the T[x] expression is encountered at the label L3, the analyzer tries
to check whether the T array is accessed correctly (i.e. inside the array
bounds), namely whether the (0 <= x < 10) condition holds. It already knows that
the (0 <= x)
part holds, due to the conditional statement at label L1
(assuming that x
has not been modified since then). Whatever values x
might have had before, at L1 they have been restrained to only non-negative ones
and this fact has been stored in the internal state (the interval of values for
x
was modified) and thus is visible at L3. For example, if before L1 the
value of x
was [--..--]
(i.e. nothing is known about x
except that
it is initialized), then after L1 it would be [0..--]
(i.e. the interval
spanning from zero to positive infinity).
Now, the analyzer still needs to verify if (x < 10)
also holds. This is
obvious for anybody who reads the code: the condition at L1 assures that (x <
y)
and the condition at L2 assures that (y == 10)
, therefore (x < 10)
must be true. Unfortunately, because of the limitations of the adopted value
representation method, the analyzer cannot deduce this by itself. The fact that
(x < y)
holds just cannot be expressed in the internal state (nor actually
any abstract relation between variables). And, supposing that the value of y
at L1 is [--..--], the (x < y) condition does not help to restrain the
values of x, so its upper bound remains as before. Hence, this important
piece of information is lost and the analyzer simply cannot connect it with
(y == 10) at L2 in order to correctly restrain the value of x to [0..9].
So at L3 it will consider that the value of x is [0..--] and it will emit
an alarm about a potential out-of-bounds access to the array T.
To help the analyzer, the appropriate assertion can be added explicitly:
at L3: assert ax: x < 10;
Then the alarm will disappear. Of course the ax
assertion still needs to be
verified by other means. For example, this particular assertion can be easily
proven using WP (see Prove Annotations with WP).
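For reference, here is a complete, minimal version of the earlier skeleton with the assertion in place (the function f and the concrete statements are illustrative):

int T[10];

void f(int x, int y)
{
    if (0 <= x && x < y) {          /* L1 */
        if (y == 10) {              /* L2 */
            //@ assert ax: x < 10;
            T[x] = 0;               /* L3: the access is now provably in bounds */
        }
    }
}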
State splitting is another assertion-based technique that makes it possible to guide the value analysis. It can be used to obtain the same results in the above example by splitting the internal state at L1 into two states with the following assertion:
at L1: assert ay: y <= 10 || 10 < y;
As explained before (see About the Value Analysis), the analyzer can store
multiple memory states at each program point, and the maximal number of
states that can be stored per program point in the internal representation is
related to the precision level (i.e. slevel). So, provided that the
-slevel option has set the precision level high enough to permit
splitting the state here, the assertion above will lead to a case analysis:
The y <= 10
case: As y <= 10
is assumed, together with the x < y
condition at L1, it leads to deducing x < 10
, and this time this can be
represented in the internal state.
The 10 < y
case: If 10 < y
is assumed, then the condition at L2 is
false, and therefore the execution branch where the array T
is accessed
at L3 is not reached at all.
Thanks to introducing this assertion, the alarm will thus disappear. Moreover,
the analyzer is able to check on its own that ay
is always true, so there is
nothing more to verify.
It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.
Tip
Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).
In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program in order to make it easier to analyze. This method should usually be avoided if possible, since it is intrusive to the application’s code. It should thus be used only when other solutions are not good enough, or when you do not mind modifying the code.
For example, if the program contains a test (0 <= i+j < 10)
and then several
uses of T[i+j]
follow, it may be convenient to add a temporary variable
representing the i+j
sum:
int tmp = i + j;
if (tmp < 10) {
    ...
    //@ assert a_tmp: tmp == i + j;
    ... T[tmp] ...
}
If neither i
nor j
are modified in the meantime, the assertion that
validates the code substitution should be trivial to verify, and the value
analysis is now able to know that (tmp < 10)
.
Besides assert, requires, and ensures, loop invariant
properties are also useful to enhance the precision of the analysis.
For instance, this function generates an alarm:
int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;
This is because the value of i
is too imprecise when leaving the
loop, so the analysis doesn’t know if the access to T[i]
in the last
assignment is valid or not.
Adding this loop invariant removes the alarm:
/*@ loop invariant i <= 50; */
Moreover, the value analysis is able to check that this property is always valid.
A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.
Buffer problem: there is a relation between cur and len, but it cannot be represented.
A typical function to handle this buffer is:
void process (char * cur, size_t len) {
    char * p = cur;
    for (size_t i = 0 ; i < len ; i++, p++) {
        *p = ...
    }
}
The validity of the pointer p
has to be checked to avoid an alarm on
p
access, but also to get precise results later on.
It is especially important when the pointer points to an array that
is part of a larger structure. For instance:
struct data {
    char buffer[BUFFER_LEN];
    unsigned current_index;
    unsigned state;
};

//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
    char * current = data->buffer + data->current_index;
    size_t length = BUFFER_LEN - data->current_index;
    if (n > length) n = length;
    process (current, n);
    ...
}
If the analysis is not able to know that p
does not go beyond the end of the
buffer
field, the value of the other fields current_index
and state
might be modified as well and might be too imprecise for the analysis
to give interesting results later on.
So the process
function needs a precondition
to give the constraint on cur
and len
to ensure the validity of the pointer.
This precondition could simply be:
//@ requires \valid (cur + 0 .. len-1);
Unfortunately, the Value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable equation when one of the two values is precise enough to reduce the other:
if the data to reduce is the pointer:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
if the data to reduce is the length:
//@ requires length <= (\block_length (data) - \offset (data)) / sizeof (*data);
Notice that the ACSL functions \base_addr
, \block_length
and
\offset
only provide the expected information when cur
is a pointer
to an array allocated on its own. If the array is a field in a structure,
\base_addr(cur)
returns the base address of the structure.
Structure with a .buf field:
ACSL functions are related to the allocated block, not the internal array.
Anyway, in some cases, even if the analyzer computes the optimal information,
cur and len both have unknown values within intervals,
and the relation between the two variables has been lost.
So the memory access to (*p) raises an alarm
when it cannot be checked that the sum of the upper bounds of both
intervals is smaller than (buf + BUFFER_LEN).
Moreover, if buf
is in a structure as explained above,
buf
and BUFFER_LEN
may be unknown in the function.
A trick can be to modify the original function by adding a parameter that gives a pointer in the object beyond which the function is not expected to access:
/*@ requires f_r_buf: val: cur <= bound;
    requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
    char * p = cur;
    //@ loop invariant f_l_p: val: p <= bound;
    for (size_t i = 0 ; i < len ; i++, p++) {
        if (p >= bound) break;
        *p = ...
    }
}
In the previous example, the call to process
would have to be changed to:
process_bounded (current, n, data->buffer + BUFFER_LEN);
As long as the preconditions are true, this modified function is equivalent to the original one. The first precondition is often checked by the Value analysis; when it is not, the value analysis reduces the range of cur. The value analysis can use the second precondition to reduce the length.
Two annotated functions with such bounds, tis_memset_bounded and
tis_memcpy_bounded, are provided to be used instead of memset and
memcpy when this problem occurs with these libc functions.