Tip
Use your browser's search function (such as Ctrl+F) to search the entire documentation.
Caution
The documentation describes all the features and tools available with TrustInSoft Analyzer.
The sections of the documentation describing the features or the tools that are not available in this version are explicitly marked.
TrustInSoft Analyzer is an award-winning static C and C++ source code analyzer that ensures the safety and security of the source code by providing mathematical/formal guarantees over program properties.
TrustInSoft Analyzer takes advantage of state-of-the-art technology to provide different sets of tools to facilitate the analysis of C and C++ programs.
Structure of the Documentation
Get Started
Start with the Tutorials to learn how to use TrustInSoft Analyzer, browse the source code and the alarms in the GUI, and eventually obtain the guarantee of the absence of undefined behavior.
TrustInSoft Analyzer: non-technical overview
This section provides step-by-step tutorials to help you get started with TrustInSoft Analyzer.
The goal of TrustInSoft Analyzer is to prevent runtime errors by analyzing all of the possible values that the variables can take at any given point in the program, in order to prove that none of the execution paths leads to a problem (such as an undefined behavior or a forbidden operation).
This verification is called the value analysis.
Note
Unlike testing or binary analysis, the value analysis provided by TrustInSoft Analyzer is exhaustive: the guarantees provided apply to all the concrete executions of the program. Even the tests with the best coverage will only test a few execution paths in a program, whereas binary analysis is strongly dependent on the compilation and execution environments. This is why static value analysis gives stronger guarantees than both testing and binary analysis.
The value analysis will try to prove all the properties needed for the code to be correct. If a property cannot be automatically proved, the analyzer will emit an alarm, such as:
/*@ assert Value: division_by_zero: b ≢ 0; */
It means that, in order for the program to be correct, the value of b needs to be non-zero at the point in the execution indicated by the analyzer.
At this point there are two possibilities:
There is an execution path for which b = 0 that will lead to an error or an undefined behavior. This means there is a bug in the program that needs to be corrected in order to ensure that the property will hold.

There is no execution path for which b = 0, but the analyzer was not able to prove the validity of the property. This means that the analyzer over-approximates the possible values of b, and in order to make the alarm disappear, it will be necessary to guide the analyzer to be more precise on the possible values of b, and then run the analysis again.
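To make the two situations concrete, here is a minimal sketch in C (it uses the tis_interval primitive presented later in this documentation; the exact precision of a given analysis configuration may vary):

#include <tis_builtin.h>

int true_bug(int a)
{
    int b = tis_interval(0, 10); /* b may really be 0 on some path */
    return a / b;                /* the alarm reveals an actual bug */
}

int false_alarm(int a)
{
    int b = tis_interval(-10, 10);
    int c = b * b + 1;  /* in fact always >= 1, hence never zero */
    return a / c;       /* a non-relational over-approximation of c
                           may still contain 0, producing a false alarm */
}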
Tip
An alarm is a property, expressed in a logic language, that needs to hold at a given point in the program in order for the program to be correct.
The following examples show how to use TrustInSoft Analyzer on test snippets to eventually guarantee the absence of undefined behavior for the input values of the test.
Tip
When used to verify tests, TrustInSoft Analyzer can be used with the option --interpreter, so that no other special tuning is required.
This allows the analyzer to remain precise, and thus, each alarm is a true bug that needs to be fixed.
The example array.c is located in the directory /home/tis/1.43.3/C_Examples/value_tutorials/getting_started.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i <= 5; i++)
printf("array[%d] = %d\n", i, array[i]);
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
Open the GUI after starting the analysis with the following command:
$ tis-analyzer-gui --interpreter /home/tis/1.43.3/C_Examples/value_tutorials/getting_started/array.c
TrustInSoft Analyzer emits the following alarm:
/*@ assert Value: index_bound: i < 5; */
Tip
Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].
To see both values of i and array, right click on i and Track this term in the statement printf("array[%d] = %d\n", i, array[i]); highlighted in orange, then click on array on that same statement. To see the values at each iteration of the loop, click on Per Path and scroll down to the last row.
The value of i is successively equal to:

0 at the first iteration of the loop,
1 at the second iteration of the loop,
2 at the third iteration of the loop,
3 at the fourth iteration of the loop,
4 at the fifth iteration of the loop,
5 at the sixth iteration of the loop.

array is an array local var. (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.
In order to continue the analysis, the program must be fixed and the analysis run again.
Note
Indeed, not all statements are analyzed: see the statements highlighted in red.
The statements after the loop are highlighted in red because they are unreachable with respect to the input values of the program. This means that none of the computed values allows a well-defined execution of the program to continue after the loop.
Understanding the Root Cause to Fix the Program
Tip
Now that we know that when i is equal to 5 there is an access out of the bounds of array, we will ask the analyzer where this value comes from.
Click on the button Inspect i to see the last write to the left-value i according to the current statement, and the value of i at this statement.
Tip
The button Inspect i is equivalent to the following actions: right click on i and then Show Defs, then click on i to see the values.

In the Interactive Code, the two statements highlighted in green show the last writes to i; moreover, the Show Defs panel shows that these two write locations are the only ones. The location i = 0 corresponds to the declaration and initialization, while the location i++ is where the value of i eventually becomes 5.
We fix the program by changing the loop condition so that the loop exits in time.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i < 5; i++)
printf("array[%d] = %d\n", i, array[i]);
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
Kill the current analysis, fix the source code, and then run the analysis again.
Tip
When analyzing a real-life project in which the source code must be fixed, it is recommended to follow a dedicated process. However, for a simple use case, such as this one, everything can be done from the GUI.
TrustInSoft Analyzer emits the same alarm, but at a different location:
/*@ assert Value: index_bound: i < 5; */
Tip
Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].
To see both values of i and array, right click on i and Track this term, then click on array.
The value of i is equal to 5, and array is an array local var. (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.
Once again, we fix the program to print the last element of the array, kill the current analysis, and then run the analysis again.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i < 5; i++)
printf("array[%d] = %d\n", i, array[i]);
i--;
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
Tip
TrustInSoft Analyzer guarantees the absence of undefined behavior for the input values of this test.
In this example we will study the program example1.c, located in the /home/tis/1.43.3/C_Examples/TutorialExamples directory. This program performs some operations on unsigned integers.
//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
int sum, prod, quot, diff;
sum = a + b;
diff = a - b;
prod = a * b;
quot = a / b;
}
Start the value analysis (option -val) with the following command:
$ tis-analyzer-gui -val /home/tis/1.43.3/C_Examples/TutorialExamples/example1.c
and launch the GUI (as explained in the Getting Started section).
Your browser window should display a division by zero alarm:
Selecting the alarm in the bottom panel will highlight the program point at which the alarm was raised, both in the Source Code Window (right panel) and in the Interactive Code Window, where you can also see the ACSL assertion generated by the analyzer:
/*@ assert Value: division_by_zero: b ≢ 0; */
The program takes two unsigned integer arguments a and b that are unknown at the time of the analysis. The value analysis must ensure that the program is correct for all possible values of a and b. The alarm was raised because if b is equal to zero, there will be a division by zero, which leads to undefined behavior. This means that anything can happen at this point, making the program incorrect (and dangerous to use, as this will lead to a runtime error).
Let’s modify the program by adding an if (b != 0)
statement in order to perform the
division only if b
is non-zero.
//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
int sum, prod, quot, diff;
sum = a + b;
diff = a - b;
prod = a * b;
if (b != 0)
quot = a / b;
}
and launch the analysis on the modified version example1_ok.c:
$ tis-analyzer-gui -val /home/tis/1.43.3/C_Examples/TutorialExamples/example1_ok.c
Congratulations! All the alarms have disappeared, so the program is guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer. This means that it will run safely, whatever the arguments provided.
You can now move to the next series of examples.
Compiling well defined, standard-compliant C or C++ source code should ensure that your program will behave as intended when executed, whatever the platform or compilation chain used.
TrustInSoft Analyzer’s static code analysis helps you produce high quality, standard-compliant code that is guaranteed to be free from a wide range of weaknesses and vulnerabilities.
The example we will examine shows how TrustInSoft Analyzer can detect dangling pointers (also known as CWE-562: Return of Stack Variable Address or CWE-416: Use After Free in the Common Weakness Enumeration).
This example originally appeared in a blog post by Julien Cretin and Pascal Cuoq on the TrustInSoft website. We encourage you to read the whole post for more technical details and some examples of production code carrying this kind of bug.
Let’s have a look at the program example3.c, located in the /home/tis/1.43.3/C_Examples/TutorialExamples directory:
/* TrustInSoft Analyzer Tutorial - Example 3 */
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
{
char a = 3;
p = &a;
pv = (uintptr_t)p;
}
{
char b = 4;
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
return 0;
}
This program prints out some text. The result should differ depending
on the value of the expression p == q
in line 24.
Start the value analysis with the following command:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.43.3/C_Examples/TutorialExamples/example3.c
and launch the GUI (as explained in the Getting Started section).
After selecting the Properties widget and the Only alarms button, you will notice that there is one alarm and some lines of dead code right after the alarm:
The assertion generated shows that the analyzer has found a dangling pointer (a pointer not pointing to a valid object):
/*@ assert Value: dangling_pointer: ¬\dangling(&p); */
Let’s use the analyzer to follow the variable p
through the
execution, and try to find the source of the problem:
The values of the variable p before and after the selected statement will be listed in the Values widget in the bottom panel. We can see that, after the initialization p = &a, the variable p holds the address of a as expected.
Selecting the expression p == q inside the if will show the value of p at this point in the program. Before the evaluation of the p == q expression, p is shown as ESCAPINGADDR, meaning that it does not hold the address of a valid object anymore.
The reason is quite simple: p holds the address of the local variable a, whose lifetime is limited to the block in which the variable is defined:
{
char a = 3;
p = &a;
pv = (uintptr_t) p;
}
Note
As stated by clause 6.2.4:2 of the C11 standard,
If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
The pointer p
refers to the variable a
outside of its
lifetime, resulting in undefined behavior. As every execution path
results in an undefined behavior, the analysis stops.
When the behavior of a program is undefined, it means that anything can happen at execution time. Some programmers seem to think that, at least in some cases, this is not a big issue: they are certainly wrong. A program invoking undefined behavior is a time bomb.
A dangling pointer is an example of undefined behavior. Let’s illustrate its consequences on our example.
First we compile the program with gcc:
$ gcc -Wall -Wextra -Wpedantic /home/tis/1.43.3/C_Examples/TutorialExamples/example3.c -o example3
and notice that, despite all of our efforts, gcc
does not issue any warnings.
Running the code will display the following text:
$ ./example3
Roses are red,
Violets are blue,
This poem is lame,
It doesn't even rhyme.
meaning that the condition p == q
evaluated to true. This can
happen if the variables a
and b
were allocated at the same
address by the compiler (which is possible since they are never in
scope at the same time).
Using different compilation options should not affect the behavior of
the program, but compiling with a -O2
switch and running the
program results in a different output:
$ gcc -Wall -Wextra -Wpedantic -O2 /home/tis/1.43.3/C_Examples/TutorialExamples/example3.c -o example3_opt
$ ./example3_opt
Roses are red,
Violets are blue,
0x7ffc9224f27e is different from 0x7ffc9224f27f
7ffc9224f27e is not the same as 7ffc9224f27f
This time the expression p == q
evaluated to false because the
variables a
and b
were allocated to different addresses. So
changing the optimization level changed the behavior of our program.
In the aforementioned post you can
see evidence of a third, very weird behavior, in which p == q
evaluates to false, even though a
and b
are allocated to the
same address.
The conclusion is clear, and we will state it as a general warning:
Warning
If the behavior of your program is undefined, executing the compiled code will have unpredictable results and will very likely cause runtime errors. You should always ensure that your code is well-defined, using source code analysis or other techniques.
We have already seen how TrustInSoft Analyzer is able to find dangling pointers and other weaknesses. We will continue the analysis and correct the code in order to guarantee that there are no problems left.
To avoid the dangling pointer problem, we will define a
outside
the block, so that its storage and address are guaranteed by the
standard throughout the whole main
function:
//TrustInSoft Analyzer Tutorial - Example 3_1
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
char a = 3;
{
p = &a;
pv = (uintptr_t)p;
}
{
char b = 4;
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
}
and launch the value analysis again:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.43.3/C_Examples/TutorialExamples/example3_1.c
We notice that the dangling pointer alarm regarding p has been replaced by the same alarm about q. When evaluating the expression p == q, the analyzer noticed a problem with the term p and stopped the analysis, so it did not get a chance to issue the alarm about q. Now that we corrected the first problem, the analyzer gets to the term q and raises the same kind of alarm.
Selecting the variables p and q in the Values widget tab and clicking on the p == q expression will show that, before evaluation, p holds the address of a whereas q has the value ESCAPINGADDR. This means that q is a dangling pointer, as it references the address of the out-of-scope variable b:
To correct the problem we will simply define b outside the block, as we did before with a:
//TrustInSoft Analyzer Tutorial - Example 3_ok
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
char a = 3;
char b = 4;
{
p = &a;
pv = (uintptr_t)p;
}
{
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
}
and launch the analyzer one more time to verify that there are no other alarms:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.43.3/C_Examples/TutorialExamples/example3_ok.c
Notice that the printf("This poem is lame,\nIt doesn\'t even rhyme.\n"); code is marked as dead. Indeed, the variables a and b need to be allocated to different addresses, as they are in scope at the same time. As a consequence, the condition p == q always evaluates to false, so the first branch of the if statement will never be executed.
Congratulations! You have successfully corrected all the bugs in example3.c. The program is now guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer.
In this example we will analyze an implementation of Skein, a cryptographic hash algorithm that was a finalist in the NIST SHA-3 competition.
We will show how to use TrustInSoft Analyzer to:
Tutorial Overview
In the first part of this tutorial we will use TrustInSoft Analyzer to explore the code and then launch a value analysis on the Skein implementation.
The estimated time for this lesson is less than 20 minutes.
Unlike our previous examples, the Skein implementation is composed of
multiple files. All the files needed for this tutorial are located in
the /home/tis/1.43.3/C_Examples/skein_verification
directory.
Let’s start by listing all the files in the directory:
$ ls -l /home/tis/1.43.3/C_Examples/skein_verification
total 124
-rw-rw-r-- 1 tis tis 204 Oct 5 18:54 README
-rwxrwxr-x 1 tis tis 4984 Oct 5 18:54 SHA3api_ref.c
-rwxrwxr-x 1 tis tis 2001 Oct 5 18:54 SHA3api_ref.h
-rwxrwxr-x 1 tis tis 6141 Oct 5 18:54 brg_endian.h
-rwxrwxr-x 1 tis tis 6921 Oct 5 18:54 brg_types.h
-rw-rw-r-- 1 tis tis 524 Oct 5 18:54 main.c
-rwxrwxr-x 1 tis tis 34990 Oct 5 18:54 skein.c
-rwxrwxr-x 1 tis tis 16290 Oct 5 18:54 skein.h
-rwxrwxr-x 1 tis tis 18548 Oct 5 18:54 skein_block.c
-rwxrwxr-x 1 tis tis 7807 Oct 5 18:54 skein_debug.c
-rwxrwxr-x 1 tis tis 2646 Oct 5 18:54 skein_debug.h
-rwxrwxr-x 1 tis tis 1688 Oct 5 18:54 skein_port.h
Note
Please note that the README and the main.c
files are not part
of the Skein implementation. They were added for use with this
tutorial.
The skein.h
file is a good starting point to explore the API. We
will focus our attention on the following lines of code:
typedef struct
{
size_t hashBitLen; /* size of hash result, in bits */
size_t bCnt; /* current byte count in buffer b[] */
u64b_t T[SKEIN_MODIFIER_WORDS]; /* tweak words: T[0]=byte cnt, T[1]=flags */
} Skein_Ctxt_Hdr_t;
typedef struct /* 256-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN_256_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN_256_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein_256_Ctxt_t;
typedef struct /* 512-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN_512_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN_512_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein_512_Ctxt_t;
typedef struct /* 1024-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN1024_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN1024_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein1024_Ctxt_t;
/* Skein APIs for (incremental) "straight hashing" */
int Skein_256_Init (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
int Skein_512_Init (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
int Skein1024_Init (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
int Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
int Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
It seems that, in order to hash a message, we’ll need to:

declare a variable of type Skein_256_Ctxt_t;
initialize it with Skein_256_Init;
pass to Skein_256_Update a representation of the string;
call Skein_256_Final with the address of a buffer in order to write the hash value.

When confronted with the analysis of an application, the usual entry
point for the analysis is its main
function. Nonetheless, there
are many contexts in which there will not be an obvious entry-point,
for example when dealing with a library. In those cases, you will need
to write a driver to test the code, or better, leverage existing tests
in order to exercise the code.
As the Skein implementation includes no tests, we provide the file main.c as a test driver. It implements the steps outlined above in order to hash the message "People of Earth, your attention, please":
/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"
#define HASHLEN (8)
u08b_t msg[80] = "People of Earth, your attention, please";
int main(void)
{
u08b_t hash[HASHLEN];
int i;
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&skein_context, HASHLEN);
Skein_256_Update(&skein_context, msg, 80);
Skein_256_Final(&skein_context, hash);
for (i = 0; i < HASHLEN; i++)
printf("%d\n", hash[i]);
return 0;
}
The driver also prints the contents of the hashed message in the
for
loop at the end of the file. Each of the printed values
corresponds to the numerical value of the corresponding character in
the hashed message.
We compile and run the code by executing the command:
$ gcc /home/tis/1.43.3/C_Examples/skein_verification/*.c && ./a.out
and get the following output:
215
78
61
246
0
0
0
0
Note
You might get different output because, as we will see, there is a bug in the test file.
We will start by launching a simple value analysis with the following command:
$ tis-analyzer-gui -64 -val /home/tis/1.43.3/C_Examples/skein_verification/*.c
Note
The -64 option specifies that we are using a 64 bit architecture. Other possible values are -16 and -32. If no value is specified, the default target architecture will be 32 bit.
Warning
For the following analysis it is extremely important to provide the correct architecture. Please make sure you are providing the correct value.
Next, we will launch the GUI by opening the link or pointing the browser to http://localhost:8080 (the port number may change if you have other analyses running at the same time).
Once you have opened the GUI, go to the Properties widget and click on the Only alarms button:
In the Properties widget we can see that there are about 20 alarms.
Note
If the analyzer finds that an error occurs for every possible execution path, then it will stop the analysis. If a problem occurs in one possible execution path, the analyzer will raise an alarm and continue the analysis on the other execution paths. This is why you can get multiple alarms.
As the analyzer is not precise enough, we will start by improving the precision of the analysis in order to reduce the number of alarms. We will show how to do this in the next part of this tutorial.
Please click here to continue the tutorial.
In this second part of the Skein tutorial, we will fine-tune the value analysis and investigate the alarms raised by the analyzer in order to identify and correct bugs.
The estimated time for this lesson is 20 minutes.
Note
As we have seen in previous examples, the number of different states that can be propagated at each program point is limited by the semantic unrolling level, which can be tuned with the -slevel option. The default configuration allows only one state, so the analysis can become imprecise quickly. Activating the -slevel option allows the analyzer to unroll loops and separately propagate the branches of conditional statements, thus improving the precision of the analysis in most cases.
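A minimal sketch of the effect in C (the array size and slevel value are illustrative):

int main(void)
{
    int t[10];
    int i;
    for (i = 0; i < 10; i++)
        t[i] = i;
    /* Without enough slevel, the analyzer merges the states of all the
       iterations: the writes to t[i] become weak updates and it may not
       be able to prove that every cell was initialized. With -slevel 10
       the ten iterations are propagated separately and reading t[9] is
       proved safe. */
    return t[9];
}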
A first step to improve the precision of the analysis is to activate
the -slevel
option. We will try to launch the analysis with an
slevel
value of 100 and see if there are any improvements:
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.43.3/C_Examples/skein_verification/*.c
Tip
100 is a good default value for the slevel because most functions terminate with fewer than 100 disjoint states. If the slevel is not sufficient, you will be able to fine-tune it later, but keep in mind that increasing the slevel can lead to slower analysis, so you will need to find a good compromise between precision and speed.
The GUI will give you information about the slevel consumed at different points in the program, making it easier to find the right value to fit your needs.
We open the GUI again
and notice that there is only one alarm left, so all the other alarms were false alarms.
Note
Unlike many other static analysis tools, TrustInSoft Analyzer never remains silent when there is a potential risk for a runtime error. As we have seen before, when the analysis is not precise enough it can produce false alarms.
On the other hand, the absence of alarms means that you have mathematical guarantees over the absence of all the flaws covered by TrustInSoft Analyzer.
With only one alarm left, we can investigate its cause to see if it is a true positive or not.
Notice that there is some dead code that cannot be reached by any execution. In particular, this function never terminates because the return statement cannot be reached.
This means that the execution never leaves the for loop in line 19:
for (i = 0; i < HASHLEN; i++)
printf("%d\n", hash[i]);
Clicking on the alarm in the Properties widget, we see that the alarm occurs indeed inside the loop. The annotation generated by the analyzer is:
/*@ assert Value: initialisation: \initialized(&hash[i]); */
meaning that we are trying to read a value that might not be properly initialized.
Note
The for loop in the main.c program has been transformed into a while loop by the analyzer. This is part of a normalization process that transforms the program into a semantically equivalent one in order to facilitate the analysis.
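As a sketch of what this transformation looks like (the analyzer's actual normalized output may differ in details):

/* original form */
for (i = 0; i < HASHLEN; i++)
    printf("%d\n", hash[i]);

/* semantically equivalent normalized form */
i = 0;
while (i < HASHLEN) {
    printf("%d\n", hash[i]);
    i++;
}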
Let’s explore the values inside the hash array by going to the Values widget and clicking on the hash term in the Interactive Code Window.
By clicking on the values in the before column of the Values widget,
you will have a more readable output of the contents of the hash
array:
[0] ∈ {215}
[1..7] ∈ UNINITIALIZED
This begins like: "�"
The analyzer is telling us that, before entering the loop, hash[0] contains the value 215 (which it tries to interpret as a character, hence the This begins like: "�" line) and all the other items in the array are UNINITIALIZED.
Reading an uninitialized value is an undefined behavior, and that is exactly what the program is doing when trying to read past the first element of the array. The analysis stops here as all the execution paths lead to undefined behavior.
It seems that the alarm is indeed a true positive, so we cannot go further in the analysis without correcting the problem.
As we have uninitialized values in the array, let’s have a look at the initialization function Skein_256_Init.
You can navigate to the definition of the function by right clicking on the function name:
This will update the Interactive Code Window with the code of the function:
After clicking on the hashBitLen variable, we can see in the Properties widget that its value is always 8.
This value corresponds to the value of the HASHLEN macro that is passed to the Skein_256_Init function in line 15 of the original code:
Skein_256_Init(&skein_context, HASHLEN);
Note that in the Interactive Code Window this macro has been expanded to 8.
So the length of the hash is expressed in bits, and we were inadvertently asking for an 8 bit hash. This explains why only the first element of the array was initialized.
As we wanted an 8 byte hash, we can correct the problem by multiplying the value by 8 in the call to Skein_256_Init, modifying line 15 of the main.c file to look like this:
Skein_256_Init(&skein_context, 8 * HASHLEN);
After saving the changes we run TrustInSoft Analyzer again:
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.43.3/C_Examples/skein_verification/*.c
and open the GUI:
Congratulations! There are no more alarms, so our program is guaranteed to be free from all the kinds of bugs covered by TrustInSoft Analyzer.
We can see the value of the hash in the shell in which we executed TrustInSoft Analyzer:
Let’s compile and run the program again to check that the results are the same:
$ gcc /home/tis/1.43.3/C_Examples/skein_verification/*.c && ./a.out
We get the following output, which is indeed the same as the one reported by the analyzer:
224
56
146
251
183
62
26
48
Note
The results we got when compiling the program in the first part of this tutorial were different because we were compiling an ill-defined program (as it was reading uninitialized variables, which is an undefined behavior). When the behavior of a program is undefined, you can get anything as a result (in our case it was most probably garbage read from the memory, as we were accessing uninitialized values).
As the program is now guaranteed to be correct, the output should be the same on any standard-compliant installation.
Our driver tested the Skein implementation on a single message of length 80. We could modify it by testing a number of different messages, and thus gain better coverage, as normal unit tests will do. But what if we could test it on all the messages of length 80?
This is certainly impossible to achieve by the execution of any compiled test, but it is in the scope of TrustInSoft Analyzer.
In the next part of this tutorial we will show you how to generalize the test to arbitrary messages of fixed length. This will give mathematical guarantees about the behavior of the implementation on any message of the fixed length.
Please click here to continue the tutorial.
The previous analysis allowed us to find a bug in the test driver. After correcting the program, the analyzer did not raise any alarms, which guaranteed the absence of a large class of bugs when hashing the 80 char message “People of Earth, your attention, please”.
In this third part of the Skein tutorial, we will generalize the result obtained in the second part to all the messages of length 80.
The estimated time for this lesson is 20 minutes.
Testing a program on a given set of test cases can allow you to identify some bugs, but it cannot guarantee their absence. This is true even if you test your program on a huge number of cases, because it is very unlikely that you will be able to cover them all.
Even in a simple example like ours (hashing a message of length 80), there are way too many test cases to consider (see the note below to get some detailed calculations).
Furthermore, even if your cases pass the test, this gives you no guarantee about future behavior: for example, if a test case results in an undefined behavior this might pass unnoticed in the test environment but trigger a bug in your client’s machine.
Note
Although the number of distinct 80 chars arrays is certainly finite, there are 2^8 = 256 possible values for a char variable, so there are 256^80 ≈ 4.6 x 10^192 different arrays of 80 chars. As the number of elementary particles in the visible universe is considered to be less than 10^97, we can conclude that the number of arrays of 80 chars is definitely out of scope for an exhaustive test.
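The arithmetic can be checked with a one-line derivation (in LaTeX notation):

256^{80} = (2^{8})^{80} = 2^{640} = 10^{640 \log_{10} 2} \approx 10^{192.7} \approx 4.6 \times 10^{192}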
The value analysis performed by TrustInSoft Analyzer uses abstract interpretation techniques in order to represent the values of the terms in the program. This allows for the representation of very large or even infinite sets of values in a way that makes sense to the analyzer. That is why we are going to be able to consider all the possible values for a given variable.
If the value analysis raises no alarms, this means that you have mathematical guarantees about the correctness of the program. These guarantees are not limited to one particular execution, but extend to any execution in a standard-compliant environment (which respects all the hypothesis made during the analysis).
Note
We will use some primitives that have a meaning in the abstract state of the analyzer, but that make no sense for a compiler. For this reason, we would not be able to compile or execute the driver anymore.
One such function is tis_interval. The assignment x = tis_interval(a, b) tells the analyzer that the variable x can hold any value between a and b (both included).
Our goal is to obtain a result that is valid for any given array of length 80. Let’s start by modifying the main.c program in order to tell the analyzer that each of the elements in the msg array can take any value between 0 and 255.
To accomplish this, it suffices to add a couple of lines to the main.c
program
for(i = 0; i < 80; i++)
msg[i] = tis_interval(0, 255);
to make it look like this:
/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"
#define HASHLEN (8)
u08b_t msg[80];
int main(void)
{
u08b_t hash[HASHLEN];
int i;
for(i = 0; i < 80; i++)
msg[i] = tis_interval(0, 255);
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&skein_context, 8 * HASHLEN);
Skein_256_Update(&skein_context, msg, 80);
Skein_256_Final(&skein_context, hash);
for (i = 0; i < HASHLEN; i++)
printf("%d\n", hash[i]);
return 0;
}
We then launch the value analysis again
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.43.3/C_Examples/skein_verification/*.c
and open the GUI:
There are no alarms in the GUI. This means that we have mathematical guarantees that the code is free from all the bugs covered by TrustInSoft Analyzer.
Congratulations! You have just proved that initializing, parsing, and hashing a message of length 80 by calling the three functions
Skein_256_Init(&skein_context, 8 * HASHLEN);
Skein_256_Update(&skein_context, msg, 80);
Skein_256_Final(&skein_context, hash);
will cause no runtime errors, whatever the contents of the 80 char array msg.
Note
As we saw before, this kind of result would be impossible to obtain with case testing, as this would amount to writing a program testing each one of the 256^80 possible input messages.
We will end this part of the tutorial with a few words on the representation of values by the analyzer. At each program point, the analyzer determines a variation domain for each variable or expression. The value of the variable or expression is always guaranteed to be contained in the variation domain but, due to the possible loss of precision, these domains can be over-approximations.
Note
The analyzer can represent the variation domain of an integer variable in three different ways:
an interval [l..u] with lower bound l and upper bound u;

a set [l..u], r%m, that is the set of values between the lower bound l and the upper bound u whose remainder in the Euclidean division by m is equal to r;

a --, which represents the smallest value that fits within the type of a variable if it occurs in the lower bound and the biggest value if it occurs in the upper bound. For instance, [--..--] will mean [0..255] for a variable of type unsigned char.
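A small sketch showing these representations in practice (the exact domains printed by a given version of the analyzer may differ):

#include <tis_builtin.h>

int main(void)
{
    unsigned char c;
    tis_make_unknown(&c, sizeof c);  /* c ∈ [--..--], i.e. [0..255] */
    int x = tis_interval(3, 150);    /* x ∈ [3..150] */
    int e = 2 * tis_interval(0, 10); /* e ∈ [0..20], 0%2 (even values) */
    return c + x + e;
}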
If you go to the Values widget in the GUI and click on the msg variable in the Interactive Code Window (as shown below), you will be able to see how the analyzer represents the integer values contained in the array msg of the program main.c:
The analyzer shows that, after the initialization loop, msg[0..79] ∈ [--..--]. This means that each of the values msg[0] to msg[79] can take an arbitrary value in the interval [--..--]. As pointed out before, for an unsigned char, [--..--] stands for any value that fits within the type, that is, between 0 and 255.
This is exactly what was achieved by calling the function tis_interval(0, 255).
If the C++ front-end for TrustInSoft Analyzer is available, it can be used with the command tis-analyzer++. This command accepts the same arguments as tis-analyzer, as well as specific options that can be listed using the command tis-analyzer++ -cxx-h. An interpreter version is also available using the command tis-analyzer++ --interpreter.
This tutorial will show you some C++ specific features of TrustInSoft Analyzer in order to better understand its output and help track the origin of alarms. It should be read after the Getting Started section. The examples used in this tutorial can be found in the /home/tis/1.43.3/Cxx_Examples directory.
We will start by analyzing the program matrix.cpp, which is a test of a matrix manipulation library:
int
main(void) {
Matrix<2U, 2U> matrix_a {
2., 1.,
4., 2. };
auto id = identity<2>();
bool has_inverse = is_invertible(id);
std::cout << "identity is inversible: " << (has_inverse ? "yes\n" : "no\n");
Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
Matrix<2, 1> res = solve(matrix_b, { 6., 10. });
std::cout << "RESULT IS:\n" << res;
return 0;
(void) has_inverse;
}
Start the analysis with the following command:
$ tis-analyzer++ --interpreter -gui /home/tis/1.43.3/Cxx_Examples/matrix.cpp
and open the GUI. The Interactive Code Window should look like:
The first thing to notice is that some names contain characters that are forbidden in C, like Matrix<2, 2> or std::__1::ostream, and may be prefixed by .... The names of entities in tis-analyzer++ are actually mangled; the Interactive Code Window displays an unmangled version of them to be clearer. The mangled version of names can be viewed by using the option -cxx-keep-mangling, and the mangling used is close enough to existing compiler practice to be unmangled by external tools like c++filt.
When a name is long, a shortened version of it is displayed in the
Interactive Code Window with ...
as prefix. Clicking on this
prefix will display the full qualified name, or its mangled version if
the option -cxx-keep-mangling
is used.
The first statement that is not a declaration is a call to the function __tis_globinit(). This function represents the dynamic initialization phase of a C++ program [1]. It contains only calls to functions with names similar to X::Y::__tis_init_Z, that are used to initialize the non-local variables X::Y::Z.
Looking at the definition of the X::Y::__tis_init_Z function will lead the Source Code Window to the body of the generated function initializing the variable X::Y::Z.
The first statement of the main function in the original code is:
Matrix<2U, 2U> matrix_a {
2., 1.,
4., 2. };
and corresponds in the normalized code to the line:
Matrix<2, 2>::Ctor<double, double, double, double>(& matrix_a,2.,1.,4.,2.);
Ctor is the generic name that TrustInSoft Analyzer assigns to C++ constructors. You can see that the object under construction is passed by address as the first argument: & matrix_a.
.All method calls are translated to regular C function calls, and as
such they receive an additional argument which stands for the this
pointer. In case of constructors, this is the address of the object
being initialized.
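A plain C sketch of this lowering (the names are illustrative, not the analyzer's actual output):

struct Obj { int field; };

/* C++: obj.set(42); on a class with an int field becomes, roughly: */
void Obj_set(struct Obj *this, int v)
{
    /* the receiver is passed explicitly, playing the role of the
       C++ `this` pointer */
    this->field = v;
}

/* call site: Obj_set(&obj, 42); */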
When looking at the constructor definition, you can see that it is calling the inherited constructor Matrix_base<2, 2>::Ctor<double, double, double, double> with the same arguments, except that the this pointer is shifted to its __parent__Matrix_base<2, 2, Matrix>. The corresponding part of the original code is:
const Matrix_base<I, J, Parent> &m1,
const Matrix_base<J, K, Parent> &m2)
{
auto cross_product =
[&m1, &m2] (unsigned i, unsigned j) -> double
{
Matrix<N, M> inherits from Matrix_base<N, M, Matrix>, and its constructor only transfers its arguments to the constructor of the parent class. In tis-analyzer++, a class A inheriting from a class B is represented by a struct A containing a field struct B __parent__B. The initialization of the base B of A is translated into a call to the function B::Ctor(&A.__parent__B). This structure layout can be observed in the example by looking at the definition of the type struct Matrix<2, 2>.
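A minimal C sketch of this layout (illustrative names):

struct B { int b_field; };

struct A {
    struct B __parent__B;  /* base-class subobject embedded first */
    int a_field;
};

void B_Ctor(struct B *this) { this->b_field = 0; }

void A_Ctor(struct A *this)
{
    /* initializing the base becomes a constructor call on the
       embedded subobject */
    B_Ctor(&this->__parent__B);
    this->a_field = 1;
}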
The next statement of the main function in the original code is:
auto id = identity<2>();
and corresponds in the normalized code to the line:
identity<2>(& id);
The first thing to note here is that the id variable has an auto type in the original source but is declared in the normalized code as:
struct Matrix<2, 2> id;
tis-analyzer++ makes auto types explicit, in the same way it instantiates template parameters.
Another difference is that in the normalized code the identity<2>
function takes an additional argument despite being a usual function
and not a method. This is a consequence of the fact that, in C++, a
non-POD [2] value returned by a function may not live inside the
function but inside its caller. To model this, a function returning a
non-POD type receives an additional parameter which contains the
address of the initialized object.
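A plain C sketch of this convention (illustrative names):

struct Matrix_2_2 { double data[4]; };

/* C++: Matrix<2, 2> identity();  becomes, roughly: */
void identity(struct Matrix_2_2 *__retres)
{
    /* the returned object is built directly in caller-provided storage */
    for (int i = 0; i < 4; i++)
        __retres->data[i] = (i == 0 || i == 3) ? 1.0 : 0.0;
}

/* call site: struct Matrix_2_2 id; identity(&id); */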
The next statement of the main function in the original code is:
bool has_inverse = is_invertible(id);
which, when clicked on, corresponds in the normalized code to:
{
struct Matrix<2, 2> __tis_arg;
_Bool tmp;
{
{
Matrix<2, 2>::Ctor(& __tis_arg,(struct Matrix<2, 2> const *)(& id));
tmp = is_invertible<2>(& __tis_arg);
}
has_inverse = tmp;
}
}
In this case, one C++ statement is translated into a block containing multiple declarations and statements. The function is_invertible<2> takes its argument by copy, as seen in its declaration:
template<unsigned N>
bool
is_invertible(Matrix <N, N> m)
and so its parameter has to be initialized with a new object. This is the purpose of the __tis_arg_* family of variables. In the current case, __tis_arg is initialized by calling the copy constructor of Matrix<2, 2> with the address of id as source. Then, the address of the newly built __tis_arg variable is given to the function is_invertible<2>, and the block around it delimits the lifetime of __tis_arg. This is the semantics of passing arguments by copy [3].
This transformation does not happen when calling the copy constructor
of Matrix<2, 2>
because its argument is a reference. References
are converted to pointers, so taking a reference to an object is
taking its address, and accessing a reference simply dereferences the
pointer.
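A short C sketch of this conversion (illustrative names):

struct Matrix_2_2 { double data[4]; };

/* C++: void scale(Matrix<2, 2> &m, double k);  becomes, roughly: */
void scale(struct Matrix_2_2 *m, double k)
{
    /* accessing the reference is a dereference of the pointer */
    for (int i = 0; i < 4; i++)
        m->data[i] *= k;
}

/* taking a reference at the call site is taking the address:
   scale(&matrix_a, 2.0); */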
The next interesting statement is the one at line 37 of the original source:
Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
which is translated in the normalized code as:
{
struct Matrix<2, 2> __tis_tmp_61;
{
{
operator^(& __tis_tmp_61,(double)5,
(struct Matrix<2, 2> const *)(& id));
}
;
;
}
operator+(& matrix_b,
(struct Matrix_base<2, 2, Matrix> const *)(& matrix_a.__parent__Matrix_base<2, 2, Matrix>),
(struct Matrix_base<2, 2, Matrix> const *)(& __tis_tmp_61.__parent__Matrix_base<2, 2, Matrix>));
}
Again, this statement is decomposed into a block containing multiple statements, but declaring this time a variable called __tis_tmp_61. The __tis_tmp_* family of variables corresponds to the temporary objects [4] that can be introduced by complex expressions. This temporary object is declared inside the block, as its lifetime is the one of the full expression, and it has to be destroyed at its end if needed.
[1] As stated in [basic.start.init].
[2] POD in the sense of [class]p10.
[3] See [expr.call]p4 and in particular: “The initialization and destruction of each parameter occurs within the context of the calling function.”
[4] As defined in [class.temporary].
This section describes how to use TrustInSoft tools.
tis-mk-main: generates a main function to be used as an entry point of the analysis when the original main function uses command line arguments.

The aim of this document is to help the user to study a C or C++ application with TrustInSoft Analyzer.
The main goal of this kind of study is to verify that none of the detected software weaknesses listed in the CWE-subset are present in the application. This document explains step by step how to achieve that goal.
The definitions of some of the words and expressions used in this document are given in the Glossary.
The main steps of an analysis are:
Here and there, extracting information, either to understand the results or to produce a report, is also useful. This can be done by combining options and scripts. How to do it is also explained in Get the Information.
Although the tool can be used purely from the command line interface, it also provides a GUI (see the TIS GUI Manual) that is very convenient for exploring the computed results.
The purpose of this chapter is to explain how to prepare the source code of an application before starting to analyze it. The main steps to perform are:
Finding out the preprocessing options.
This step can either be manual (Manual Preparation) or automatic (Automatic Preparation).
The manual preparation is the easiest way to start with if you already know the commands necessary to compile the source files. Otherwise, start instead with the automatic preparation.
Dealing with the external libraries.
In the simplest cases, all the source files need the same preprocessing command.
The default preprocessing command of tis-analyzer is:

clang -C -E -nostdinc -isystem TIS_KERNEL_SHARE/libc

Some more options can be added to this command with the -cpp-extra-args option.
The whole command can also be specified directly with the -cpp-command
option, for instance in order to use another preprocessor.
The -I and -isystem (to add include paths), -D (to add macro definitions), and -U (to remove macro definitions) options are provided as shortcuts to the -cpp-extra-args option.
For example, the following command can be used to run the analyzer on the f1.c, f2.c, and f3.c source files, taking the included files from the incl_dir directory:
$ tis-analyzer -I incl_dir f1.c f2.c f3.c
A specific preprocessing command can be given to a set of specific files with the option -cpp-command-file "f1.c:clang -C -E,f2.c:gcc -C -E" (or -cpp-command-file "f1.c:clang -C -E" -cpp-command-file "f2.c:gcc -C -E"). More options can be added to a preprocessing command for a set of files in the same way with the option -cpp-extra-args-file "f1.c:-Idir/".
Any file not listed in -cpp-command-file (resp. -cpp-extra-args-file) will use the global command (resp. additional options) of the -cpp-command option (resp. -cpp-extra-args option).
If most of the source files need to have a specific preprocessing command, it is recommended to use the Automatic Preparation.
The exact pre-processing command in use can be shown by adding the command line
option -kernel-msg-key pp
when running the analyzer.
In some applications, the source code is split in modules that require different preprocessing commands.
Warning
First of all, an important recommendation is to tackle the software in as small-sized chunks as possible. This makes most of the pre-processing problems go away.
If a particular source file needs a different preprocessing command, it is
better to preprocess it first. The result file has to be named with a .i
or .ci
extension so the analyzer knows that it does not need to preprocess it.
The difference between the two extensions is that the .i files are not preprocessed at all by the tool, whereas the macro definitions are expanded in the annotations of the .ci files, which is most of the time the intended behavior. So, except in some special cases, the .ci extension is to be preferred.
Source files and preprocessed files can be mixed in the command line. For
instance, if the f3.c
file needs some special options, f3.ci
can be
generated beforehand, and then used in the command line:
$ tis-analyzer -I incl_dir f1.c f2.c f3.ci
This will give the same result as the previous command, provided that f3.c has already been preprocessed into f3.ci.
Here is a synthetic example with two files h1.c and h2.c that use the same macro M, which needs to have a different definition in each file.
h1.c:

int x = M;
extern int y;
int main(void) {
return x + y;
}
h2.c:

int y = M;
If M is supposed to be 1 in h1.c and 2 in h2.c, the recommended command lines for this example are:
$ clang -C -E -nostdinc -DM=1 -o h1.tis.ci h1.c
$ clang -C -E -nostdinc -DM=2 -o h2.tis.ci h2.c
Then, the generated files can be provided to the analyzer:
$ tis-analyzer -val h1.tis.ci h2.tis.ci
And the obtained result shows that M has been correctly expanded:
...
[value] Values at end of function main:
__retres ∈ {3}
...
In more complex cases, it is better to use the Automatic Preparation.
Most applications use some libraries, at least the standard libc. The analyzer needs to have information about the functions that are used by the application, at least the ones that are called in the part of it which is being studied.
For the libc library, some header files come with the tool and provide specifications for many of its functions. These header files are included by default when preprocessing the source files. However, if the preprocessing is done beforehand, the following option has to be employed in order to find the instrumented files:

-I$(tis-analyzer -print-share-path)/libc
The tool also provides implementations of some libc functions
that are automatically loaded.
They are either C source code or internal built-in functions.
But the -no-tis-libc
option may be used to completely ignore the tool’s
library functions and header files.
It can be useful when analyzing code with custom libc functions for instance.
Another intermediate solution is to use the --custom-libc <file>
option.
In that case, the given source file is analyzed before the tool runtime files.
It gives the opportunity to overload some of the provided C implementations.
The built-in functions cannot be individually overloaded at the moment.
To overload some header files in case something is missing,
the --custom-isystem <path>
option can be used.
Then the given include path is used before the tool ones.
In that case, the custom headers xxx.h may include the tool headers with:

<path>/xxx.h:

#include <tis-kernel/libc/xxx.h>
// some more declarations and/or specification for <xxx.h>
If other external functions are used, one has to provide some properties concerning each of them: at the minimum to specify which pieces of data can be modified by them. See Check the External Functions to know which functions have to be specified and Write a Specification to learn how to do it.
At this point, the source files and the preprocessing commands should have been retrieved. It is time to try the tool for the first time, for instance by running:
tis-analyzer -metrics <..source and preprocessed files..> <..preprocessing options>
The preprocessing options are only used when source files are provided. In complex cases, it can be easier to analyze only the already preprocessed files.
This section describes how to automatically produce
a compile_commands.json
file that contains instructions on how to
replay the compilation process independently of the build system.
A compilation database is a JSON file which consists of an array of “command objects”, where each command object specifies one way a translation unit is compiled in the project.
Each command object contains the translation unit’s main file, the working directory where the compiler ran and the actual compile command.
See the online documentation for more information:
compile_commands.json

CMake (since 2.8.5) supports generation of compilation databases for Unix Makefile builds with the option CMAKE_EXPORT_COMPILE_COMMANDS.
Usage:
cmake <options> -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <path-to-source>
For projects on Linux, there is an alternative to intercept compiler calls with a more generic tool called bear.
Usage:
bear <compilation_command>
Tip
It is recommended to use bear. It can be installed with the package manager, typically:
compile_commands.json

In order to use the produced compilation database, run TrustInSoft Analyzer with the following command:
tis-analyzer -compilation-database path/to/compile_commands.json ...
Also, if a directory is given to the -compilation-database option, it will scan and use every compile_commands.json file located in the given directory and its sub-directories.
tis-analyzer -compilation-database path/to/project ...
It is also possible to use compilation databases in a tis.config file for the analysis. A possible generic template for the tis.config file is given below (see Configuration files for more information about tis.config files).
{
"compilation_database":
[
"path/to/compile_commands.json"
],
"files":
[
"path/to/file_1",
"path/to/file_2",
"path/to/file_N"
],
"machdep": "gcc_x86_64",
"main": "main",
"val": true,
"slevel-function":
{
"function_name": 10
}
}
To use the tis.config file, run TrustInSoft Analyzer with the following command:
Note
The tis.config file uses a strict syntax for JSON. A typical mistake would be to put a comma after the last element of an object or array, e.g. after the line "path/to/file_N", which would lead to an error.
At this point, whatever method was chosen for the preparation step, you should, for instance, be able to execute:
tis-analyzer -metrics <... arguments...>
with the appropriate arguments, the analyzer should run with no errors. Using
the command tis-analyzer-gui
with the same arguments starts the GUI which
lets you browse through the source code, but not see the analysis results yet,
since nothing has been computed at the moment.
It is often useful to save the results of an analysis with:
tis-analyzer ... -save project.state > project.log
This command puts all the messages in the file project.log
and saves the
state of the project itself to the file project.state
, so that it can be
loaded later on. For instance, we can load it now in the GUI by executing:
tis-analyzer-gui -load project.state
In case the application includes some special features (assembler code, etc.) and/or requires to be studied for a specific hardware target and/or with specific compiler options, please refer to Dealing with Special Features.
This chapter explains how to specify which part of the source code of an application will be studied and in which context. Moreover, it also shows how the overall goal can be split into several separate analyses if needed. The main objective is to be able to run the value analysis, implemented by the Value plug-in, in order to obtain the alarms concerning the software weaknesses listed in the CWE-subset.
The study perimeter could be the whole program, or only some functions of a library, or a single use case scenario. Explaining how to decide which part of the source code should be studied is very difficult, since it depends a lot on the particular application, the amount of time available, and mostly on how one looks at the problem… Adopt an incremental approach: begin with a small study, in order to understand how to apply the tools in the given situation, and then enlarge the perimeter later on.
In order to run a value analysis, an entry point to the program has to be
provided. The body of the entry point function defines the studied perimeter. It
is usually the main
function which establishes the context verified by the
analysis, but other functions can be used to this end as well.
The main function of an application can be used directly. However, if it takes options and arguments, it still has to be called from an entry point that builds values for them. The tis-mk-main utility can help in doing so (see tis-mk-main Manual). Be aware though, that if main is a complex function that parses options and needs many string manipulations, it is probably a better idea to write a smaller entry point from scratch in order to define a more precise context of analysis.

It is important to mention here the difference between dynamic test execution
and static value analysis. As the code is not executed in the latter, each of
the built inputs provided to the analyzed function does not need to have a
single value. It means that a function taking a single integer parameter x
can, for instance, be analyzed for all the possible input values, or for all the
values from a given set (e.g. 3 < x < 150
). So when we mention “a value”
here, we do not actually mean “a single concrete value”, but rather “a set of
abstract values”.
Basically, the entry point function has to call the functions to analyze, providing them with appropriate input values (i.e. function arguments) that correspond to the studied perimeter. Some built-in functions are available to build these input values:
for an integer interval:
x = tis_interval(l, u);
It guarantees that the analyzer will produce warnings for any bad behavior
that could result from any value between l
and u
being returned.
Several other functions are also provided for other types like for instance
tis_double_interval(l, u)
for floating-point values,
and tis_unsigned_long_long_interval(l, u)
for wide integers,
which behave the same way
for the types double
and unsigned long long
.
to initialize addr[0 .. len-1]
:
tis_make_unknown (addr, len);
It guarantees that the analyzer will produce warnings for any bad behavior
that could result from having any arbitrary len
bytes in memory
starting from addr
.
The tis_make_unknown
function is also useful to initialize a simple
variable:
tis_make_unknown (&x, sizeof (x));
This is equivalent to x = tis_interval(l,u); when
l is the minimum value of the type
and u is the maximum value of the type.
for a non-deterministic choice between two integers:
x = tis_nondet (a, b);
It guarantees that the analyzer will produce warnings
for any bad behavior that could result from the value of x being a or b.
These are only two values,
but they combine with the other possibilities
resulting from the calls to the other builtin functions.
for a non-deterministic choice between two pointers:
p = tis_nondet_ptr (&x, &y);
This one is similar to the previous one, but for pointers.
Example: the main
function below shows a valid entry point to test a
compute
function that takes a buffer, its length, and a pointer to store
a result:
#include <stdio.h>
#include <tis_builtin.h>
int compute (char * buf, size_t len, char * result);
int main (void) {
char buf[100];
tis_make_unknown (buf, 100);
size_t len = tis_interval (0, 100);
char x;
char * result = tis_nondet_ptr (NULL, &x);
int r = compute (buf, len, result);
}
The builtin tis_init_type can be used to initialize a simple pointer, such as int * p, or a pointer to a recursive data structure, such as struct list * p. It takes five arguments:
tis_init_type(str_type, ptr, depth, width, valid)
- const char * str_type should be a string representing a valid type of the memory to initialize.
- void * ptr should be a pointer to the memory area to initialize.
- unsigned long depth should be an integer that exactly mirrors the behavior of the option -context-depth during the initialization.
- unsigned long width should be an integer that exactly mirrors the behavior of the option -context-width during the initialization.
- valid should be an integer that exactly mirrors the behavior of the option -context-valid-pointers during the initialization.

Example:
#include<tis_builtin.h>
struct list {
int data;
struct list * next;
};
int main(){
int *p0, *p1, *p2;
struct list * p3;
tis_init_type("int *", &p0, 1, 1, 1);
tis_init_type("int *", &p1, 1, 10, 1);
tis_init_type("int *", &p2, 1, 1, 0);
tis_init_type("struct list *", &p3, 3, 1, 1);
tis_dump_each();
}
The code above calls tis_init_type to initialize the pointers p0, p1, p2 and p3. More specifically:
- tis_init_type("int *", &p0, 1, 1, 1) allocates an array of size 1 (given by the width argument), initializes the array element to any possible integer: S_p0[0] ∈ [--..--], and then assigns the array address to the pointer p0: p0 ∈ {{ &S_p0[0] }}.
- tis_init_type("int *", &p1, 1, 10, 1) allocates an array of size 10: S_p1[0..9] ∈ [--..--], and assigns its address to the pointer p1: p1 ∈ {{ &S_p1[0] }}.
- tis_init_type("int *", &p2, 1, 1, 0) sets the last argument to 0, which allows p2 to possibly be a NULL pointer: p2 ∈ {{ NULL ; &S_p2[0] }}.
- tis_init_type("struct list *", &p3, 3, 1, 1) allocates a list of length 4 (the index of the last list element corresponds to the depth argument), and assigns the list head address to the pointer p3.

The call to tis_dump_each then prints the resulting state:
p0 ∈ {{ &S_p0[0] }}
S_p0[0] ∈ [--..--]
p1 ∈ {{ &S_p1[0] }}
S_p1[0..9] ∈ [--..--]
p2 ∈ {{ NULL ; &S_p2[0] }}
S_p2[0] ∈ [--..--]
p3 ∈ {{ &S_p3[0] }}
S_p3[0].data ∈ [--..--]
[0].next ∈ {{ &S_next_0_S_p3[0] }}
S_next_0_S_p3[0].data ∈ [--..--]
[0].next ∈ {{ &S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
[0].next ∈ {{ &S_next_0_S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
[0].next ∈ {0}
In order to obtain more details about the available functions which allow building imprecise values, refer to the Abstract values section, or browse the file:
more $(tis-analyzer -print-share-path)/tis_builtin.h
Some tools are also available and may help to build the entry point for specific situations (see tis-mk-main Manual).
Now, when the main entry point is ready, it is time to run the value analysis
for the first time using the -val
option.
An important thing to check is the nature of external functions. More precisely, look for this message in the log file:
$ grep Neither proj.log
[kernel] warning: Neither code nor specification for function ...
This message indicates that the given function is undefined. In order to progress with the value analysis, it MUST be defined by either:
- writing an ACSL specification for it (see Write a Specification), or
- writing a C stub for it (see below).
The libc
library functions should not appear in these messages since most of
them are already specified in provided library files (see About Libraries).
Writing C stubs for functions for which no code is available is the recommended way to go. The standard functions and the builtins presented above (see Write an Entry Point) may be used to abstract the implementation details.
To illustrate how to write stubs using standard functions
and analyzer builtins,
suppose that the code we want to analyze
is the main function below,
and that we do not have the code for the function mystrdup.
char *mystrdup(char *s);
int main(void) {
char c, *p;
int x;
p = mystrdup("abc");
if (p)
c = p[0];
x = c - '0';
}
There is currently no good way to write a specification that indicates
that mystrdup allocates a new block
and makes it contain a 0-terminated string. Instead,
the recommended method is to abstract it with a stub that may look as follows:
#include <string.h>
#include <stdlib.h>
#include <tis_builtin.h>
char *mystrdup(char *s) {
size_t l = strlen(s);
char *p = malloc(l+1);
if (p) {
tis_make_unknown(p, l);
p[l] = 0;
}
return p;
}
The files can be analyzed with:
$ tis-analyzer -val -slevel 10 main.c mystrdup.c
As shown in the trace, the analyzer correctly detects
that the main
function may use c
uninitialized:
tests/val_examples/stub_main.c:13:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
tests/val_examples/stub_main.c:13:[kernel] warning: completely indeterminate value in c.
When specifying an external function with the ACSL properties, only the
assigns
properties are mandatory: they give the tool an
over-approximation of what can be modified. However, also providing the
function’s post-conditions can help the analyzer and yield more precise results
(see Write a Specification).
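For instance, here is a minimal sketch of such a specification for a hypothetical external function max2; the assigns clause is the mandatory part, and the ensures clauses are the optional but helpful post-conditions:

/*@ assigns \result \from a, b;
    ensures \result == a || \result == b;
    ensures \result >= a && \result >= b;
*/
int max2(int a, int b);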
Performing value analysis with no additional options (like in all the cases above) makes it run with a rather low precision. It should not take too long to get the results that indicate where the alarms were found. When using the GUI, the list of alarms can be selected using the Kind filter of the Properties panel, and a summary of the number of alarms can be found in the Dashboard panel.
The global precision can be changed using the -slevel n
option. The
greater n
is, the more precise the analysis is (see About the Value Analysis for
more details). The alarms that can be formally verified by increasing the
precision in this way will disappear. Those which remain are the difficult part:
they require further attention.
Value analysis takes longer and longer when the precision increases. Thus it can be profitable to fine tune the precision locally on certain functions in order to benefit from the higher precision level where it is advantageous (so that more alarms are formally verified) while keeping it lower where it matters less (so that the analysis runs faster).
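For instance, a run along these lines (the function name is hypothetical) keeps a low global precision while raising it on one function:

$ tis-analyzer -val -slevel 10 -slevel-function parse_message:1000 main.c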
For the same reason (fast analysis to find bugs earlier) it can also be useful to reduce (temporarily) the size of the arrays (when the source code is structured to allow this easily).
The final analysis information can be found in the Dashboard panel.
Note that at this point the goal is not to study the alarms precisely, but rather to get a rough idea of the amount of work needed in order to be able to decide which part to study.
The experience suggests that if the size of the analyzed source code is large and / or if there are many alarms, it is usually worthwhile to split the study into smaller, more manageable sub-components. The idea here is to write a precise specification for every sub-component and then analyze each of them independently toward its particular specification. Afterwards, the main component can be studied using those specifications instead of using directly the sub-components’ source code.
It is quite easy to decide which part should be split if some main features are identifiable and clearly match a given function. Otherwise, a first overview of the number of alarms may help to isolate a part that seems difficult for the analyzer. However, as the separated function must be specified, it is much easier if it has a small and clear interface (in order to study a function, it must be called in the intended context, and this context might be difficult to build if it corresponds to a large and complex data structure).
To split the analysis one must write:
- a main function for the main component,
- a main function and an ACSL specification for each of the API functions which are supposed to be studied independently.

Then, when performing the analysis for the main component, the -val-use-spec
option should be used in order to provide the list of the API specified
functions. For each of the functions from this list, the value analysis will use
the function’s ACSL specification instead of the function’s body.
For instance, the commands below can be used to split the study into the main
analysis with two sub-components corresponding to the f1
and f2
functions:
$ tis-analyzer -val $SRC main.c -val-use-spec f1,f2 \
-acsl-import f1.acsl,f2.acsl \
-save project.state
$ tis-analyzer -val $SRC main_f1.c -acsl-import f1.acsl -save project_f1.state
$ tis-analyzer -val $SRC main_f2.c -acsl-import f2.acsl -save project_f2.state
In the commands above:
- main.c, main_f1.c and main_f2.c should hold the entry points for the main component, and for the f1 and f2 functions respectively;
- f1.acsl and f2.acsl should hold the ACSL specifications of the f1 and f2 functions respectively (see Write a Specification to learn how to write a specification).

There is another case where studying an entry point may require several separate analyses: when there is a parameter in the program that has to be attributed a value (e.g. using a macro) and when it is difficult to give it an arbitrary value beforehand (e.g. it is a parameter defining the size of an array). In such situations it is better to write a loop in an external script to attribute different values to that parameter and run as many analyses as necessary.
The following script runs an analysis for every N
being a multiple of 8
from 16
to 128
:
#!/bin/bash
for N in $(seq 16 8 128) ; do
tis-analyzer -D N=$N -val $SRC main.c
done
Of course, it supposes that N
is used somewhere in main.c
.
Writing a specification for a function is useful in two cases:
When some splitting is done.
A certain function can be studied independently, as a separate sub-component of the analysis. The function is verified toward a certain specification and then that specification can be used (instead of using directly the function’s body when analyzing function calls) in the main component’s verification (To Split or Not to Split).
When using some unspecified external functions.
If an external function, that is not part of the subset of the libc
library functions provided with the tool (which are already specified), is
used in the program, then it needs an explicit specification. The provided
specification has to indicate at least which of the concerned data may be
possibly modified by the function. Pre- and postconditions are not mandatory
in that case.
The specification is written in ACSL and is mainly composed of:
- the preconditions (requires properties),
- the modified data (assigns properties),
- the data dependencies (\from part of assigns properties),
- the postconditions (ensures properties).

The ACSL properties can be either inlined directly in the source files or
written in separate files and loaded (as explained in ACSL Properties).
An analysis will use the specification instead of the function’s body to process
a function call when either the body is not provided or an appropriate
-val-use-spec
option has been set in the command line.
When analyzing a function call using the specification, the tool checks the preconditions at the call site, considers the data listed in the assigns clauses as possibly modified, and uses the postconditions to refine the state after the call.
In the specification of an external function the preconditions are not mandatory. If some are provided though, the analyzer checks whether they are satisfied at each call. Therefore adding preconditions in that case makes the verification stronger.
When a defined function is analyzed separately from the main application, its preconditions define the context in which it is studied. This does not have to be the most general context imaginable, but it has to include at least all the possible usage contexts that can be found in the application.
For example, suppose that the function f
has been selected to be studied
independently, and that it takes a single parameter x
. If f
is always
called from the application with positive values of x
, it is possible to
study it only in that context. This property must be then specified explicitly
by the precondition:
requires r_x_pos: x > 0;
Also, the corresponding main_f
function - i.e. the one that is written and
used as an entry point in order to study f
individually - must call f
with all the positive values for x
. If the preconditions specify a
context smaller than the context defined implicitly by the main
function, it
will be detected by the analysis since some of the preconditions will be then
invalid. But the opposite case (i.e. if the specified context is larger than the
studied input context) would not be detected automatically.
In other words, in the example above, if main_f
calls f
with (x >=
0)
, it will be detected since (x == 0)
does not satisfy the precondition.
However, if it calls f
only with (0 < x < 10)
, the precondition will be
formally satisfied, but the function behavior for (x >= 10)
will not be
studied. If, for instance, f
is then called in the application with
(x == 20)
, the problem will not be detected since the precondition is valid
for this value.
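Following this discussion, here is a sketch of a main_f entry point that covers exactly all the positive values of x (the signature of f is assumed):

#include <tis_builtin.h>
#include <limits.h>

int f(int x); /* the function under study */

int main(void) {
    int x = tis_interval(1, INT_MAX); /* all positive values of x */
    return f(x);
}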
Warning
When verifying a function toward a specification that is then used to verify another component, it is very important to make sure that the context defined by the specified preconditions and the studied input context represent the same thing.
Note that:
- if some input data is expected to be initialized, an \initialized precondition should be included in the specification;
- the pointers that the function dereferences should be specified as \valid (meaning that they point to an allocated memory zone).

The assigns
properties are composed of two parts, which specify the modified
data and its dependencies:
//@ assigns <left part> \from <right part>;
Each assigns
property specifies the modified data on the left side
of the \from
keyword.
The union of the left parts of all the assigns
properties in a given
function’s specification is an over-approximation of the data modified by this
function. Hence the data that is not in this set (i.e. the set defined by
the union of their left parts) is expected to have the same value in the
pre-state and the post-state.
This information about the modified data is used by the analyses whenever a call to the function is processed through its specification.
Each assigns
property specifies the data dependencies on the right
side of the \from
keyword.
The output value of the modified data is expected to depend only on the value of its data dependencies. In other words, if the value of the dependencies is equal in two input states, then the value of the modified data should be equal in the two output states.
There are two kinds of dependencies: direct and indirect. The indirect
dependencies have to be explicitly marked with an indirect:
label. All the other dependencies are considered as direct.
Here are some examples of correctly defined dependencies:
//@ assigns \result \from a, b, indirect:c;
int f (int a, int b, int c) { return c ? a : b; }
int t[10];
//@ requires 0 <= i < 10; assigns t[..] \from t[..], a, indirect:i;
void set_array_element (int i, int a) { t[i] = a; }
The dependency information is:
- not used by the WP plug-in;
- very important for many analysis techniques that require knowledge about the data dependencies (such as the Show Defs feature in the GUI, slicing, etc.), but only when the function body is not used, since if the body is available the dependencies can be computed by the From plug-in;
- employed in the value analysis of the pointers: the output value of the modified pointers can only be among the specified direct dependencies.
Note that an intermediate pointer is needed when a pointer is assigned to the address of a variable. This property is not valid:
assigns p \from &x; // NOT valid.
One must rather declare T * const px = &x;
in the code and then write
the correct property:
assigns p \from px;
It means exactly that the output value of p
may be based on &x
and
on no other existing variables.
Remember that the assigns
properties specify an over-approximation of
the modified data. For instance, the following properties only say that nothing
except B
is modified by the function:
assigns B \from \nothing;
assigns \result \from B;
In order to specify that B
is surely initialized after the function call,
one has to add a post-condition:
ensures \initialized(&B);
When the function result is used to return an error status, it is often the case that the post-condition rather looks like:
ensures (\result == 0 && \initialized(&B)) || (\result < 0);
It is not mandatory to specify ensures properties, either in the case of
splitting a defined function or for specifying an external function. However,
some information about the values in the returned state might be needed in the
analysis of the caller function.
In the specification of an external function, the provided post-conditions cannot be checked since the source code is not available. Hence they are used as hypotheses by the analysis and cannot be formally verified themselves.
Warning
As the post-conditions of external functions cannot be verified by the tool, they must be checked with extra care!
If ensures
properties are specified, it is usually good to keep them as
simple as possible. They have to be verified during the function’s body analysis
and over-specification only increases the amount of work necessary to achieve
that.
Before going any further, it is often advantageous to check the code coverage in order to verify if all the dead code which exists in the application (i.e. the parts of code composed of the functions and branches that are not reachable by the analysis) is indeed intended. To learn how to do that, see Information about the Coverage.
Dead code can be spotted in the GUI by looking for statements with a red background. If some dead code seems strange, it can be explored and investigated using the value analysis results. Clicking on variables and expressions allows inspecting their computed values.
As long as the analysis did not degenerate, the code classified as dead by the tool is a conservative approximation of the actual dead code. It is guaranteed that, in the context defined by the input values, the concerned statements cannot be reached whatever happens. When relying on this guarantee, one should however keep in mind the two important assumptions it depends on: it applies only if the analysis did not degenerate, and dead code is always considered relative to the context defined by the input values. Most of the time, if some code has been marked as dead when it should not have been, the reason is that the context of analysis was defined too restrictively (i.e. it does not include all the input values that can happen in a real execution). Another common reason is that the analysis has simply stopped computing a given branch.
At this point, one should have some analyses (one or several) which cover the intended parts of the code and end without any degeneration. The results most likely include alarms. The chapter Study the Alarms explains how to deal with the alarms and, before that, the chapter Get the Information explains how to extract more information from the analysis results.
If, due to splitting, there are several analyses, there is no preferred order of verification. In any case however, modifying the existing specifications invalidates the results obtained so far.
Caution
The tis-info
plug-in is not available in this version.
This chapter explains how to extract some information from the analyzed project
using the tis-info
plug-in and other external scripts. The tis-info
plug-in provides options to generate textual files containing information about
functions, variables, properties and statements. Filters can be used to extract
specific information from these files.
Some pieces of information are purely syntactic while some others are of semantic nature. The semantic information is only available if the project which was used to generate the files holds the value analysis results.
For an exact and up-to-date description of each generated piece of information, please refer to the tis-info Manual.
The tis-info
plug-in can be used to generate CSV files. The main options
allow us to extract the information concerning:
- functions: -info-csv-functions functions.csv
- variables: -info-csv-variables variables.csv
- properties: -info-csv-properties properties.csv
- statements: -info-csv-statements statements.csv
For instance, in order to get the information about functions from a previously
saved project project.state
, the command line would be:
tis-analyzer -load project.state -info-csv-functions functions.csv
As mentioned before, the kind of obtained information (i.e. either purely syntactic or also semantic) will depend on whether the saved project includes the value analysis results or not.
In the generated CSV files, the information about each element is printed on a
single line (with comma separated fields). Hence, the files can be opened in a
spreadsheet tool for easy selection of elements. Moreover, this format can be
easily grepped (i.e. filtered using the grep
utility), for instance, the
following command returns all the information about the function funname
:
grep funname functions.csv
In order to filter on a specified column, the awk
tool is also very
practical. For instance, the following command returns only the lines where the
word valid
appears in the fifth column:
awk -F, '! ($5~"valid") {print}' properties.cvs
Also, awk
can be used to easily extract only some of the columns:
awk -F, '{print $4 $5}' properties.csv
The generated file functions.csv
provides information about the functions.
It contains the list of both defined and declared functions appearing in the
analyzed source code, including their locations, whether they are called or not,
whether they are reachable in the analyzed context, etc. The most useful piece of
information here concerns the coverage; it is detailed just below.
The coverage of each function can be found in the appropriate column of the
functions.csv
file. Note that this information is semantic in nature and
thus only available if the value analysis results have been computed.
At this point, the internal functions are usually not interesting and they can be filtered out with:
grep -v TIS_KERNEL_SHARE
The easiest approach then might be to check first the completely unreachable functions:
grep ", unreachable,"
And then filter out the completely covered ones, keeping only the partially covered functions:
grep -v ", unreachable," | grep -v "100.0%"
Then the GUI can be used to explore the dead code of the functions that are not totally covered in order to verify if this is intended or not.
If the information about the code coverage comes from several separate analyses,
the generated functions.csv
file is not sufficient anymore to measure the
real coverage of functions, since it represents the results extracted from
only one project out of many. Because of this issue, the tis-aggregate
tool provides a coverage
command to extract all the relevant information
from the functions.csv
files and compile it into overall coverage results
that can be presented in the CSV format:
tis-aggregate coverage project.aggreg > coverage.csv
Here, project.aggreg is a file that gives the base name of the analyses to consider. For instance:
path_1/proj_1
path_2/proj_2
...
path_n/proj_n
The tool then processes information from the path_i/proj_i_functions.csv
files.
This tool also provides some more options, such as presenting the results in HTML format (see the Tis-aggregate Manual).
An interactive HTML report can also be generated with tis-report.
The location and status of each property can be found in the properties.csv
file. If the names given to the user annotations follow some naming conventions
(see Naming the Annotations), it is quite easy to use grep
to extract
more precise information from that file.
For instance, if the names of the annotations that should be proved by the WP
plug-in all have a _wp
suffix, it is easy to check if they are all verified
with the following command:
grep "_wp," properties.csv | grep -v ", valid$"
The generated file statements.csv
provides information about certain kinds
of statements in the analyzed program.
For instance, it contains information about the function calls, in particular whether a specific call is direct or not. Moreover, if an indirect call has been encountered during the value analysis, it provides the list of all the possibly called functions. Extracting this information can be done with:
grep ", call," statements.csv | grep -v DIRECT
Some useful information concerning the condition values can also be found here.
Especially, whether a condition happens to be always true or always false. This
kind of situation is often also observable through the dead code, although not
in all cases, since an if
condition might be always true, but may have no
else
branch (which, obviously, would be dead if it existed).
The information about all the variables is available in the variables.csv
generated file. The exceptions are the global variables which are not accessed or
modified, since they are removed from the analysis results. This information can
also be used, for instance, to easily find the location of the definition of a
variable or to list all the static or volatile variables.
The list of all the existing alarms is given in Value Analysis Alarms.
Most of the time understanding the meaning of alarms is relatively easy since the generated assertions, messages, and warnings tend to be quite clear. The matter that requires more effort is understanding whether a given alarm is relevant: can the problem that it indicates actually happen or not. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.
False alarms are often a result of a too high level of approximation in the analysis. It is recommended to treat the alarms starting from the first one in order to detect an imprecision as soon as possible.
The list of the generated assertions can easily be extracted from the
properties.csv
file (see Information about the Properties). Then, for instance,
these assertions can be counted in order to track the evolution of their total
number during the working process. (Note, however, that this particular measure
is not necessarily very pertinent, because the relation between problems and
emitted alarms is not really one-to-one. Losing precision at one point of the
analysis can lead to several alarms which have the same origin. Moreover,
solving one problem may cause many unrelated new alarms, as several problems
might have been hiding behind the solved one.)
The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties panel of the GUI, and there is a button which allows selecting the alarms among all the properties. Start investigating with the first emitted alarm by sorting the alarms by their emission rank.
Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.
Value analysis uses abstract interpretation techniques that propagate the information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between the variables and sets of values (keep in mind though that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.
See Tune the Precision for explanations concerning tuning the precision level
with the -slevel
option. The precision level is related with the number of
states that can be stored for each program point. The smaller this number is,
the more coarse the approximation is, as more computed states are merged
together.
Example:
//@ assert 0 < x < 10;
if (x < 5)
y = 5;
else
y = 10;
L:
Computing a unique state at label L
only gives that x ∈ [1..9]
and y ∈
[5..10]
. But if the slevel
is larger, then two states can be stored at
L
giving exactly that either y == 5
when x ∈ [1..4]
or y == 10
when
x ∈ [5..9]
.
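Here is a self-contained sketch of this snippet, where the assertion is replaced by a tis_interval input:

#include <tis_builtin.h>

int main(void) {
    int x = tis_interval(1, 9); /* plays the role of the assertion 0 < x < 10 */
    int y;
    if (x < 5)
        y = 5;
    else
        y = 10;
    L: /* with -slevel 2, the two states are kept separate here */
    return y;
}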
Notice that the assertion assert 0 < x < 10
above reduces the possible
values for x
. It is important to remember that this works in the same way
for the assertions automatically generated from alarms. For instance, if a
statement a = b / c;
is reached with c ∈ [0..100]
, an alarm is emitted
in form of an appropriate assertion:
/*@ assert Value: division_by_zero: c ≢ 0; */
Then, the analysis continues with context where c ∈ [1..100]
.
Beside conditions in the code that automatically split the states as above
(when there is enough slevel
),
some builtins are available to generate more than one state.
The builtin tis_variable_split (a, sz, n); splits the state
on the data whose address is a and size is sz,
if it holds fewer than n values.
For instance, if f
returns x ∈ [1..5]
in the following code,
the call to tis_variable_split
generates five states,
one for each value of x
:
int x = f();
tis_variable_split (&x, sizeof(x), 10);
Moreover, the builtin tis_interval_split(l, u)
does the same thing that
tis_interval(l, u)
does, but it automatically causes the
individual values between l
and u
inclusive to be propagated
separately. The slevel
option must then be set high enough to keep
the precision of the analysis.
In the following example, since all values of n
are propagated
separately, the analyzer is able to guarantee that the x_pos
assertion
holds.
#include <tis_builtin.h>
int G;
void foo(int n)
{
int x = G;
/*@ assert x_pos: 1 <= x <= 10; */
}
int main(void)
{
int n = tis_interval_split(-10, 10);
G = n;
if (n > 0)
foo(n + 5);
return 0;
}
Tuning the -slevel
and -slevel-function
options to control the precision
of the analysis has already been explained in Tune the Precision.
In the GUI, statements where the precision level (i.e. the slevel
value) has
been reached are indicated by a colored margin in source view windows (see
Interactive Code Panel). Moreover, the reached values can be sorted in the
statements window (Statements).
The analysis can also get imprecise for reasons other than reaching the defined
precision level. This is especially the case when the log trace includes messages
about garbled mix values. It is very likely that if such messages appear,
the analysis will not produce any interesting results.
Tip
The analysis can be stopped automatically with the command line
option -val-stop-at-nth-garbled.
.
This option can also be set from the GUI.
The analysis can also be stopped when it reaches the maximum memory consumption
set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM
. If
TIS_KERNEL_MAX_MEM
is set, the analyzer becomes more conservative
in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2
GiB of
memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM
GiB of memory. On a single-user TIS Box with 64GiB of memory, a good value
for this variable is 62
.
There are several other options that can be used to fine tune the value analysis.
Some of them work similarly to the -slevel
option, i.e. they control the
maximum number of states that can be kept separated. For instance:
-val-split-return-function f:n
: split the return states of function f
according to \result == n
and \result != n
;
-val-split-return auto
: automatically split the states at the end of each
function according to the function return code;
-val-split-return-function f:full
: keeps all the computed states at the
end of function f
in the callers;
-val-slevel-merge-after-loop <f | @all>
when set, the different
execution paths that originate from the body of a loop are merged
before entering the next iteration.
The default behavior is set to all functions
(-val-slevel-merge-after-loop=@all
). It can be removed for some
functions (-val-slevel-merge-after-loop=@all,-f
), and
deactivated for all functions
(-val-slevel-merge-after-loop=-@all
), and activated for some
(-val-slevel-merge-after-loop=-@all,+f
).
-wlevel n
: do n
loop iterations before widening.
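As an illustration only (the function name and values are hypothetical), these options are combined on the command line like any others:

$ tis-analyzer -val -slevel 100 -val-split-return-function get_status:0 -wlevel 3 main.c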
Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:
-val-ilevel n: sets the precision level for integer representation to n:
each integer value is represented by a set of up to n enumerated values;
above this number, intervals (with congruence information) are used.

-plevel n: sets the precision level for array accesses to n:
array accesses are precise as long as the interval for the index contains
fewer than n values. See Tuning the precision for array accesses for more
information about this option.

There are also options which allow enabling or disabling certain alarms. Most of these options are enabled by default, and it is usually safer to leave them this way (unless you really know what you are doing).
Of course not all existing options have been enumerated here.
The full list of the available options is given by -value-help
.
As explained in Tune the Precision, temporarily reducing the size of the
arrays may be a first step during the interactive phase
to make the analysis time shorter.
But when analyzing large arrays, the -plevel
option can be used
to increase the precision level for array accesses.
This option sets how hard the analyzer tries to be precise for memory accesses
that, considering the imprecision on the indexes involved, can be at any of many
offsets of a memory block. The default is 200 and may not be sufficient for an
access at unknown indices inside a large or nested array to produce a precise
result. This is illustrated by the example below:
#include <tis_builtin.h>
char c[20];
struct s { int m[50]; void *p; };
struct s t[60];
void init(void) {
for (int i = 0; i < 60; i++)
{
for (int j = 0; j < 50; j++)
t[i].m[j] = (i + j) % 10;
t[i].p = c + i % 20;
}
}
int main(void) {
init();
int x = tis_interval(5, 45);
int y = tis_interval(2, 56);
t[y].m[x] = -1;
x = tis_interval(5, 45);
y = tis_interval(2, 56);
int result = t[y].m[x];
int *p = &t[y].m[x];
int result2 = *p;
}
With the default value of -plevel
(200):
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
x ∈ [5..45]
y ∈ [2..56]
result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
With higher plevel:
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
x ∈ [5..45]
y ∈ [2..56]
result ∈ [-1..9]
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
Note that result2
is not precise even with the higher plevel.
Handling the lvalue t[y].m[x]
directly allows the analyzer to be optimal
as long as the value of the plevel option allows it,
but forcing the analyzer to represent the address
as the value of the variable p produces this set of offsets:
{{ &t + [428..11604],0%4 }}
This set of offsets contains pretty much the addresses of everything in t
,
including the p
pointer members,
so it appears when dereferencing p
that the result can be an address.
As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.
The environment variable TIS_KERNEL_MEMORY_FOOTPRINT
can be used to set
the size of caches used during the value analysis,
speeding up some slow analyses in which the caches were getting thrashed.
The default is 2, and each increment doubles the size of the caches.
Only use this variable if you are not already low on memory.
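For instance, a sketch of a run with caches four times larger than the default:

$ TIS_KERNEL_MEMORY_FOOTPRINT=4 tis-analyzer -val -slevel 10000 main.c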
Another useful option which helps in reducing the computation time is
-memexec-all
. If this option is set, when analyzing a function, the tool
tries to reuse results from the analysis of previous calls when possible.
It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.
First of all, the standard printf
function can be used to output constant
messages or messages involving only precise values. However, printing the
computed values when they are imprecise is not possible using printf
, and
instead the tis_show_each
function should be used. The name of this
function can be extended with any string, so that it is easier to tell the
difference between different calls (as the full function name is printed each
time it is called). For instance:
tis_show_each_iteration (i, t[i]);
The statement above will output messages like this (one for each analyzed call):
[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])
Another useful function, which properly handles imprecise values, is
tis_print_subexps
. When given any number of expressions as arguments, it
will print the values of all the sub-expressions of each provided expression.
The first argument of this function must be always a string literal, which will
be printed out in order to help distinguish between different calls. For
instance:
tis_print_subexps ("simple sums", x + y, y + z, z + x);
Such a statement will output messages like this:
[value] Values of all the sub-expressions of simple sums (expr 1):
int x + y ∈ {3}
int x ∈ {1}
int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
int y + z ∈ {5}
int y ∈ {2}
int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
int z + x ∈ {4}
int x ∈ {1}
int z ∈ {3}
Moreover, tis_print_abstract_each
allows the contents of structured variables
to be observed. For instance:
tis_print_abstract_each(&i, &t);
This statement will output messages like this (one for each analyzed call):
[value] Called tis_print_abstract_each:
i ∈ {0; 1; 2; 3; 4}
t[0..4] ∈ [0..10]
[5..99] ∈ UNINITIALIZED
Note that tis_print_abstract_each in general takes addresses of variables
in general takes addresses of variables
as parameters. This applies as well for an array, such as t
in the example
above. Contrary to popular belief, when t
is an array, &t
is not the
same as t
, and the user should call tis_print_abstract_each(&t);
to
see the whole array (pointer decay only shows the first element).
To get even more information, the tis_dump_each
function can be used
to print the whole state at the program point where it is called.
But it may be easier to call the tis_dump_each_file
function
to print the state in a file.
The name of the file is computed from the first argument of the call
(which must be a string literal), an incremented number,
and an optional directory given by the -val-dump-directory
option.
The -val-dump-destination
option allows choosing
which kind of output is expected among txt
or json
(all
for both, none
for no output).
For instance, when calling tis_dump_each_file ("state", *(p+i))
in a test.c
file, and analyzing it with the command:
$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp
these messages are shown in the trace:
test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'
The two generated files hold both the whole state computed the first time the
program point is reached and the possible values for *(p+i)
.
For instance, the JSON file may look like:
{
"file": "test.c",
"line": 11,
"args": "([0..10])",
"state": [
{
"base": "t",
"values": [
{ "offset": "[0..4]", "value": "[0..10]" },
{ "offset": "[5..9]", "value": "UNINITIALIZED" }
]
},
{ "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
]
}
To better understand the results, see the Value Analysis Data Representation section.
In order to avoid wasting time analyzing the application in a wrong context, the
analysis can be stopped as soon as some alarms are generated thanks to the
-val-stop-at-nth-alarm
option. With the argument equal to 1
, it aborts
the analysis at the first alarm. To ignore a certain number of alarms, the
argument can be increased. Although there is no strict relation between the
given argument and the number of alarms generated before the analysis stops
(i.e. these two values are not necessarily equal), one thing is guaranteed:
providing a larger number will lead to skipping more alarms.
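For instance, to abort the analysis at the very first alarm:

$ tis-analyzer -val -val-stop-at-nth-alarm 1 main.c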
The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:
$ kill -USR1 12345
The analyzer can also be stopped through the GUI (see Disconnect/Kill server).
When the analyzer receives this signal, it stops the Value analysis, but still continues with the other tasks. So for instance, it still saves the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there is no more task to do, it can still take some time to properly stop the analysis.
The --timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:
$ tis-analyzer --timeout 5m -val ... -save project.state
Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:
- tis_watch_cardinal: stop the analysis when the number of different values that a given memory location may possibly contain (because of imprecision) is greater than a certain maximal amount.
- tis_watch_value: stop the analysis when a given memory location may possibly contain a value of the provided set of forbidden values.
- tis_watch_address: stop the analysis when a given memory location may possibly contain an address.
- tis_watch_garbled: stop the analysis when a given memory location may possibly contain a garbled mix value.
- tis_detect_imprecise_pointer: stop the analysis when any expression is evaluated to an imprecise pointer which contains a given base address.

These functions are available using #include <tis_builtin.h>.
The arguments of the four tis_watch_* functions follow the same logic:
- the watched memory location is described by its address p and its size s;
- the last argument n is the number of statements during which the condition may remain true before the analysis is stopped:
  - if n == -1, the analysis never stops, but messages are printed each time the condition is reached;
  - if n == 0, the analysis stops as soon as the condition is reached;
  - if n > 0, the analysis continues for the n-th first occurrences where the condition is reached (and prints messages for each of them) and stops at the next occurrence.

The function tis_detect_imprecise_pointer only takes a pointer as argument.
Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:
int x = 0; /* the memory location to watch */
void *p = (void *)&x; /* its address */
size_t s = sizeof(x); /* its size */
int y[10];
/* The analysis stops when x is not exact (i.e. not a singleton value). */
int maximal_cardinal_allowed = 1;
tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);
/* The analysis stops the fourth time when x may be negative. */
int forbidden_values = tis_interval(INT_MIN, -1);
int exceptions = 3;
tis_watch_value(p, s, forbidden_values, exceptions);
/* The analysis stops when x may be an address. */
tis_watch_address(p, s, 0);
/* The analysis stops when x may be a garbled mix value. */
tis_watch_garbled(p, s, 0);
p = y;
/* The analysis starts to detect if an expression is evaluated to an
imprecise pointer starting at base address &y. */
tis_detect_imprecise_pointer(p);
/* The analysis stops because the expression p+tis_interval_split(0,3)
is evaluated to an imprecise pointer &y + [0..3]. */
*(p+tis_interval(0,3)) = 3;
Tuning the precision using various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is by adding assertions to the program. Other kinds of properties also can be introduced, but assertions are by far the most frequently used for this purpose.
If you do not know how to add ACSL properties to your project, first read ACSL Properties.
Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When it is not possible, they should be verified manually.
Warning
Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.
Some examples are given below.
If source code modifications are needed, and the source code is in a git
repository, the tis-modifications
tool may be helpful to track them
in order to check that they are correctly guarded. See
tis-modifications Manual.
As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately some relevant information concerning variables just cannot be represented in this form and thus cannot be taken into account by the analyzer when it would make a difference. However, thanks to introducing well placed assertions it is possible to compensate for this disadvantage by explicitly providing the missing information.
Example:
int T[10];
...
L1: if (0 <= x && x < y) {
...
L2: if (y == 10) {
L3: ... T[x] ...
}
...
}
When the T[x]
expression is encountered at the label L3, the analyzer tries
to check if the T
array is accessed correctly (i.e. inside the array
bounds), namely, is the (0 <= x < 10)
condition true. It already knows that
the (0 <= x)
part holds, due to the conditional statement at label L1
(assuming that x
has not been modified since then). Whatever values x
might have had before, at L1 they have been restrained to only non-negative ones
and this fact has been stored in the internal state (the interval of values for
x
was modified) and thus is visible at L3. For example, if before L1 the
value of x
was [--..--]
(i.e. nothing is known about x
except that
it is initialized), then after L1 it would be [0..--]
(i.e. the interval
spanning from zero to positive infinity).
Now, the analyzer still needs to verify if (x < 10)
also holds. This is
obvious for anybody who reads the code: the condition at L1 assures that (x <
y)
and the condition at L2 assures that (y == 10)
, therefore (x < 10)
must be true. Unfortunately, because of the limitations of the adopted value
representation method, the analyzer cannot deduce this by itself. The fact that
(x < y)
holds just cannot be expressed in the internal state (nor actually
any abstract relation between variables). And, supposing that the value of y
at L1 is [--..--]
, the (x < y)
condition does not help to restrain the
values of x
, its upper bound remains thus as before. Hence, this important
piece of information is lost and the analyzer simply cannot connect it with (y ==
10)
at L2 in order to correctly restrain the value of x
to [0..9]
. So
at L3 it will consider that the value of x
is [0..--]
and it will emit
an alarm about a potential out of bound access to the array T
.
To help the analyzer, the appropriate assertion can be added explicitly:
at L3: assert ax: x < 10;
Then the alarm will disappear. Of course the ax
assertion still needs to be
verified by other means. For example, this particular assertion can be easily
proven using WP (see Prove Annotations with WP).
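For instance, assuming the WP plug-in is available in your version, a command along these lines asks it to discharge just that property (the exact options are described in Prove Annotations with WP):

$ tis-analyzer -wp -wp-prop ax main.c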
State splitting is another assertion-based technique that permits to guide the value analysis. It can be used to obtain the same results in the above example by splitting the internal state at L1 into two states by introducing the following assertion:
at L1: assert ay: y <= 10 || 10 < y;
As explained before, the analyzer can store multiple memory states at each
program point (as explained in About the Value Analysis) and the maximal number of
states that can be stored per program point in the internal representation is
related to the precision level (i.e. slevel
). So, provided that the
-slevel
option has set the precision level high enough in order to permit
splitting the state here, the assertion above will lead to a case analysis:
y <= 10
case: As y <= 10
is assumed, together with the x < y
condition at L1, it leads to deducing x < 10
, and this time this can be
represented in the internal state.10 < y
case: If 10 < y
is assumed, then the condition at L2 is
false, and therefore the execution branch where the array T
is accessed
at L3 is not reached at all.Thanks to introducing this assertion, the alarm will thus disappear. Moreover,
the analyzer is able to check on its own that ay
is always true, so there is
nothing more to verify.
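Putting everything together, here is a self-contained sketch of this example (with x and y modeled as unconstrained inputs), to be analyzed with -slevel 2 or more:

#include <tis_builtin.h>

int T[10];

int main(void) {
    int x = tis_interval(-100, 100);
    int y = tis_interval(-100, 100);
    /*@ assert ay: y <= 10 || 10 < y; */
    if (0 <= x && x < y) {   /* L1 */
        if (y == 10) {       /* L2 */
            T[x] = 1;        /* L3: no out-of-bounds alarm */
        }
    }
    return 0;
}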
It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.
Tip
Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).
In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program in order to make it easier to analyze. This method should be usually avoided if possible, since it is intrusive to the application’s code. Thus it should be used only when other solutions are not good enough or when you do not mind modifying the code.
For example, if the program contains a test (0 <= i+j < 10)
and then several
uses of T[i+j]
follow, it may be convenient to add a temporary variable
representing the i+j
sum:
int tmp = i + j;
if (tmp < 10) {
..
//@ assert a_tmp: tmp == i + j;
... T[tmp] ...
}
If neither i
nor j
are modified in the meantime, the assertion that
validates the code substitution should be trivial to verify, and the value
analysis is now able to know that (tmp < 10)
.
Beside assert
, requires
and ensures
, the loop invariant
properties are also useful to enhance the analysis precision.
For instance, this function generates an alarm:
int T[100];
//@ requires 0 <= n < 50;
void main(int n)
{
int i = 0;
while (i <= n) {
T[i] = 3;
i++;
}
T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;
This is because the value of i
is too imprecise when leaving the
loop, so the analysis doesn’t know if the access to T[i]
in the last
assignment is valid or not.
Adding this loop invariant removes the alarm:
/*@ loop invariant i <= 50; */
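The invariant is placed just before the loop; the complete annotated function then reads:

int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    /*@ loop invariant i <= 50; */
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}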
Moreover, the value analysis is able to check that this property is always valid.
A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.
Buffer problem: there is a relation between cur and len, but it cannot be represented.
A typical function to handle this buffer is:
void process (char * cur, size_t len) {
char * p = cur;
for (size_t i = 0 ; i < len ; i++, p++) {
*p = ...
}
}
The validity of the pointer p
has to be checked to avoid an alarm on
p
access, but also to get precise results later on.
It is especially important when the pointer points to an array that
is part of a larger structure. For instance:
struct data {
char buffer[BUFFER_LEN];
unsigned current_index;
unsigned state;
};
//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
char * current = data->buffer + data->current_index;
size_t length = BUFFER_LEN - data->current_index;
if (n > length) n = length;
process (current, n);
...
}
If the analysis is not able to know that p
does not go beyond the end of the
buffer
field, the value of the other fields current_index
and state
might be modified as well and might be too imprecise for the analysis
to give interesting results later on.
So the process
function needs a precondition
to give the constraint on cur
and len
to ensure the validity of the pointer.
This precondition could simply be:
//@ requires \valid (cur + 0 .. len-1);
Unfortunately, the Value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable equation when one of the data is precise enough to reduce the other:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
//@ requires length <= (\block_length (data) - \offset (data)) / sizeof (*data);
Notice that the ACSL functions \base_addr
, \block_length
and
\offset
only provide the expected information when cur
is a pointer
to an array allocated on its own. If the array is a field in a structure,
\base_addr(cur)
returns the base address of the structure.
Structure with a .buf
field:
ACSL functions are related to the allocated block, not the internal array.
Anyway in some cases, even if the analyzer computes the optimal information,
cur
and len
both have an unknown value from intervals
and the relation between the two variables has been lost.
So the memory access to (*p)
raises an alarm when we cannot check that adding the upper bound of both
intervals is smaller than (buf + BUFFER_LEN)
.
Moreover, if buf
is in a structure as explained above,
buf
and BUFFER_LEN
may be unknown in the function.
A trick can be to modify the original function by adding a parameter that gives a pointer in the object beyond which the function is not expected to access:
/*@ requires f_r_buf: val: cur <= bound;
requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
char * p = cur;
//@ loop invariant f_l_p: val: p <= bound;
for (size_t i = 0 ; i < len ; i++, p++) {
if (p >= bound) break;
*p = ...
}
}
In the previous example, the call to process would have to be changed to:
process_bounded (current, n, data->buffer + BUFFER_LEN);
As long as the preconditions are true, this modified function is equivalent to the original one. The first precondition is often checked by the value analysis; when it is not, the value analysis reduces the range of cur. The value analysis can also use the second precondition to reduce the length.
Two annotated functions with such bounds, tis_memset_bounded
and
tis_memcpy_bounded
, are provided to be used instead of memset
and
memcpy
when this problem occurs with these libc functions.
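For instance, in the treatment function above, a memset over the remaining part of the buffer could be replaced as follows (a sketch: the exact prototypes of these functions are documented with the analyzer's libc; here the bound is assumed to be passed as a last argument):
/* instead of: memset(current, 0, n); */
tis_memset_bounded(current, 0, n, data->buffer + BUFFER_LEN);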
This chapter explains how to introduce ACSL properties to a project.
ACSL is the specification language employed in TrustInSoft Analyzer. ACSL properties can be used to specify functions (as seen in Write a Specification) and to guide the analysis by adding local annotations, such as assertions or loop invariants, which may help in removing false alarms (as seen in Remove Alarms by Adding Annotations).
There are two ways to insert ACSL properties in a project:
One way to add ACSL annotations to a project is to write them directly in the source code, in special comments: /*@ ... */ or //@ .... There are several kinds of properties, and each must be placed at the appropriate place in the source code: function contracts before the function declaration, assertions at the statement they refer to, loop annotations just before the loop, and so on. The sketch below illustrates these placements.
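A minimal sketch of these placements (the names are made up for the example):
//@ requires d_r1: y != 0;
int idiv(int x, int y);

void f(int a)
{
    int i;
    //@ assert a_pos: a >= 0;
    /*@ loop invariant 0 <= i <= 10; */
    for (i = 0; i < 10; i++) {
        /* ... */
    }
}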
For more information about the ACSL language, please refer to the ACSL Documentation.
Caution
The ACSLimporter
plug-in is not available in this version.
For many reasons it is usually preferable to avoid modifying the source code which is analyzed, as introducing changes to the application’s code can lead to difficulties in comparing it with the original version. For example, adding new properties alters the line numbering in a file, which makes it impossible to report problems with the original source line number.
The ACSLimporter
plug-in makes it possible to write the ACSL properties into
separate files and then import them for the analysis. The syntax of such files
looks like this:
function <function-name>:
contract:
requires <pre-name-1>: <pre-definition-1>;
assigns <assigns-definition-1>;
ensures <post-name-1>: <post-definition-1>;
at L1: assert <assert-name-1a>: <assert-definition-1a>;
at L1: assert <assert-name-1b>: <assert-definition-1b>;
at L2: assert <assert-name-2>: <assert-definition-2>;
at loop 1:
loop invariant <inv-name-1a>: <inv-definition-1a>;
loop invariant <inv-name-1b>: <inv-definition-1b>;
Of course, the <...>
parts should be substituted by specific names and
definitions.
Depending on the organization of the project, it might be better to put all the properties in a single ACSL file or to split them throughout several files. If the properties concerning the same function appear in different files, the specifications are merged.
To load the property files, so that they are taken into account during the
analysis, the -acsl-import <file.acsl>
option has to be specified for each
concerned ACSL file.
Giving a unique name to each annotation permits referring to it easily later on. Moreover, it makes the result files a lot clearer and more readable: when mentioning a particular annotation they will use its name instead of the corresponding file name and line number.
Using standard naming conventions is highly recommended. Some tools require particular naming of assertions to properly check that everything has been verified at the end of the analysis.
The proposed naming conventions are:
- Prefix the property name with the name of the function it belongs to (for instance, the properties of the function add all start with add_).
- Use a short name for the kind of property, with an index (the first requires property of the function add could then be named add_r1). (However, this is not really necessary if the name is always used together with the corresponding keyword, as in: requires add_r1, ensures add_e2, etc.)
- Add a suffix indicating how the property is expected to be verified:
  - add_e2_val: if the property is found always valid by Value;
  - add_e2_wp: if the property is proved by WP;
  - add_e2_sc: if the property could have been removed as redundant by Scope (note: it could be necessary to keep this property anyway because it still might be useful for Value or WP computations);
  - add_e2_rv: if the property has been manually reviewed.
These naming conventions might seem quite cumbersome to use (especially the verification method suffix). However, as mentioned before, they make the automatic generation/verification possible, so they are highly recommended.
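For instance, following these conventions, a small specification could look like this (the function and the properties are made up for the example):
//@ requires add_r1_val: 0 <= n < 100;
//@ ensures add_e2_wp: \result >= n;
int add(int n);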
Caution
The WP
plug-in is not available in this version.
WP
refers both to a method to formally verify properties of the analyzed
code and the name of the analyzer’s plug-in that implements this method. WP is a
static analysis technique, like the value analysis, but involving theorem
proving. For a short introduction describing how it works see
Short Introduction to WP Computation.
The purpose of this chapter is mainly to explain in which cases WP can be used with a minimal amount of manual work required. This does not mean that it cannot be used in more complex situations, but then it requires more knowledge about the WP computation and/or competences in performing manual proofs using proof assistants such as Coq.
The easiest way to run WP is to do it in the GUI by selecting the annotation to prove, right-clicking to open the pop-up menu, and choosing the Prove Property by WP option.
However, when the analysis evolves, it is usually more practical to run WP from the command line, save the project, and extract the status of the properties from the information file to just check that the results are still the same.
The command line to run WP and save the project looks like:
tis-analyzer -load project.state -wp-fct f1 -wp-prop f1_p1_wp,f1_p2_wp \
-then -wp-fct g,h -wp-prop f_pre_wp \
...
-then -save project.wp.state
This command line:
opens a previously saved project project.state: it doesn't need to include value analysis results, and doesn't even have to include an entry point. All it needs is the body of the functions containing the properties to verify, and probably some specifications for the functions they call;
tries to verify the properties named f1_p1_wp
and f1_p2_wp
in the f1
function,
then tries to verify the property f_pre_wp in g and h. Notice that f_pre_wp is supposed to be a precondition of f, and that it is checked in g and h, which are supposed to be some of f's callers;
saves the results in the project.wp.state
project.
Notice that a _wp
suffix is used in the names of the properties that
are checked with WP. See Naming the Annotations to understand why naming
conventions are useful.
This is an example of how to use WP, but the plug-in provides many other
options if needed. Please use the -wp-help
option to list them, and
refer to the documentation for more details.
To know how to extract the status of the properties from the
project.wp.state
project, see Information about the Properties.
Let us give some very simple explanations about WP for readers who know nothing about it, since it might be necessary to understand how it works in order to use it when suitable.
To verify that a property is true at a program point, the WP principle
is to propagate it backward and to compute a formula that is such that,
if it can be proved to be true, then it ensures that the initial
property is true as well. The computed formula is then sent to some
automatic prover(s). For instance, tis-analyzer
comes with
alt-ergo
, but more provers can be added.
An example is easier to understand:
const int X = 10;
void f1(int x)
{
int y = x + 1;
int z = 2 * y;
L: //@ assert y_val: y > X;
...
}
To ensure that y_val is true at L, WP computes that one has to prove (x+1 > X) when entering the function. Notice that the assignment to z has no effect, since WP knows that it doesn't modify the value of y. The property can be automatically proved if a precondition gives:
//@ requires r_x_ge_X: x >= X;
This is because the final computed formula is:
x >= X ==> x+1 > X;
which is easily proved by any automatic prover.
It doesn’t work with the precondition:
//@ requires r_x_ge_15: x >= 15;
This is because WP only works on the function source code, which means that it has no information about the value of X. To solve this kind of problem, one can add:
//@ requires r_X_val: X == 10;
This precondition is easily validated by the value analysis, and can be used by WP to finish the proof with r_x_ge_15.
In this simple case, the initial property and the computed formula are equivalent, but that is not always the case. WP just ensures that if the computed formula is true, then the property is true each time its program point is reached.
To prove a loop invariant property, the WP computation is very similar, but it is decomposed into two goals:
- the establishment of the invariant: it must hold when the loop is reached, before the first iteration;
- the preservation of the invariant: if it holds at the beginning of an iteration, it must still hold at the end of it.
Example:
//@ requires n < 100;
int main(int n)
{
int i; int * p;
for (i = 0, p = T; i < n; i++, p++) {
*p = 3;
}
...
}
The following property removes the alarm about the validity of the (*p) assignment in the loop:
//@ loop invariant li_1: p == T+i;
Moreover, it can be proved by WP:
the establishment has to be proved before entering the loop, but after the initialization part. So the proof obligation is:
T == T + 0
the preservation formula is similar to:
p == T + i ==> p + 1 == T + (i + 1)
Both formulas are trivially true.
In the first example in Short Introduction to WP Computation, the assignment to z has no effect on the WP computation, since WP knows that it doesn't modify the value of y. But things are different when pointers are involved. For instance:
void f(int * px, int x, int * py, int y)
{
*px = x;
*py = y;
//@ assert a_px: *px == x;
//@ assert a_py: *py == y;
...
}
WP is able to prove a_py, but not a_px. The reason is that it doesn't know whether the assignment to (*py) modifies the value of (*px) or not. The a_px assertion can only be proved with the precondition:
//@ requires \separated (px, py);
It states that the memory locations of (*px) and (*py) do not overlap.
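Putting the precondition together with the function (a sketch of the now-provable version):
//@ requires \separated (px, py);
void f(int * px, int x, int * py, int y)
{
    *px = x;
    *py = y;
    //@ assert a_px: *px == x;
    //@ assert a_py: *py == y;
}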
In the context of adding annotations to remove alarms, except in very simple cases, it is not recommended to use WP when possibly overlapping pointers are involved since it may take some time to provide enough information.
The other problem arises when there are some function calls between the property and the statements that make it true. Remember that WP only works on the source code of the function containing the property, and on the specifications of the called functions.
extern int X;
void g (void);
void f(int x, int y)
{
if (x > y && x > X) {
g ();
//@ assert ax1: x > y;
//@ assert ax2: x > X;
...
}
...
}
WP is able to prove ax1
since there is no way for g
to modify
either x
or y
, but ax2
cannot be proved since g
may
modify X
.
There are two solutions to solve the problem:
add an assigns
property for g
to specify the modified data.
For instance, ax2
is proved when adding:
//@ assigns \nothing;
This is not the preferred method, since assigns clauses are difficult to prove: it requires knowing the modified data for each statement of g. The computed dependencies may help to justify the assigns property, but beware that this information is context dependent.
add a postcondition about the involved data. For instance:
specifying that X
is not modified by g
:
//@ ensures X == \old (X);
or specifying that X decreases:
//@ ensures X < \old (X);
Both solutions make it possible to prove ax2.
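Putting the first variant together with the declaration of g (a sketch):
extern int X;

//@ ensures X == \old (X);
void g (void);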
WP might seem useless if it is only applied to simple cases, but that is not true: even when properties look trivial, it is useful to formally prove them, since it is so easy to make a mistake.
Let us look at an example:
//@ ensures e_res_ok: min <= \result <= max;
int bounds(int min, int x, int max)
{
int res = x;
if (x < min) res = min;
if (x > max) res = max;
return res;
}
The postcondition seems reasonably easy to justify, but WP is unable to prove it. WP computes a proof obligation equivalent to:
if (x > max) then min <= max /\ max <= max
else if (x < min) then min <= min /\ min <= max
else min <= x /\ x <= max
After simplifying the formula, it appears that the information
(min <= max)
is missing, so this postcondition cannot be proved
without a precondition. It then has to be added and checked in every
context where the function is called to ensure that the post-condition
is verified.
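With the missing precondition added, the contract becomes provable, and the precondition must then be verified at every call site (a sketch):
//@ requires b_r1: min <= max;
//@ ensures e_res_ok: min <= \result <= max;
int bounds(int min, int x, int max)
{
    int res = x;
    if (x < min) res = min;
    if (x > max) res = max;
    return res;
}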
The advice here is to use WP only in simple cases, because complex cases need expertise and require a lot of time. But as we have seen, even for properties that look trivial, it is better to formally prove them, since it is so easy to make a mistake. Moreover, manual justification of trivial properties may look a little silly.
One must be especially careful when it seems that WP should be able to prove something but doesn't, since this may hide a problem somewhere. It is always better to understand whether it is really a WP weakness, or something else.
Now that you know how to analyze an application, it is important to insist on putting the results together and checking all the hypotheses.
If there is only one analysis, it is quite easy to check. The results rely on:
- the assigns and ensures properties of the external functions, because they cannot be checked;
- all the other properties, which must be either valid according to the value analysis or proved by WP.
If there are several analyses, the results of each one rely on the same hypotheses as above, but there are more things to check. To be fully verified in the given context, all the hypotheses above must have a clear justification wherever formal verification is lacking.
The C++ plugin lets you analyze C++ programs. See the tutorial (Analyzing C++ code) to get started.
This section describes the specifics of this plugin (see TrustInSoft Analyzer++ Specificities) and the support it provides for the GoogleTest framework (see GoogleTest support).
The identifiers in a C++ program are mangled to match C identifiers. The mangling scheme used in TrustInSoft Analyzer is a variation of Itanium mangling. The differences are:
Class, union and enum names are also mangled, even if this is not
required by Itanium. The grammar entry used for these types is
_Z<name>
. As such, the class:
struct Foo {
  int x;
};
is translated as:
struct _Z3Foo { int x; };
Local variables and formal parameter names are also mangled, to avoid
shadowing extern "C"
declarations. The grammar entry used for a
local variable is _ZL<unqualified-name>
. As such, the local
variable bar
in:
int main() {
  int bar = 2;
}
is mangled as _ZL3bar
. The keyword this
is not mangled.
The virtual method table and the typeinfo structure for a class
Foo
are mangled as extra static fields named __tis_class_vmt
and __tis_typeinfo
in this class. As such, the class:
struct Foo {
virtual void f() {}
};
leads to the generation of two variables with mangled names
_ZN3Foo15__tis_class_vmtE
and _ZN3Foo14__tis_typeinfoE
.
To make reading the identifiers easier, TrustInSoft Analyzer displays
by default a demangled version of the identifier. In the GUI, the
mangled name can be obtained by right-clicking on an identifier and
select Copy mangled name
.
Signatures are ignored when demangling function names. As such, the assignment in:
void func(int) {}
void
test()
{
void (*ptr)(int) = &func;
}
is displayed as:
void (*ptr)(int);
ptr = & func;
even if the mangled name of func is _Z4funci. This can lead to ambiguity when there are multiple overloads for the named function; one way to resolve it is to look at the mangled name.
Constructors and destructors are demangled as Ctor
and Dtor
.
If the constructor or destructor is a constructor for a base class and
is different from the constructor for the most derived object, the
suffix Base
is added. If the constructor is a copy constructor,
the suffix C
is added. If the constructor is a move constructor,
the suffix M
is added. Therefore, the demangled name
Foo::CtorC
stands for the copy constructor of the class Foo
.
If the destructor is virtual, it will be demangled as
DeletingDtor
.
The option -cxx-filt
can be used to print the demangled version of
an identifier, as demangled by the analyzer. If the identifier is a
function name its signature will also be printed. For example, the
command tis-analyzer++ -cxx-filt _Z3fooii
displays {foo(int,
int)}
.
When displayed, function return types are preceded by a ->
symbol and are displayed after the formal parameter types. For
example, the instance of the function show
in the following code:
struct Foo {
void f(int) {}
};
template <typename T>
void show(const T&) {}
int
main()
{
show(&Foo::f);
}
is printed as show<{(int) -> void} Foo::*>
.
Template parameter packs are printed enclosed by [
and ]
. As
such, the command tis-analyzer++ -cxx-filt _Z1fIJ3Foo3FooEEvDpRKT_
displays {f<[Foo, Foo]>(const [Foo, Foo]&) -> void}
: f
is a
function templated by a parameter pack, which is instantiated with
Foo, Foo
. Note also that in this case the const
and &
are
applied to the whole pack.
Names displayed in the GUI can be prefixed by ...
. These names are
shortened versions of qualified names. Clicking on this prefix will
display the full mangled or demangled name, depending on the command
line options.
When calling a function, TrustInSoft Analyzer uses different transformations to initialize the function’s arguments depending on the type of the argument. These transformations match Itanium calling convention.
Scalar types are kept as is.
Reference types are translated as pointers to the referenced types. The initialization of an argument of reference type is translated as taking the address of the initializer. If this initialization requires the materialization of a temporary object, this step is done by the caller. For example, with the following original source code:
void f(int &&, int &);
void g() {
int x;
f(2, x);
}
the translated declaration for the function f
is void f(int *a,
int *b)
and the call to f
is translated as:
int x;
int __tis_temporary_0;
__tis_temporary_0 = 2;
f(& __tis_temporary_0,& x);
The passing of a class type depends on whether the class is non-trivial for the purposes of calls. Following the Itanium ABI, a class type is non-trivial for the purposes of calls if it has a non-trivial copy constructor, a non-trivial move constructor, or a non-trivial destructor, or if all of its copy and move constructors are deleted.
If the type is non-trivial for the purposes of calls, a variable of
the class type is defined in the caller and the function receives a
pointer to this variable. Such variables are named __tis_arg_##
.
For example, in the following code:
struct Obj {
Obj();
Obj(const Obj &);
};
void f(Obj x, Obj y);
void g() {
f( {}, {} );
}
the translated function f
has the signature:
void f(struct Obj *x, struct Obj *y);
and its call is translated as:
struct Obj __tis_arg;
struct Obj __tis_arg_0;
{
Obj::Ctor(& __tis_arg_0);
Obj::Ctor(& __tis_arg);
}
f(& __tis_arg,& __tis_arg_0);
If the function returns a class that is non-trivial for the purposes
of calls, then it is translated as a function returning void
but
with an additional argument. This argument is a pointer to a variable
in the caller that will receive the function return. If the caller
does not use the function return to initialize a variable, a variable
named __tis_cxx_returnarg_##
is created for this purpose.
For example, with the following original source code:
struct Obj {
Obj();
Obj(const Obj &);
};
Obj f();
void g() {
Obj o = f();
f();
}
the translated function f
has the signature:
void f(struct Obj *__tis_cxx_return)
and the body of the function g
is translated as:
struct Obj o;
f(& o);
{
struct Obj __tis_cxx_returnarg;
f(& __tis_cxx_returnarg);
}
return;
If the type is trivial for the purposes of calls, no transformation is applied and the object is passed by copying its value. For example, with the following original source code:
struct Obj {
Obj();
};
Obj f(Obj o);
void g() {
f( {} );
}
the signature of the translated function f
is
struct Obj f(struct Obj o)
Sometimes, TrustInSoft Analyzer cannot decide if a class is trivial for the purposes of calls in a translation unit. In such cases, it will assume that the type is non-trivial for the purposes of calls and emit a warning like:
[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.
If the user knows that the type is trivial for the purpose of calls, the -cxx-pass-by-value option can be used to force this.
For example, with the following original source code:
struct Foo;
void f(Foo x);
with no particular option set, TrustInSoft Analyzer will produce the following warning and declaration for f:
[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.
void f(struct Foo *x);
with the option -cxx-pass-by-value _Z3Foo, TrustInSoft Analyzer will produce the following declaration for f without warning:
void f(struct Foo x);
Using an incorrect passing style can lead to errors like:
[kernel] user error: Incompatible declaration for f:
different type constructors: struct _Z3Foo * vs. struct Foo
First declaration was at file1.cpp:7
Current declaration is at file2.c:7
or
[kernel] user error: Incompatible declaration for f:
different type constructors: struct Foo vs. void
First declaration was at file.c:7
Current declaration is at file.cpp:7
Methods do not exist in C, and are translated as functions by TrustInSoft Analyzer++. The following additional transformations are applied to non-static methods:
- the translated function takes an additional first parameter, the this argument. Its type is a pointer to the class enclosing the method, with an additional const qualifier if the method is const-qualified;
- at each call site, the this argument is initialized with the address of the calling object.
For example, with the following original source code:
struct Obj {
Obj();
static void bar(int x);
void foo(int x) const;
};
void
f(void)
{
Obj o;
o.foo(1);
Obj::bar(0);
}
two function declarations are produced:
void Obj::bar(int x);
void Obj::foo(const struct Obj *this, int x);
and the calls to foo
and bar
are translated as:
Obj::foo(& o,1);
Obj::bar(0);
By default, constructor elision is enabled and TrustInSoft Analyzer++ will omit some calls to copy or move constructors to temporary objects, as allowed by C++ standards from C++11 onwards.
Constructor elision can be disabled with the
-no-cxx-elide-constructors
option.
For example, with the following original source code:
struct Obj {
Obj();
};
Obj f();
void g() {
Obj y = f();
}
when constructor elision is enabled, the call to f
is translated as:
f(& y);
However, when constructor elision is disabled with the option
-no-cxx-elide-constructors
, it is translated as:
struct Obj __tis_temporary_0;
f(& __tis_temporary_0);
Obj::CtorM(& y,& __tis_temporary_0);
In this case, the result of the call to f
is written to the
temporary object __tis_temporary_0
and this temporary object is then
moved to the initialized variable y
.
Virtual method calls translation is separated in three steps:
- query the virtual method table to find the function to call; the result of the query is stored in a temporary variable named __virtual_tmp_XXX, where XXX is the unqualified name of the method;
- compute the this pointer calling the method, and call the resolved function pointer using the previous information;
- if the method returns a pointer, shift the returned value accordingly, unless it is nullptr.
As such, the function call_get in the following code:
struct Foo {
virtual Bar *get() { return nullptr; }
};
Bar *call_get(Foo *f) {
return f->get();
}
is translated as:
struct Bar *call_get(struct Foo *f)
{
struct Bar *__retres;
char *__virtual_return_get;
struct __tis_vmt_entry const *__virtual_tmp_get;
char *tmp_0;
__virtual_tmp_get = f->__tis_pvmt + 1U;
__virtual_return_get = (char *)(*((struct Bar *(*)(struct Foo *))__virtual_tmp_get->method_ptr))
((struct Foo *)((char *)f + __virtual_tmp_get->shift_this));
if (__virtual_return_get) tmp_0 = __virtual_return_get + __virtual_tmp_get->shift_return;
else tmp_0 = __virtual_return_get;
__retres = (struct Bar *)tmp_0;
return __retres;
}
The option -no-cxx-inline-virtual-calls
can be used to replace
this transformation by a call to a generated function named
XXX::__tis_virtual_YYY
, where:
XXX
is the static type of the class containing the method that
was called.YYY
is the unqualified name of the method.With this option, the function call_get
of the example above is
translated as:
struct Bar *call_get(struct Foo *f)
{
struct Bar *tmp;
tmp = Foo::__tis_virtual_get(f);
return tmp;
}
The generated __tis_virtual_
functions keep the states obtained by
the virtual call separated.
TrustInSoft Analyzer uses its own memory layout to represent C++
objects. In order to preserve as much useful information as possible,
the analyzer defines multiple well-typed data structures, and uses
more than one extra pointer field in polymorphic classes. As a result
of this choice, the numeric value of sizeof(Class)
will differ
between the compiled code and the analyzed code.
Objects, whether class or struct, are translated as C structures; unions are translated as C unions.
The inline declaration of a static field is translated as a declaration of a global variable with the same qualified name. The out-of-line definition of a static field is translated as a definition of a global variable with the same qualified name.
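As an illustration (a sketch consistent with the rule above, not verbatim analyzer output):
struct Counter {
    static int count;     // static field
};
int Counter::count = 0;   // out-of-line definition

// translated as a global variable with the same qualified name:
// int Counter::count = 0;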
Non-static fields are translated as fields in the translated structure. The fields are emitted in the source code order.
Empty classes are translated as a structure with one field char
__tis_empty;
. This enforces that the size of an empty class is not
zero.
Non-virtual non-empty base classes are translated as fields in the
derived class. Such fields are named __parent__
followed by the name of
the base class.
For example, with the following original source code:
class Foo {
int x;
};
struct Bar: Foo {
int y;
int z;
};
the structures produced for the class Foo
and Bar
will be:
struct Foo {
int x ;
};
struct Bar {
struct Foo __parent__Foo ;
int y ;
int z ;
};
Non-virtual empty base classes do not appear in the translated C structure. For example, with the following original source code:
class Foo { };
struct Bar: Foo {
int y;
int z;
};
the structure produced for the class Bar
is:
struct Bar {
int y ;
int z ;
};
In this case, a reference to the base Foo
of an object of type
Bar
binds to the original object. In other words, the assertion in
the following program is valid in the model used by TrustInSoft
Analyzer:
class Foo {};
struct Bar: Foo {
int y;
int z;
};
int
main()
{
Bar b;
Foo &f = b;
void *addr_b = static_cast<void *>(&b);
void *addr_f = static_cast<void *>(&f);
//@ assert addr_b == addr_f;
}
If a C++ class is polymorphic, its corresponding C structure contains two additional fields:
struct __tis_typeinfo const *__tis_typeinfo;
holding a
pointer to the type_info
of the most derived object of the
current object.struct __tis_vmt_entry const *__tis_pvmt;
holding a pointer
to the virtual method table of the current object.As an example, the class:
struct Foo {
int x;
virtual void f() {}
};
is translated as:
struct Foo {
struct __tis_typeinfo const *__tis_typeinfo ;
struct __tis_vmt_entry const *__tis_pvmt ;
int x ;
};
These additional fields are set by the constructors of the polymorphic class.
If a class has a virtual base, its translation produces two different C structures: the regular C structure as well as a base version of the class.
The regular structure is used when the object is the most derived object. In this case:
- its first field is long const * __tis_vbases_ptr;, an array holding the offset of each virtual base of the object;
- its virtual bases are translated as trailing fields whose names are prefixed by __vparent__, to distinguish them from non-virtual bases.
The base version of the object has its name prefixed by __tis_base_ and is used when the object is used as a base for another object. In this case, the structure also starts with a __tis_vbases_ptr field of type long const * holding the offset of each virtual base of the object, but it does not contain the virtual base fields themselves.
As an example, the following class:
struct Baz: Bar, virtual Foo {
int z;
};
produces the two classes:
struct Baz {
long const *__tis_vbases_ptr ;
struct __tis_base_Bar __parent__Bar ;
int z ;
struct Foo __vparent__Foo ;
};
struct __tis_base_Baz {
long const *__tis_vbases_ptr ;
struct __tis_base_Bar __parent__Bar ;
int z ;
};
Accessing a virtual base is always done by shifting the address of the
current object with the offset of the virtual base in the
__tis_vbases_ptr
array.
As an example, with the following code:
struct Foo {
int x;
};
struct Bar: virtual Foo {
int y;
};
int
main()
{
Bar bar;
Foo &foo = bar;
}
the body of the main
function is translated as:
int __retres;
struct Bar bar;
struct Foo *foo;
Bar::Ctor(& bar);
foo = (struct Foo *)((char *)(& bar) + *(bar.__tis_vbases_ptr + 0));
__retres = 0;
return __retres;
The virtual base Foo of the class Bar has index 0, so the offset used to go from Bar to Foo is *(bar.__tis_vbases_ptr + 0).
The full layout of an object follows from the rules above, its fields being laid out in increasing address order.
Pointers to a method X Foo::f(A1, A2, ..., An)
are translated as a C
structure with the following fields:
unsigned long vmt_index ;
X (* __attribute__((__tis_sound_cast__)) ptr)(struct Foo *, A1, A2, ..., An) ;
long shift ;
size_t vmt_shift ;
If Foo::f
is a non-virtual method, then:
ptr
is a pointer to the method called when resolving
the symbol f
in the scope of Foo
. This can be the method
Foo::f
if f
is declared in Foo
or a method of one of the
parent classes of Foo
.shift
is the offset of the base containing the method
f
. If f
is in Foo
, then this is 0, otherwise it is the
offset of the parent class declaring f
.vmt_index
is 0.If Foo::f
is a virtual method, then:
ptr
is the same as if Foo::f
was a non-virtual method.shift
is the same as if Foo::f
was a non-virtual method.vmt_index
is 1 + the index of the method in the virtual
method table of the class containing the final override of f
in
Foo
. This can be different from the index of f
in the virtual
method table of Foo
if the final override of f
is declared in a
parent class of Foo
Each pointer to member function type produces a different structure type.
The structure type is named __tis_XXXX
, where XXXX
is the mangled
name of the method pointer type.
For example, with the classes:
struct Pack {
char c[1000];
};
struct Bar {
int y;
int f() { return 2; }
};
struct Foo: Pack, Bar {
virtual void g() {}
};
the following statements:
int (Foo::*x)(void) = &Foo::f;
void (Foo::*y)(void) = &Foo::g;
are translated as:
struct __tis_M3FooFivE x;
struct __tis_M3FooFvvE y;
x.vmt_index = 0UL;
x.ptr = (int (*)(struct Foo *))(& Bar::f);
x.shift = 0L - (long)((struct Foo *)((unsigned long)0 - (unsigned long)(& ((struct Foo *)0)->__parent__Bar)));
x.vmt_shift = 0UL;
y.vmt_index = 2UL;
y.ptr = (void (*)(struct Foo *))(& Foo::g);
y.shift = 0L;
y.vmt_shift = (unsigned long)(& ((struct Foo *)0)->__tis_pvmt);
If a variable v is initialized at dynamic initialization time, it is translated as:
- a global C variable v, without its initializer;
- an initialization function void __tis_init_v(). The content of this function is the translation of the initializer of v.
All __tis_init_XXX functions are called by a special function __tis_globinit. The __tis_globinit function is in turn called at the beginning of the main function.
As an example, the program:
int id(int x) { return x; }
int x = id(12);
int
main()
{
return x;
}
is translated as:
int x;
void __tis_init_x(void)
{
x = id(12);
return;
}
__attribute__((__tis_throw__)) int id(int x);
int id(int x)
{
return x;
}
__attribute__((__tis_throw__)) int main(void);
int main(void)
{
__tis_globinit();
return x;
}
void __tis_globinit(void)
{
__tis_init_x();
return;
}
A variable with a constant initializer is translated as a C variable with an initializer. The initializer is the value of the C++ constant initializer. As an example, with the following code:
constexpr
int
add_one(int x)
{
return x + 1;
}
const int x = add_one(2);
the definition of the variable x
is translated as:
static int const x = 3;
In some special circumstances, one may need to disable static
initialization semantics described by the C++ standard. It can be done
using the option -no-cxx-evaluate-constexpr
. In this case,
whenever a variable is initialized with a constant initializer that is
not a constant initializer according to the C rules, the
initialization of this variable is done at dynamic initialization time
and uses the initializer as it was written by the user.
Using this option can lead to unsound results. As an example, with the following program:
constexpr int id(int x) { return x; }
extern const int x;
const int y = x;
const int x = id(1);
int
main()
{
int a = y;
int b = x;
//@ assert a == b;
return a == b;
}
- When the program is analyzed with tis-analyzer++ --interpreter test.cpp, the assertion holds: x is initialized with its constant initializer during static initialization, before the dynamic initialization of y runs, so y reads 1.
- When the program is analyzed with tis-analyzer++ --interpreter test.cpp -no-cxx-evaluate-constexpr, the initialization of x is performed at dynamic initialization time, after the one of y (which then reads 0), so the assertion fails.
A static local variable x is translated as a triple of:
- a global variable holding x itself;
- a global guard variable __tis_guard_x;
- a check on __tis_guard_x ensuring the initialization of the static variable is not recursive, followed by a conditional block doing the initialization of the variable once.
The variable __tis_guard_x can have the following values:
- 0: x has not been initialized yet;
- 1: x is being initialized;
- 2: x has been initialized.
As an example, the following function:
int
main()
{
static Foo f;
return 0;
}
is translated as:
int main::__tis_guard_f;
struct Foo main::f = {.x = 0};
int main(void)
{
int __retres;
tis_ub("Recursive initialization of the static local variable f.",
main::__tis_guard_f != 1);
if (! main::__tis_guard_f) {
main::__tis_guard_f ++;
Foo::Ctor(& main::f);
main::__tis_guard_f ++;
}
__retres = 0;
return __retres;
}
TrustInSoft Analyzer++ introduces several special variables while translating code. This section summarizes the different name families used for these variables.
__tis_ABI::exc_stack_depth
: how many exceptions are currently raised.__tis_ABI::exc_stack
: all exceptions currently raised.__tis_ABI::caught_stack_depth
: how many exceptions are currently caught.__tis_ABI::caught_stack
: all exceptions currently caught.__tis_unwinding
: whether the program is currently unwinding its stackXXX::__tis_class_vmt
: virtual method table for an object of type
XXX
used as most derived object.XXX::__tis_class_typeinfo
: typeinfo for an object with a most
derived object being of type XXX
XXX::__tis_class_inheritance
: inheritance information for an
object with most derived object being of type XXX
__Ctor_guard
: guard used to check if the lifetime of an object has started.__tis_alloc
: materialization of a space reserved by an allocation function.__tis_arg
: materialization of a function argument.__tis_bind
: temporary variable used to initialize non-reference
structured bindings of arrays.__tis_cast
: result of a dynamic_cast
.__tis_const
: materialization of an function argument that is non-trivial
for the purpose of calls.__tis_compound_literal
: materialization of a C++ temporary used to
initialize a compound literal.__tis_constant_expression
: materialization of a C++ constant expression.__tis_cxx_return_arg
: materialization of the discarded result of a call
that is non-trivial for the purpose of calls.virtual_dtor_tmp
: virtual method table cell used when calling a
virtual destructor.__tis_deleted_value
: address of a deleted value.__tis_dereference
: temporary variable used to dereference a member pointer.__tis_dyncast
: operand of a dynamic_cast
.__tis_exn
: materialization of an object of type std::bad_XXX
being thrown.__tis_gnu_ternary
: shared computation in a GNU ternary expression.__tis_guard
: guard controlling the initialization of a static
local variable.__tis_implicit_value
: materialization of a C++ temporary used to
perform an implicit value initialization.__tis_index
: index used when destroying an array when its lifetime finishes.__tis_initializer_list
: materialization of a C++ temporary used to
build an initializer list.__tis_init_size
: loop variable used to initialize arrays.__tis_lambda_temp
: materialization of a C++ temporary variable in a lambda.__tis_lvalue
: materialization of a C++ lvalue that is not an lvalue in C.__tis_mmp
: method pointer being called.__tis_mmp_init
: temporary variable used to initialize a method pointer.__tis_object_cast
: temporary variable used to compute object inheritance casts.__tis_offset
: index used to destroy array elements as a consequence of calling delete[]
.__tis_placement
: address of an object initialized by a placement new.__tis_relop
: intermediate result when translation relational operators.__tis_temp
: materialization of a C++ temporary variable.__tis_typeid
: address of an object with a polymorphic typeid
being computed.__virtual_return
: the result of the call to a virtual method
returning a pointer.__virtual_this
: this
pointer computed when calling a virtual method.__virtual_tmp
: virtual method table cell used when calling a
virtual method. The name of the called function is added to the end of
the name of the temporary.
Function contracts will be automatically generated for non-static class methods, in order to require the validity of the this pointer as a precondition. Copy and move constructors will also receive separation annotations, since they are expected to operate on separate objects.
These contracts will be added to user-provided contracts, if any.
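Conceptually, the generated requirement on a translated method resembles the following sketch (not the analyzer's exact output; the class and method are made up):
struct Foo { void f(int x); };

// translated method, with the generated precondition, conceptually:
//@ requires \valid(this);
void Foo::f(struct Foo *this, int x);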
Computation of these annotations can be disabled with the
-no-cxx-generate-contracts
option.
You can run your tests that use GoogleTest framework with TrustInSoft Analyzer++ in a few steps.
GoogleTest User’s Guide : https://google.github.io/googletest/
TrustInSoft Analyzer++ provides two options to activate GoogleTest support:
-gtest
option if you wrote the entry point of your tests yourself.-gtest-main
option (as opposed to the -gtest
option) if your
tests use the default GoogleTest entry point (from the gtest_main library).
More details about gtest_main can be found at https://google.github.io/googletest/primer.html#writing-the-main-function
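For instance, a minimal test that relies on the default entry point (and therefore on the -gtest-main option) could look like this (the code under test is made up for the example):
#include <gtest/gtest.h>

// code under test (normally in a separate source file)
int add(int a, int b) { return a + b; }

TEST(AddTest, HandlesSmallValues) {
    EXPECT_EQ(add(2, 3), 5);
}
// no main() here: it comes from gtest_main, hence -gtest-main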
Specify your own sources, headers and preprocessing options as you would do for any other analysis (see Prepare the Sources).
By providing the -gtest
option (or the -gtest-main
option) to
TrustInSoft Analyzer++, the analyzer will pull in all GoogleTest source
files and headers for you. Thus you do not have to list them in your analysis
configuration files.
As an example, let us assume that you are testing a software module called
module1
, and that you have gathered your tests that use GoogleTest
framework in a tests
subdirectory.
|-- module1
| |-- include
| | ...
| |-- src
| | |-- component1.cc
| | ...
| |-- tests
| | |-- component1_unittest.cc
| | ...
|-- trustinsoft
| |-- mod1_comp1_unittest.json
| ...
For instance, mod1_comp1_unittest.json would look like:
{
"name": "mod1_comp1_unittest",
"prefix_path":"../module1/",
"files": [
"tests/component1_unittest.cc",
"src/component1.cc"
]
}
Note that you do not need to add the gtest *.cc files to the "files" list.
Next run the analysis with both the --interpreter
option (see
Getting Started) and the -gtest
option (or the
-gtest-main
option).
tis-analyzer++ --interpreter -gtest -tis-config-load path/to/<my_unit_test>.json
Note: the -gtest option is given directly on the command line here in order to highlight it, but it can also be moved to the configuration file:
"gtest": true
Some GoogleTest features are not supported, in particular:
- the assertions checking that a subroutine fails (EXPECT_FATAL_FAILURE and EXPECT_NONFATAL_FAILURE);
- the death assertions (EXPECT_EXIT).
)You should make sure by yourself that your tests do not use these features, as they are untested at the moment. Therefore, the analyzer will probably not do what you expect if you use these features, and neither will it specifically warn you that it does not support them.
For more details about the assertion macros provided by GoogleTest visit https://google.github.io/googletest/reference/assertions.html
This section gives some details about how to deal with special features needed to analyze some applications:
Caution
The tis-mkfs
tool is not available in this version.
The tis-mkfs utility helps to build C files that give information about the file system in which the application is supposed to run. For more information, please refer to the tis-mkfs Manual.
The default initial environment for the analysis of a program is empty. In order to perform an analysis in a specific environment, it has to be populated from the user code using one of the two methods below.
The user may set some variables
by calling setenv
(or putenv
) in the analysis entry point.
Example:
#include <stdlib.h>
extern int main (int argc, char * argv[]);
int tis_main (int argc, char * argv[]) {
int r;
r = setenv ("USER", "me", 1);
if (r != 0) return 1;
r = setenv ("HOME", "/home/me", 1);
if (r != 0) return 1;
r = setenv ("SHELL", "/bin/sh", 1);
if (r != 0) return 1;
return main (argc, argv);
}
Alternatively, the user may initialize the environ standard variable. This variable:
- must be an array of strings of the form "variable=value",
- must be terminated by a NULL pointer (the elements after the first NULL value, if any, will not get accessed).
Example:
#include <stdlib.h>
extern int main (int argc, char * argv[]);
extern char **environ;
int tis_main (int argc, char * argv[]) {
char *custom_environ[] = {
"USER=me",
"HOME=/home/me",
"SHELL=/bin/sh",
NULL
};
environ = custom_environ;
return main (argc, argv);
}
Using one of the two methods above to initialize the environment, the following program can be analyzed:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char * argv[]) {
char * user = getenv ("USER");
if (user)
printf ("USER value is '%s'\n", user);
return 0;
}
The command line would be:
$ tis-analyzer tis_main.c main.c -val -main tis_main -slevel 10
And the following output can be observed:
...
USER value is 'me'
...
The initial size of the array pointed to by the environ variable
is controlled by the TIS_NB_INIT_ENV_ELEMENTS
macro.
Its default value is set to 100,
but may be changed by the user (using the -D
option as usual)
to avoid reallocations if the application needs more than 100 variables.
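For instance, to reserve room for 200 variables, the macro can be redefined from the command line (a sketch, assuming -D is forwarded to the preprocessor as usual):
$ tis-analyzer -D TIS_NB_INIT_ENV_ELEMENTS=200 tis_main.c main.c -val -main tis_main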
Moreover, to avoid losing precision, a local slevel
is used inside
the implementation of the environment related functions.
It is controlled by the TIS_ENV_INTERNAL_SLEVEL
macro.
Its default value is already very large,
but it can be increased by the user if it is still too low for a specific usage.
Without any option, the analysis stops on recursive function calls
and lets the user decide how to handle them by choosing either
-val-clone-on-recursive-calls
or -val-ignore-recursive-calls
.
The -val-clone-on-recursive-calls
option tells the analyzer
to process the calls to recursive functions
exactly as if they were calls to normal functions.
The function body is copied
and the copy is renamed with the prefix __tis_rec_<n>
where <n>
is the depth of the recursion.
This means that the recursive call is analyzed precisely.
This works up to the limit defined with the option
-val-clone-on-recursive-calls-max-depth
.
When the limit is reached
(or when the -val-clone-on-recursive-calls
is not set),
the contract of the recursive function is used.
Usually an assigns clause is enough to have the expected semantics.
If no contract is provided, the analyzer generates a simple one, but it may be incorrect: it is therefore strongly recommended to provide a contract when analyzing such cases.
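As an illustration, here is a sketch of a recursive function with a simple contract that the analyzer can fall back on once the maximum depth is reached (the function and its contract are made up for the example):
//@ assigns \nothing;
unsigned fact(unsigned n)
{
    return (n <= 1) ? 1u : n * fact(n - 1);
}
With -val-clone-on-recursive-calls, the first recursion levels are analyzed precisely on copies prefixed with __tis_rec_<n>; beyond the configured depth, the assigns \nothing contract is used instead.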
The -val-ignore-recursive-calls
option tells the analyzer
to use the contract of the function to handle the recursive calls.
The contract may be generated as-if the max-depth option was reached
as explained above.
Note that when the --interpreter option is used, the -val-clone-on-recursive-calls option is automatically set.
TrustInSoft Analyzer provides two ways to detect memory leaks.
The first way is to use the built-in function tis_check_leak
to print the list of the memory blocks that are allocated
but not referenced by any other memory block anymore
at the program point where the built-in is called.
Example (leak.c):
#include <stdlib.h>
#include <tis_builtin.h>

char * f(int v) {
    char *p = malloc(v);
    return p;
}

int main() {
    f(42);
    tis_check_leak();
    return 0;
}
When the program above is analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations leak.c
we get the following result:
tests/val_examples/leak.c:5:[value] allocating variable __malloc_f_l5 of type char [42]
stack: malloc :: tests/val_examples/leak.c:5 (included from tests/val_examples/leak_test.c) <-
f :: tests/val_examples/leak.c:10 (included from tests/val_examples/leak_test.c) <-
main
tests/val_examples/leak.c:11:[value] warning: memory leak detected for {__malloc_f_l5}
Indeed, 42 bytes are allocated at line 5 in f
,
but since the pointer returned by f
is lost in main
,
a memory leak is detected at line 11.
In addition, the analyzer also prints the list of possibly leaked memory blocks (memory blocks that might not be referenced by any other memory block anymore), as shown in the following example.
Example (leak_weak.c):
#include <stdlib.h>
#include <tis_builtin.h>

char * f() {
    char *p = malloc(1);
    char *q = malloc(1);
    return tis_interval(0, 1) ? p : q;
}

int main() {
    char *r = f();
    tis_check_leak();
    return 0;
}
When the program is analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations leak_weak.c
we get the following result:
$ tis-analyzer -val -val-show-allocations leak_weak.c
[...]
leak_weak.c:5:[value] allocating variable __malloc_f_l5 of type char
stack: malloc :: leak_weak.c:5 <- f :: leak_weak.c:11 <- main
leak_weak.c:6:[value] allocating variable __malloc_f_l6 of type char
stack: malloc :: leak_weak.c:6 <- f :: leak_weak.c:11 <- main
[value] using specification for function tis_interval
leak_weak.c:12:[value] warning: possible memory leak detected for {__malloc_f_l5, __malloc_f_l6}
Indeed, when tis_check_leak
is called at line 12, the value of variable
r
is { NULL ; &__malloc_f_l5 ; &__malloc_f_l6 }
, which means the value
of r
can either be NULL
, or the address of the memory block allocated
at line 5, or the address of the memory block allocated at line 6. Thus, the
memory blocks allocated at line 5 and line 6 might be leaked.
In order to improve the precision, the program can be analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
in which case, the analyzer propagates the states where r
points to
NULL
, __malloc_f_l5
and __malloc_f_l6
separately, thus, we get
the following analysis result:
$ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
[...]
leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l5}
leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l6}
It shows that in one path, __malloc_f_l5
is leaked, and in another path
__malloc_f_l6
is leaked.
The second way of detecting memory leaks requires the user to be able to identify two points in the target program such that when the execution reaches the second point, all the memory blocks that have been allocated since the execution was at the first point have been freed. The procedure is then to insert a call to a builtin that lists all the dynamically allocated blocks in each of the two points. If the lists printed by the two builtin calls at the two points match, it means that every block that was allocated after the first point was freed before the second point was reached.
In “interpreter mode”, in which the analyzer follows a single execution path,
the tis_show_allocated
builtin can be used to print the lists of allocated
blocks.
Example (leak_interpreter.c):
#include <stdlib.h>
#include <tis_builtin.h>

void f (void) {
    char * p1 = malloc (10);
    char * p2 = malloc (10);
    char * p3 = malloc (10);
    char * p4 = malloc (10);
    p1 = p4;
    p4 = p2;
    free (p2);
    free (p1);
}
int main (void) {
    char * p = malloc (10);
    tis_show_allocated ();
    /* all the memory blocks allocated in function f should be freed */
    f ();
    tis_show_allocated ();
    free(p);
}
When the program above is analyzed with the following command line:
$ tis-analyzer --interpreter -val leak_interpreter.c
we get the following result:
$ tis-analyzer --interpreter -val leak_interpreter.c
[...]
leak_interpreter.c:16:[value] remaining allocated variables:
__malloc_main_l15
leak_interpreter.c:19:[value] remaining allocated variables:
__malloc_main_l15, __malloc_f_l5, __malloc_f_l7
The second call to tis_show_allocated at line 19 shows that after the call to f () at line 18, two more memory blocks, allocated at line 5 and line 7 respectively, exist in the memory state compared to the first call to tis_show_allocated. Thus, we know that the function f causes a memory leak.
In “analyzer mode”, the first point may be visited by the analyzer several
times, for different memory states, corresponding to different execution paths
in the program. The memory states that reach the second point should be
matched to the memory state at the first point that they correspond to.
For this purpose, the tis_show_allocated_and_id and tis_id builtins can be used. The tis_id builtin gives a unique "id" to each memory state, and the tis_show_allocated_and_id builtin prints, in addition to the list of allocated blocks, the value of this "id", so that states can be matched.
Example (tis_show_allocated_and_id.c):
#include <stdlib.h>
#include <tis_builtin.h>

int main(){
    char *t[7];

    int n = tis_interval_split(0, 1);

    // "before" point
    unsigned long long my_id = tis_id();
    tis_show_allocated_and_id("before", my_id);

    t[n] = malloc(1);
    if (!t[n]) goto leave1;
    t[n+1] = malloc(1);
    if (!t[n+1]) goto leave2;
    t[n][0] = 'a';
    t[n+1][0] = 'b';
leave2:
    free(t[n]);
leave1:

    // "after" point
    tis_show_allocated_and_id("after", my_id);
}
When the program above is analyzed with the following command line:
$ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
we get the following result:
$ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
[...]
tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:13:[value] allocating variable __malloc_main_l13 of type char
stack: malloc :: tis_show_allocated_and_id.c:13 <- main
tis_show_allocated_and_id.c:15:[value] allocating variable __malloc_main_l15 of type char
stack: malloc :: tis_show_allocated_and_id.c:15 <- main
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:__malloc_main_l15
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:__malloc_main_l15
The statement at line 7 causes two separate states in which variable n
respectively has value 0 and 1. Then, the statement at line 10 assigns
variable my_id
to 0 in one state and to 1 in the other state.
The call to tis_show_allocated_and_id("before", my_id) at line 11 shows that in both "before" states, where my_id is 0 and 1, there is no allocated
memory block. The memory allocation statements at line 13 and line 15 cause
three different paths: in one path, the allocation at line 13 fails and
goto leave1
is taken at line 14, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, there is
no allocated memory block in both “after” states; in one path, the allocation
succeeds at line 13, but fails at line 15 and goto leave2
at line 16 is
taken, then, the memory allocated at line 13 is freed at line 20, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, there is
no allocated memory block left in both “after” states; in the other path, both
the allocation at line 13 and at line 15 succeed, but only the memory block
allocated at line 13 is freed at line 20, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, the
memory block allocated at line 15 remains in both “after” states, that is a
memory leak problem. From this example, we can also see that there might be
several “after” states for each “before” state: in this case each “after”
state must be checked against its corresponding "before" state for memory
leaks.
TrustInSoft Analyzer faithfully emulates the hardware features of the targeted platform: endianness, size of integer types, and alignment constraints.
The command -machdep help
lists the supported architectures, and the command
-machdep verbose
shows a brief summary of the main characteristics of each supported
architecture.
$ tis-analyzer -machdep help
[kernel] supported machines are: aarch64 aarch64eb apple_ppc_32 arm_eabi armeb_eabi
gcc_aarch64 gcc_aarch64eb gcc_arm_eabi gcc_armeb_eabi gcc_mips_64
gcc_mips_n32 gcc_mips_o32 gcc_mipsel_64 gcc_mipsel_n32 gcc_mipsel_o32
gcc_ppc_32 gcc_ppc_64 gcc_rv32ifdq gcc_rv64ifdq gcc_sparc_32 gcc_sparc_64
gcc_x86_16 gcc_x86_16_huge gcc_x86_32 gcc_x86_64 mips_64 mips_n32 mips_o32
mipsel_64 mipsel_n32 mipsel_o32 ppc_32 ppc_64 rv32ifdq rv64ifdq sparc_32
sparc_64 x86_16 x86_16_huge x86_32 x86_64 x86_win32 x86_win64
(default is x86_32).
Among these supported platforms, apple_ppc_32 is the same as ppc_32, but forcing the "char" type to be signed, as done by the Apple toolchain (as opposed to the default PowerPC use of the unsigned char type), and allowing gcc language extensions.
Unless otherwise specified in the list above, the fundamental types have the characteristics commonly associated with each of these architectures.
With the exception of apple_ppc_32
, x86_win32
and x86_win64
,
all these machdeps may be specified with
the gcc_
prefix, in which case gcc language extensions are allowed, and
the __int128 integer type is available on 64-bit machdeps.
The endianness of each supported architecture is reflected in its name: for instance, the eb suffix (as in aarch64eb or armeb_eabi) denotes a big-endian variant, while the el infix (as in mipsel_64) denotes a little-endian variant.
To switch to another architecture quickly, one of the following options may be used:
-16
for gcc_x86_16
-32
for gcc_x86_32
-64
for gcc_x86_64
$ tis-analyzer -64 ...
When the targeted architecture is not supported out-of-the-box, a new architecture corresponding to a specific target may be defined and dynamically loaded by TrustInSoft Analyzer.
A new plug-in is defined with the values of the types for the targeted
architecture. Create a new file custom_machdep.ml
with the
following content and edit the necessary values:
open Cil_types
let mach =
{ version = "foo";
compiler = "bar";
sizeof_short = 2; (* __SIZEOF_SHORT *)
sizeof_int = 4; (* __SIZEOF_INT *)
sizeof_long = 4; (* __SIZEOF_LONG *)
sizeof_longlong = 8; (* __SIZEOF_LONGLONG *)
sizeof_int128 = 0;
sizeof_ptr = 4; (* related to __INTPTR_T *)
sizeof_float = 4;
sizeof_double = 8;
sizeof_longdouble = 12;
sizeof_void = 1;
sizeof_fun = 1;
size_t = "unsigned long"; (* __SIZE_T *)
char16_t = "unsigned short";
char32_t = "unsigned int";
wchar_t = "int"; (* __WCHAR_T *)
ptrdiff_t = "int"; (* __PTRDIFF_T *)
max_align_t = "long double"; (* __MAX_ALIGN_T *)
alignof_short = 2;
alignof_int = 4;
alignof_long = 4;
alignof_longlong = 4;
alignof_int128 = 0;
alignof_ptr = 4;
alignof_float = 4;
alignof_double = 4;
alignof_longdouble = 4;
alignof_str = 1;
alignof_fun = 1;
alignof_aligned= 16;
pack_max = 16;
char_is_unsigned = false;
char_bit = 8; (* __CHAR_BIT *)
const_string_literals = true;
little_endian = false; (* __TIS_BYTE_ORDER *)
has__builtin_va_list = true;
__thread_is_keyword = true;
has_int128 = false; }
let () =
Stage.run_after_loading_stage
(fun () ->
Core.result "Registering machdep 'mach' as 'custom'";
ignore (Machdeps.register_machdep
~short_name:"custom" ~cpp_target_options:[] mach))
Define a new header containing the values of the types.
Create a new file __fc_custom_machdep.h
with the following content:
/* skeleton of a real custom machdep header. */
#ifndef __TIS_MACHDEP
#define __TIS_MACHDEP
#ifdef __TIS_MACHDEP_CUSTOM
// __CHAR_UNSIGNED must match mach.char_is_unsigned
#undef __CHAR_UNSIGNED
#define __WORDSIZE 32
// __CHAR_BIT must match mach.char_bit
#define __CHAR_BIT 8
// __SIZEOF_SHORT must match mach.sizeof_short
#define __SIZEOF_SHORT 2
// __SIZEOF_INT must match mach.sizeof_int
#define __SIZEOF_INT 4
// __SIZEOF_LONG must match mach.sizeof_long
#define __SIZEOF_LONG 4
// __SIZEOF_LONGLONG must match mach.sizeof_longlong
#define __SIZEOF_LONGLONG 8
// __TIS_BYTE_ORDER must match mach.little_endian
#define __TIS_BYTE_ORDER __BIG_ENDIAN
#define __TIS_SCHAR_MIN (-128)
#define __TIS_SCHAR_MAX 127
#define __TIS_UCHAR_MAX 255
#define __TIS_SHRT_MIN (-32768)
#define __TIS_SHRT_MAX 32767
#define __TIS_USHRT_MAX 65535
#define __TIS_INT_MIN (-__TIS_INT_MAX - 1)
#define __TIS_INT_MAX 8388607
#define __TIS_UINT_MAX 16777216
#define __TIS_LONG_MIN (-__TIS_LONG_MAX -1L)
#define __TIS_LONG_MAX 2147483647L
#define __TIS_ULONG_MAX 4294967295UL
#define __TIS_LLONG_MIN (-__TIS_LLONG_MAX -1LL)
#define __TIS_LLONG_MAX 9223372036854775807LL
#define __TIS_ULLONG_MAX 18446744073709551615ULL
#define __INT8_T signed char
#define __TIS_INT8_MIN __TIS_SCHAR_MIN
#define __TIS_INT8_MAX __TIS_SCHAR_MAX
#define __UINT8_T unsigned char
#define __TIS_UINT8_MAX __TIS_UCHAR_MAX
#define __INT_LEAST8_T __INT8_T
#define __TIS_INTLEAST8_MIN __TIS_INT8_MIN
#define __TIS_INTLEAST8_MAX __TIS_INT8_MAX
#define __UINT_LEAST8_T __UINT8_T
#define __TIS_UINTLEAST8_MAX __TIS_UINT8_MAX
#define __INT_FAST8_T __INT8_T
#define __TIS_INTFAST8_MIN __TIS_INT8_MIN
#define __TIS_INTFAST8_MAX __TIS_INT8_MAX
#define __UINT_FAST8_T __UINT8_T
#define __TIS_UINTFAST8_MAX __TIS_UINT8_MAX
#define __INT16_T signed short
#define __TIS_INT16_MIN __TIS_SHRT_MIN
#define __TIS_INT16_MAX __TIS_SHRT_MAX
#define __UINT16_T unsigned short
#define __TIS_UINT16_MAX __TIS_USHRT_MAX
#define __INT_LEAST16_T __INT16_T
#define __TIS_INTLEAST16_MIN __TIS_INT16_MIN
#define __TIS_INTLEAST16_MAX __TIS_INT16_MAX
#define __UINT_LEAST16_T __UINT16_T
#define __TIS_UINTLEAST16_MAX __TIS_UINT16_MAX
#define __INT_FAST16_T __INT16_T
#define __TIS_INTFAST16_MIN __TIS_INT16_MIN
#define __TIS_INTFAST16_MAX __TIS_INT16_MAX
#define __UINT_FAST16_T __UINT16_T
#define __TIS_UINTFAST16_MAX __TIS_UINT16_MAX
#define __INT32_T signed int
#define __TIS_INT32_MIN __TIS_INT_MIN
#define __TIS_INT32_MAX __TIS_INT_MAX
#define __UINT32_T unsigned int
#define __TIS_UINT32_MAX __TIS_UINT_MAX
#define __INT_LEAST32_T __INT32_T
#define __TIS_INTLEAST32_MIN __TIS_INT32_MIN
#define __TIS_INTLEAST32_MAX __TIS_INT32_MAX
#define __UINT_LEAST32_T __UINT32_T
#define __TIS_UINTLEAST32_MAX __TIS_UINT32_MAX
#define __INT_FAST32_T __INT32_T
#define __TIS_INTFAST32_MIN __TIS_INT32_MIN
#define __TIS_INTFAST32_MAX __TIS_INT32_MAX
#define __UINT_FAST32_T __UINT32_T
#define __TIS_UINTFAST32_MAX __TIS_UINT32_MAX
#define __INT64_T signed long long
#define __TIS_INT64_MIN __TIS_LLONG_MIN
#define __TIS_INT64_MAX __TIS_LLONG_MAX
#define __UINT64_T unsigned long long
#define __TIS_UINT64_MAX __TIS_ULLONG_MAX
#define __INT_LEAST64_T __INT64_T
#define __TIS_INTLEAST64_MIN __TIS_INT64_MIN
#define __TIS_INTLEAST64_MAX __TIS_INT64_MAX
#define __UINT_LEAST64_T __UINT64_T
#define __TIS_UINTLEAST64_MAX __TIS_UINT64_MAX
#define __INT_FAST64_T __INT64_T
#define __TIS_INTFAST64_MIN __TIS_INT64_MIN
#define __TIS_INTFAST64_MAX __TIS_INT64_MAX
#define __UINT_FAST64_T __UINT64_T
#define __TIS_UINTFAST64_MAX __TIS_UINT64_MAX
#define __INT_MAX_T __INT64_T
#define __TIS_INTMAX_MIN __TIS_INT64_MIN
#define __TIS_INTMAX_MAX __TIS_INT64_MAX
#define __UINT_MAX_T __UINT64_T
#define __TIS_UINTMAX_MAX __TIS_UINT64_MAX
// __INTPTR_T must match mach.sizeof_ptr
#define __INTPTR_T __INT32_T
#define __TIS_INTPTR_MIN __TIS_INT32_MIN
#define __TIS_INTPTR_MAX __TIS_INT32_MAX
#define __UINTPTR_T __UINT32_T
#define __TIS_UINTPTR_MAX __TIS_UINT32_MAX
// __PTRDIFF_T must match mach.ptrdiff_t
#define __PTRDIFF_T int
// __MAX_ALIGN_T must match mach.max_align_t
#define __MAX_ALIGN_T long double
// __SIZE_T must match mach.size_t
#define __SIZE_T unsigned int
#define __SSIZE_T int
#define __TIS_PTRDIFF_MIN __TIS_INT_MIN
#define __TIS_PTRDIFF_MAX __TIS_INT_MAX
#define __TIS_SIZE_MAX __TIS_UINT_MAX
#define __TIS_SSIZE_MAX __TIS_INT_MAX
// __WCHAR_T must match mach.wchar_t
#define __WCHAR_T int
#define __TIS_WCHAR_MIN __TIS_INT_MIN
#define __TIS_WCHAR_MAX __TIS_INT_MAX
#define __WINT_T long long int
#define __TIS_WINT_MIN __TIS_LLONG_MIN
#define __TIS_WINT_MAX __TIS_LLONG_MAX
#define __WCTRANS_T long long int
#define __WCTYPE_T long long int
#define __SIG_ATOMIC_T volatile int
#define __TIS_SIG_ATOMIC_MIN __TIS_INT_MIN
#define __TIS_SIG_ATOMIC_MAX __TIS_INT_MAX
// Common machine specific values (PATH_MAX, errno values, etc) for Linux
// platforms, usually applicable anywhere else.
#include "__fc_machdep_linux_gcc_shared.h"
#else
#error "I'm supposed to be included with __TIS_MACHDEP_CUSTOM macro defined"
#endif
#endif
NB: the content above does not correspond to any real architecture; it is given only as an example of what is possible.
Warning
It is important for the data defined in the two files above to have compatible values.
The new architecture may now be tested:
#include "limits.h"
int main(void)
{
return INT_MAX;
}
To analyze it with TrustInSoft Analyzer, load the plug-in that defines the custom machdep and then add the option -D __TIS_MACHDEP_CUSTOM.
$ tis-analyzer -load-script custom_machdep.ml -I . -D __TIS_MACHDEP_CUSTOM -machdep custom -val test.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
__retres ∈ {8388607}
The command-line options -all-rounding-modes-constants and -all-rounding-modes change the interpretation of floating-point constants and computations.
The default, when both options are unset, is to assume a strict IEEE 754 platform where float is mapped to the IEEE 754 binary32 format, double is mapped to binary64, and the rounding mode is never changed from its nearest-even default.
The following deviations from strict IEEE 754 behavior can be taken into account in TrustInSoft Analyzer:
- FLT_EVAL_METHOD is defined to a value other than 0 by the compiler.
- Compilers that define FLT_EVAL_METHOD to 2 or 0 but actually produce floating-point results inconsistent with these settings.
- Contraction of floating-point expressions, as with #pragma STDC FP_CONTRACT ON. Some C compilers are taking the path of enabling this by default (ref: https://reviews.llvm.org/D24481).
The user who applies TrustInSoft Analyzer to C programs containing significant floating-point computations is invited to open a ticket on the support site with details of the compiler and architecture.
An unfortunate choice in the C89 standard led, when the C99 standard was published, to incompatibilities in the types assigned to integer constants. The problem is compounded by some compilers' eagerness to provide extended integer types. The type of integer constants can, through the usual arithmetic conversions, influence the results of integer computations.
By default, TrustInSoft Analyzer types integer constants the same way a C99-compliant compiler would, and invites the user to pick a choice if a signed integer constant that cannot be represented as a long long is encountered in the target program. Since long long is at least 64-bit, this does not happen unless an integer constant in the program is larger than 9,223,372,036,854,775,807.
The command-line option -integer-constants c89strict can be used to select the C89 behavior. On an ILP32 compiler following the C89 standard, the following program returns 2 with res = 2, whereas it returns 1 with res = 1 when compiled with a C99 compiler:
int main(void)
{
int res;
if (-3000000000 < 0)
res = 1;
else
res = 2;
return res;
}
$ tis-analyzer -val integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
res ∈ {1}
$ tis-analyzer -val -integer-constants c89strict integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
res ∈ {2}
Caution
The option -val-use-spec is not available in this version. To analyze assembly code, it is necessary to model the assembly code with C code (where needed).
The assembly code is accepted by the tool, but most of the analyzers are not able to understand it.
The assembly statements appear in the statements.csv information file (see Information about the Statements) with the keyword asm in the kind column. It can be the case that these statements are not reachable in the studied context, in which case they can be safely ignored. If they are reachable, the value analysis ignores them, but it warns when it has to do so with this message:
[value] warning: assuming assembly code has no effects in function xxx
This is fine if the assembly statements have no effect, or if their effect can be ignored, but that is most likely not the case!
The simplest way to handle assembly code in the value analysis is to write a specification for the functions that enclose such statements and use the option -val-use-spec to use these specifications instead of the function bodies.
If a function is composed of other statements besides the assembly ones, WP can be used to check the specification. Because WP does not understand assembly either, the assembly statements have to be specified using statement contracts. These contracts cannot be verified against the assembly code, but they are used to verify the rest of the function.
Example:
//@ requires pre: n >= 0; ensures post: \result == n * 3 * x + 10;
int asm_fun(int x, int n)
{
int res = 0;
/*@ loop assigns i, res;
loop invariant l1: i <= n;
loop invariant l2: res == i * 3 * x;
*/
for (int i = 0; i < n; i++) {
int y;
//@ assigns y; ensures y == 3 * x;
asm ("leal (%1,%1,2), %0" // x + x * 2 --> == 3 * x
: "=r" (y)
: "r" (x)
);
res += y;
}
return res + 10;
}
The specification of asm_fun has to be used during the value analysis through the -val-use-spec option. This specification can then be verified using WP, assuming the assembly statement properties are correct, which is a smaller hypothesis.
Caution
The option -absolute-valid-range <min-max> is not available in this version.
By default, reading or writing data located at some memory addresses is only valid if that part of the memory has been allocated, otherwise, the operation is invalid, and raises an alarm. But in some cases, for instance to have access to hardware resources, one needs to read or write to absolute addresses, without having to allocate that part of the memory.
Example:
#define SZ 32
#define BASE_ADDR ((char*)0x400000)
void main (void) {
for (int i = 0; i < SZ; i++)
BASE_ADDR[i] = 0;
}
Analyzing such code raises alarms about invalid memory accesses:
void main(void)
{
int i;
i = 0;
while (i < 32) {
/*@ assert
Value: pointer_arithmetic: \inside_object((void *)((char *)0x400000));
*/
*((char *)0x400000 + i) = (char)0;
i ++;
}
return;
}
There are two solutions to this problem:
- using the -absolute-valid-range option,
- declaring an array to represent the memory chunk.
Both solutions are explained below, but the second one is to be preferred when possible.
Using the option -absolute-valid-range <min-max> tells the tool that all absolute addresses between min and max are valid addresses. The min and max values are integers in decimal, hexadecimal (with 0x or 0X prefix), octal (0o) or binary (0b) notation and must fit in 64 bits. Note that using inverted bounds (min > max) resets all absolute addresses to be invalid from the command line.
On the previous example, adding -absolute-valid-range 0x400000-0x40001F
to the command line solves the problem.
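For instance (the file name absolute.c is illustrative), the analysis command becomes:
$ tis-analyzer -val -absolute-valid-range 0x400000-0x40001F absolute.c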
The idea is to declare an array that represents the memory chunk and to use it instead of the absolute addresses:
#define SZ 32
#ifdef __TRUSTINSOFT_ANALYZER__
extern char tis_array[SZ];
#define BASE_ADDR tis_array
#else
#define BASE_ADDR ((char*)0x400000)
#endif
void main (void) {
for (int i = 0; i < SZ; i++)
BASE_ADDR[i] = 0;
}
Notice that, because we do not want to modify the original code, this modification should be protected by a test on the __TRUSTINSOFT_ANALYZER__ macro, which is automatically set by tis-analyzer. Since the array is only declared (and not defined), the analyzer makes no hypotheses on the initial values of its elements.
This solution is better because it ensures that the addresses in this range are always computed in a controlled way (through the BASE_ADDR macro in this example) and not by an unrelated computation.
Moreover, if there are several such ranges, using different arrays ensures that the zones are separated, i.e. that an address in the second zone cannot be computed with some pointer arithmetic on a pointer in the first zone. Having such hypotheses may make the code easier to analyze.
Most of the time, the behavior of a program must be independent of the value of the variable addresses, i.e. of the location of the variables in the memory. But in some cases, it matters, and it is then useful to inform the analyzer of some constraints about the addresses.
In some low level applications, instructions are given to the linker to put symbols at a precise address in the memory. To provide this information to the analyzer, the tis_address attribute can be used in the variable declaration:
tis_address.c:
#include <stdint.h>
#ifdef WITH_TIS_ADDR
int X __attribute__((tis_address(0xABCD)));
int Y __attribute__((tis_address("[0x00200000 .. 0x00200FFF]")));
int Z __attribute__((tis_address("[0x00300000 .. 0x00300FF0],0%16")));
#else
int X, Y, Z;
#endif
int main (void) {
int r = 0;
if ((uintptr_t)&X < (uintptr_t)&Y)
r++;
if ((uintptr_t)&Y < (uintptr_t)&Z)
r++;
if (((uintptr_t)&Z & 0xF) == 0)
r++;
return r;
}
While the X variable is specified to be precisely at the memory address 0xABCD, Y and Z are just said to be somewhere in the given range (with Z having an alignment constraint). With those specifications, the comparisons of the integer values of the variable addresses then give a precise result.
When the program is analyzed, we get the following result:
$ tis-analyzer -D WITH_TIS_ADDR -64 -val -val-print tis_address.c
[value] Values at end of function main:
r ∈ {3}
To avoid modifying the source files, the -absolute-address option may also be used. For instance, on the example above, the following command is equivalent to the previous one:
$ tis-analyzer -val -val-print tis_address.c \
-absolute-address X:0xABCD \
-absolute-address Y:[0x00200000..0x00200FFF] \
-absolute-address=Z:"[0x00300000 .. 0x00300FF0]\,0%16"
The same information can also be specified through a configuration file:
tis_address.config.json:
{
"files": [ "tis_address.c" ],
"absolute-address": {
"X": "0xABCD",
"Y": "[0x00200000 .. 0x00200FFF]",
"Z": "[0x00300000 .. 0x00300FF0],0%16"
}
}
And the equivalent command is:
$ tis-analyzer -val -val-print -tis-config-load tis_address.config.json
Warning
The tis_address attribute cannot be used in C++ files. It may be provided either in a separate C file, attached to the corresponding variable declarations, or through the -absolute-address option on the command line or in a configuration file.
Beside specifying the alignment of some variables, a global alignment constraint can be added with the -address-alignment option.
By default, the address size of a variable depends on the architecture chosen by the user: for example, 64-bit pointers for x86_64 and 32-bit pointers for x86_32. However, some low-level code may take advantage of the fact that some code is guaranteed to be within the first 4 GiB of memory and thus store the addresses of some variables or functions in 32-bit integers. In order to analyze such programs precisely, the tis_address attribute can be used to specify the address range.
Example:
addr_32bits.c:
#include <stdint.h>
int main (){
char c __attribute__((tis_address("[1..0xFFFFFFFF]")));
uint64_t high = (uint64_t)&c >> 32;
uint64_t low = (uint64_t)&c & 0xFFFFFFFF;
char *p =(char *) (high << 32UL | low);
*p = 42;
}
With the tis_address attribute, the address of variable c is specified to be between 1 and 0xFFFFFFFF. Note that 0 is not included because the address of the variable must compare unequal to NULL.
When the program is analyzed, we get the following result:
$ tis-analyzer -64 -val -val-print addr_32bits.c
[value] Values at end of function main:
c ∈ {42}
high ∈ {0}
low ∈ {{ (unsigned long)&c }}
p ∈ {{ &c }}
__retres ∈ {0}
The analyzer makes use of the fact that &c is a 32-bit pointer and computes precise values for the variables high, low and p; thus, variable c is set to 42 by the statement at line 8.
To analyze an application that uses dynamic loading features from <dlfcn.h>, such as dlopen, dlsym, etc., some specific information has to be provided by the user.
For instance, let’s consider the following program to be analyzed:
use-dlopen.c to analyze:
#include <stdio.h>
#include <dlfcn.h>
int main (void) {
void * handle = dlopen("some_lib.so", RTLD_LAZY);
if ( ! handle) {
fprintf(stderr, "%s\n", dlerror());
return 1;
}
void (*f_process) (void);
f_process = (void (*)(void)) dlsym(handle, "process");
char * error = dlerror();
if (error != NULL) {
fprintf(stderr, "%s\n", error);
return 1;
}
f_process();
dlclose(handle);
return 0;
}
Since it is deterministic, the interpreter mode is used, but it would be similar for a larger analysis that uses the analyzer mode. So, it can be analyzed with the following command:
$ tis-analyzer --interpreter use-dlopen.c
The trace shows that, in order to be able to load the function with dlsym, a stub_dlsym function has to be provided:
[TIS LIBC STUBS]: stub_dlsym error: For a more accurate analysis, override this function "stub_dlsym" with your own function
dlsym error: unable to load process symbol
Such a stub may for instance look like:
use-dlopen-stubs.c to provide stub_dlsym:
#include <string.h>
#include <dlfcn.h>
void process (void);
void *stub_dlsym(const char *filename, const char * fname) {
void * pf = NULL;
if (0 == strcmp (filename, "some_lib.so")) {
if (0 == strcmp (fname, "process")) {
pf = &process;
}
}
return pf;
}
Now, the command becomes:
$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c
There is a warning about the process function when stub_dlsym is provided but not the library:
tests/val_examples/use-dlopen-stubs.c:8:[kernel] warning: Neither code nor specification for function process, generating default assigns from the prototype
Of course, this is because the source code of the loaded library, which is supposed to hold the process function, has not been provided to the analyzer.
Let's add a use-dlopen-plugin.c dummy file holding a process function:
use-dlopen-plugin.c to provide the process function:
#include <stdio.h>
void process (void) {
printf ("Hello from the 'process' function.");
}
Warning
Limitation: since the source files of the loaded library are analyzed together with the main source files, the constructor functions of dynamic libraries are called during the main program startup, whereas they should be called when dlopen is called.
Now, the command becomes:
$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c use-dlopen-plugin.c
The application is now analyzed as expected:
Hello from the 'process' function.
Warning
If some function names are used in both the application and the loaded library, some renaming may be needed.
Some compilers have extensions with non-standard keywords. One easy way to remove these keywords from the source code is to define them as empty macros. For instance:
-cpp-extra-args="-Dinterrupt=''"
Beware: it is up to you to check whether removing these keywords changes the relevance of the verification.
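As a sketch of what this does (the keyword interrupt and the function name are illustrative), defining the keyword as an empty macro makes the declaration parse as plain C:
/* what -cpp-extra-args="-Dinterrupt=''" effectively does: */
#define interrupt

/* vendor code using the non-standard keyword now parses as plain C */
interrupt void timer_isr(void) {
    /* handler body */
}

int main(void) {
    timer_isr();
    return 0;
}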
Caution
The Mthread plug-in is not available in this version.
The Mthread plug-in makes it possible to verify multi-thread programs.
Because it also uses the same value analysis, it provides the same alarm detection, but it takes into account all the possible concurrent behaviors of the program by analyzing all the possible interleavings between all threads. As before, this represents an over-approximation of the possible behaviors of the program.
Moreover, Mthread can provide an over-approximation of the memory zones that are accessed concurrently by more than one thread. For each zone and thread, Mthread also returns the program points at which the zone is accessed, whether the zone is read or written, and the callstack that led to the statement.
Using the plug-in requires adding a stubbed version of the concurrency library in use to the analyzed source files. For some concurrency libraries, this file is provided with the tool (currently pthread, VxWorks and Win32).
Please ask for more information if needed.
Caution
The Strict Aliasing plug-in is not available in this version.
The Strict Aliasing plug-in detects the violation of the strict aliasing rule as defined in the C99 and the C11 standards.
The references taken from the C11 standard for the strict aliasing rule are:
The strict aliasing analysis is currently in beta.
The strict aliasing analysis is available using the parameter -sa when starting an analysis with TrustInSoft Analyzer. Using this option automatically launches the value analysis.
If a violation of the strict aliasing rule is detected during an analysis, a warning is displayed. However, this warning does not stop the analysis.
Example:
int foo(int *p, float *q)
{
    *p = 42;
    *q = 1.337;
    return *p;
}

int main(void)
{
    int x;
    return foo(&x, (float *)&x);
}
Given the previous C file foo.c, the strict aliasing analysis can be launched using the following command:
$ tis-analyzer -sa foo.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
foo.c:4:[sa] warning: The pointer q has type float *. It violates strict aliasing rules by
accessing a cell with effective type int.
Callstack: foo :: t.c:11 <- main
[value] done for function main
A violation of the strict aliasing rule is detected by the analyzer. The analyzer provides details about the violation: the pointer has type float * while the accessed cell has effective type int, and these types are incompatible.
Several options exist to parametrize the strict aliasing analysis.
-sa-strict-enum
Default Value: not set by default
Opposite: -sa-no-strict-enum
By default, the strict aliasing analysis uses the integer representation of the enum type. This makes it possible to access an enum cell through a pointer to its integer representation type. The -sa-strict-enum option restricts this default behavior: it only allows an access to an enum cell through a pointer to the same enum type. For example:
enum E { a, b };

int main(void)
{
    enum E e = a, *p = &e;
    *(int *)p = 42;
    *p = b;
    return e;
}
The access at line 6 is accepted by default by the strict aliasing analysis because the example uses a pointer to a correct integer representation, as shown by the following output:
$ tis-analyzer -sa enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
When using the -sa-strict-enum option, the strict aliasing analysis detects a violation at line 6, because it does not accept the integer representation.
$ tis-analyzer -sa -sa-strict-enum enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
enum.c:6:[sa] warning: The pointer (int *)p has type int *. It violates strict aliasing rules by
accessing a cell with effective type enum E.
Callstack: main
[value] done for function main
-sa-strict-struct
Default Value: not set by default
Opposite: -sa-no-strict-struct
When taking the address of a structure member, the strict aliasing analysis keeps track of the structure and the member, in order to check future pointer uses. By default, the analyzer allows accessing a memory location whose effective type is that of a structure member through a pointer having the same type as the member. For example:
struct s { int a; };

int main(void)
{
    struct s s = { 0 };
    int *p = &s.a;
    *p = 42;
    return s.a;
}
The access at line 7 is allowed by the analyzer because the pointer p has the same type as the member a of the structure s.
$ tis-analyzer -sa struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
When using the -sa-strict-struct option, this access is signaled as non-conformant because the member must be accessed with the same effective type (i.e. accessed through a pointer to the whole structure only).
$ tis-analyzer -sa -sa-strict-struct struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
a cell with effective type (struct s).a[int].
Callstack: main
[value] done for function main
-sa-strict-union
Default Value: set by default
Opposite: -sa-no-strict-union
When taking the address of a union member, the strict aliasing analysis keeps information about the whole union, and not only the referenced member. The analyzer restricts accesses to a memory location that has a union type to pointers to the same union type.
union u { int a; };

int main(void)
{
    union u u = { 0 };
    int *p = &u.a;
    *p = 42;
    return u.a;
}
The access at line 7 is not valid according to the analyzer because the pointer p does not have the expected union u type, even though the union includes a member having the same type as the one pointed to by p.
$ tis-analyzer -sa union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
union.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
a cell with effective type (union u)[int].
Callstack: main
[value] done for function main
When using the opposite option -sa-no-strict-union, the access is allowed, because the union u includes a member of int type.
$ tis-analyzer -sa -sa-no-strict-union union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
The C volatile keyword is a qualifier that may be specified when declaring a variable. Such a qualifier tells the compiler that the value of a volatile variable may change without any action being taken by the code the compiler finds nearby. In particular, a variable qualified as volatile can be modified outside the program. This means that even if it has just been assigned, its value is unpredictable.
The value analysis handles volatile variables, but WP does not at the moment.
Warning
WP MUST NOT be used on functions using volatile data.
-volatile-globals
By default, TrustInSoft Analyzer treats as volatile only those variables that are declared as such in the program. The -volatile-globals option makes it possible to indicate a set of global variables that the value analysis has to consider volatile despite them not being declared as such in the program.
Passing the name NULL to the option forces the value analysis to consider a given absolute memory range, typically specified by the option -absolute-valid-range, as volatile.
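For instance, a minimal sketch (the variable name hw_status and the exact command line are illustrative assumptions):
/* analyzed with: tis-analyzer -val -volatile-globals hw_status file.c */
int hw_status; /* not declared volatile in the source code */

int main(void) {
    hw_status = 1;
    /* with -volatile-globals hw_status, the value read back is unknown,
       not necessarily 1 */
    int x = hw_status;
    return x;
}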
Our references from the C11 standard here are:
The va_list type
The type va_list, declared in the <stdarg.h> header, should be, according to the C11 standard, a complete object type. There are some limitations on TrustInSoft Analyzer's capabilities of handling this type and its correct use on the basic level.
Objects of the va_list type are handled correctly only if they appear as local variables, formal arguments, etc. We do not support global va_list variables, arrays of va_list objects, nor va_list as the type of fields of complex types (i.e. unions or structures). A fatal error will occur if va_list objects are used in such a way (however, if they are just declared and then not actually used, there is no error).
It is not entirely clear (in the C11 standard) whether performing assignments on variables of va_list type (i.e. ap1 = ap2;) is permitted or not. However, as both the gcc and clang compilers refuse to compile such operations, we assume that they are not permitted and thus we do not handle them. Note, though, that TrustInSoft Analyzer does not reject programs in which va_list assignments appear, even though the behavior is undefined.
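For illustration (a sketch, not from the documentation), such an assignment is typically rejected at compile time:
#include <stdarg.h>

void f(int n, ...) {
    va_list ap1, ap2;
    va_start(ap1, n);
    ap2 = ap1; /* gcc and clang reject this assignment on targets where
                  va_list is an array type (e.g. x86-64) */
    va_end(ap1);
}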
Casting to the va_list type
Casting from other data types to the va_list type is implicitly forbidden, and this rule is enforced by TrustInSoft Analyzer: an appropriate error will occur if such a cast is encountered and the program will be rejected.
Casting from the va_list type
Casting from the va_list type to other data types is also forbidden, though we do not enforce it: programs containing such casts are incorrect, but they will not be rejected by TrustInSoft Analyzer.
Passing va_list objects to other functions
The va_list objects can be passed as arguments to other functions. However, if the called function invokes the va_arg macro on this object, its value in the calling function becomes indeterminate (i.e. it cannot be used anymore for va_arg in the calling function). This rule does not apply if the va_list is passed by a pointer. See subsection 7.16.1 (of the C11 standard).
We do not enforce this rule in TrustInSoft Analyzer. All va_list objects passed to other functions are treated as if they were passed by pointer. Each invocation of the va_arg macro on such a va_list object will simply return the subsequent argument, without considering where the previous va_arg invocations happened.
This approach is similar to what is implemented in gcc and clang.
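A minimal sketch (an illustrative example) of the behavior described above:
#include <stdarg.h>
#include <stdio.h>

/* consumes one variadic argument from ap, which is passed by value */
static void take_one(va_list ap) {
    printf("%d\n", va_arg(ap, int));
}

static void f(int first, ...) {
    va_list ap;
    va_start(ap, first);
    take_one(ap);                    /* per C11, ap becomes indeterminate here */
    printf("%d\n", va_arg(ap, int)); /* the analyzer returns the subsequent argument */
    va_end(ap);
}

int main(void) {
    f(0, 1, 2); /* prints 1 then 2 under the analyzer's model */
    return 0;
}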
Returning va_list objects from functions
It is not completely clear (in the C11 standard), but it seems that functions returning va_list objects are allowed. However, as both the gcc and clang compilers refuse to compile such functions, we assume that using them is not permitted. TrustInSoft Analyzer rejects programs which declare functions returning va_list objects.
Some undefined behavior cases enumerated in the C11 standard, see section J.1 (Undefined behavior) of appendix J (Portability issues), are not verified by TrustInSoft Analyzer.
va_arg on va_list objects passed to other functions
The macro va_arg is invoked using the parameter ap that was passed to a function that invoked the macro va_arg with the same parameter (7.16).
Status: Not verified at all (as stated above).
A macro definition of va_start, va_arg, va_copy, or va_end is suppressed in order to access an actual function, or the program defines an external identifier with the name va_copy or va_end (7.16.1).
Status: Not verified at all.
The type parameter of the va_arg macro
The type parameter to the va_arg macro is not such that a pointer to an object of that type can be obtained simply by postfixing a * (7.16.1.1).
Status: Not verified at all.
The parmN parameter of the va_start macro
The parameter parmN of a va_start macro is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions (7.16.1.4).
Status:
TrustInSoft Analyzer does not properly handle the old-style variadic macros from the obsolete <varargs.h> header. It does not reject programs containing such macros, but upon encountering them the behavior is undefined.
Many C standard functions declared in <math.h> are automatically bound to TrustInSoft Analyzer built-ins (see the Built-ins section).
For some functions, like sin for instance, two built-ins are provided: tis_sin and tis_sin_precise. There is no default choice because choosing one or the other depends very much on how the functions are used in the application.
Typically, the tis_sin_precise built-in is more precise for analyses that do not involve widening. Using it otherwise may lead to less precise results than the naive version tis_sin.
In order to analyze programs that contain floating-point numbers, instead of using the default floating-point abstract values, it is possible to use more precise ones in which all the possible floating-point values can be represented, including infinities and NaNs.
This feature is available by setting the environment variable TIS_ADVANCED_FLOAT=1.
An abstract floating-point value can be represented either as a single floating-point value {0.125} or as distinct intervals for negative and positive values plus flags indicating the presence of infinities, NaNs and zeros. For example, the abstract value {±0.,+∞,NaN} ∪ [-8.2 .. -1.3] ∪ {1.35} indicates the presence of negative and positive zeros, positive infinity, all the NaNs, a negative interval of finite floating-point values and a single positive finite floating-point value.
It is possible to build an abstract floating-point value with the following builtins:
double tis_abstract_double(double __nmin, double __nmax, double __pmin,
double __pmax, int __nzero, int __pzero, int __ninf, int __pinf, int __nan);
float tis_abstract_float(float __nmin, float __nmax, float __pmin,
float __pmax, int __nzero, int __pzero, int __ninf, int __pinf, int __nan);
where __nmin and __nmax are the bounds of the negative interval, and __nmin > __nmax indicates an empty negative interval. The bounds __pmin and __pmax work similarly for a positive interval. The other parameters are flags indicating the presence of zeros, infinities and NaNs.
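For instance, a minimal sketch building the abstract value from the example above (assuming the builtins are declared by including <tis_builtin.h>, as for the other builtins in this documentation):
#include <tis_builtin.h>

int main(void) {
    /* {±0.,+∞,NaN} ∪ [-8.2 .. -1.3] ∪ {1.35} */
    double d = tis_abstract_double(
        -8.2, -1.3, /* negative interval [-8.2 .. -1.3] */
        1.35, 1.35, /* positive interval reduced to {1.35} */
        1, 1,       /* negative and positive zeros present */
        0, 1,       /* no negative infinity; positive infinity present */
        1);         /* NaNs present */
    /* d can now be used as an input with this abstract value */
    (void)d;
    return 0;
}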
In the C++ standard, enumerations can have trap representations: specifically, the C++ standard allows an enumerated type to behave as if its underlying representation type were an integer type with the minimum number of bits needed to represent all the enumeration constructors, unsigned if no constructor is less than 0, and signed if at least one constructor is less than 0. Consider the following code (test.cpp):
#include <cstdio>

enum e { ZERO, ONE, TWO};
int r;
int main(void) {
    enum e e1 = (enum e) 3;
    enum e e2 = (enum e) 4;
    switch(e2){
        case 0: r = 0; break;
        case 1: r = 1; break;
        case 2: r = 2; break;
        case 3: r = 3; break;
        case 4: r = 4; break;
    }
    std::printf("r = %d", r);
}
The underlying representation type of enum e is an unsigned integer type of two bits; that is, according to the C++ standard, variable e1 can only have values between 0 and 3. Thus, the statement at line 7 that casts 4 to a value of type enum e and assigns it to variable e2 is undefined behavior, while the cast from 3 to enum e at line 6 is allowed according to the C++ standard.
When compiling the program with:
$ g++ test.cpp
variable r is set to 4 at the end of the program. However, when the program is compiled with:
$ g++ -fstrict-enums test.cpp
the value of r is 0 at the end of the program. The option -fstrict-enums allows the compiler to optimize under the assumption that a value of an enumerated type can only be one of the values of the enumeration as defined in the C++ standard. This assumption is not satisfied due to the cast at line 7.
In C++11, an enumeration can also be declared with a specific underlying representation type. In this case, the enumeration type behaves as the specified type. For example, when declaring the type enum e as enum e : char { ZERO, ONE, TWO}; in the example above, the program has no undefined behavior.
In order to make sure an enumeration always has an allowed value, TrustInSoft Analyzer provides the option -enum-restriction with three option values: loose, strict and stricter, which respectively correspond to the C standard, the C++ standard, and a stricter assumption that requires an enumeration value to be within the range of the minimum and maximum enumeration constructors.
Consider the following code:
enum E { X = 1, Y = 13} x;

int main(void){
    x = (enum E)2;
    x = (enum E)14;
    x = (enum E)16;
}
The default option value loose corresponds to the enumeration behaviors in C. It means that any value representable in the underlying integer type of an enumeration is allowed, although the choice of the underlying integer type to represent an enumeration is implementation-defined. TrustInSoft Analyzer uses the same algorithm as GCC for choosing the underlying type of an enumeration based on the values of the enumeration members. Thus, the program will be successfully analyzed by the analyzer and no alarm will be raised when -enum-restriction loose is set.
The option value strict corresponds to the C++ standard, that is, any value of the narrowest integer type that can represent all the enumeration constructors is allowed. For example, the enum E contains only the values 1 and 13, which can be represented by an unsigned integer of 4 bits; that is, only values between 0 and 15 are allowed. Therefore, an alarm will be raised at the statement on line 6:
$ tis-analyzer -enum-restriction strict -val enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] The Analysis can be stopped with: kill -USR1 816
enum.c:6:[kernel] warning: Reading a wrong representation ({16}) for type enum E.
stack: main
enum.c:6:[kernel] warning: Unclassified alarm: assert
\warning("representation of value of type enum E is [0..15]");
stack: main
[value] done for function main
The option value stricter only accepts values in the range of the minimum and maximum enumeration constructors. For example, the minimum and maximum constructors of the enum E are 1 and 13 respectively, which means only values between 1 and 13 are allowed. Therefore, an alarm will be raised at the statement on line 5:
$ tis-analyzer -enum-restriction stricter -val enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] The Analysis can be stopped with: kill -USR1 6977
enum.c:5:[kernel] warning: Reading a wrong representation ({14}) for type enum E.
stack: main
enum.c:5:[kernel] warning: Unclassified alarm: assert
\warning("representation of value of type enum E is [1..13]");
stack: main
Some standard library functions, like void *memcpy(void *, const void *, size_t), have a pointer p and a size n among their parameters, and operate on the first n bytes of the memory area pointed to by p. The C standard mandates that p is a pointer inside "the address space of the program" (this is clause 7.1.4:1 in the C17 standard, and the same clause exists in earlier versions of the standard):
The clause 7.22.5:1 re-emphasizes that this applies to comparison functions that take a size, even if the passed size is 0: where an argument size_t nmemb specifies the length of the array for a function, nmemb can have the value zero on a call to that function; the comparison function is not called, a search finds no matching element, and sorting performs no rearrangement. Pointer arguments on such a call shall still have valid values, as described in 7.1.4.
The clause 7.24.1:1 similarly re-emphasizes that 7.1.4 applies to string manipulation functions "unless explicitly stated otherwise".
These restrictions are gratuitous for any reasonable modern architecture, where if the program has been able to build a pointer without invoking undefined behavior, passing it to a standard function that does nothing with it is never a problem. However, some compilers have taken to optimizing C programs on the basis of them respecting these clauses, so that the generated binary behaves unexpectedly if the program being compiled does not respect the rules above.
For instance, consider the program below:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
__attribute__((noinline)) //imagine the function is in another compilation unit
void f(char *p, int n) {
if (n > 10) abort();
memcpy(p,"0123456789",n);
if (p == NULL) {
printf("f was called with NULL\n");
}
else {
printf("f was called with a non-null pointer\n");
printf("The value of p:%p that is:0x%llx\n", p, (unsigned long long)p);
}
}
int main(void) {
f(NULL, 0);
}
GCC 10.2 compiles this C program into a binary that, when executed, displays the following:
f was called with a non-null pointer
The value of p:(nil) that is:0x0
(Compiler Explorer link: https://gcc.godbolt.org/z/jYsKEW)
This is not a GCC bug, and incidentally this behavior has existed for several versions. GCC is "in the right" because the program passes NULL to memcpy.
When this program is analyzed with TrustInSoft Analyzer using the default settings, the analyzer warns that the program passes NULL to memcpy and that this may cause the program to behave differently than expected.
By default, TrustInSoft Analyzer requires the pointer p passed to a standard function to be valid to read or write even when the size n is zero.
Example:
#include <string.h>
int main(){
char c1, c2;
char *p = &c1;
char *q = &c2;
memcpy(p+1, q+1, 0);
}
Analyzing such a program raises alarms about invalid memory write or read:
int main(void)
{
int __retres;
char c1;
char c2;
char *p;
char *q;
p = & c1;
q = & c2;
/*@ assert Value: logic_mem_access: \valid(p+1); */
/*@ assert Value: logic_mem_access: \valid_read(q+1); */
memcpy((void *)(p + 1),(void const *)(q + 1),(size_t)0);
__retres = 0;
__tis_globfini();
return __retres;
}
If you know that your C compiler will not play such tricks on you despite being allowed to by the standard (hopefully your C compiler explicitly documents that it won't play this trick, otherwise it only hasn't gotten round to it yet), the option -pointer-arguments-to-std-functions can be used. The option -pointer-arguments-to-std-functions accepts three values: strict, nonnull, or loose.
The option value strict, which is the default, means that only pointers valid for reading or writing are allowed as arguments of standard library functions.
The option value nonnull means that a null pointer is not allowed, but any other pointer that is valid to compute is allowed when the size n is zero. This includes "one-past" pointers.
Example:
/* run.config
OPT: @TIS_CMD@ -pointer-arguments-to-std-functions nonnull
*/
#include <string.h>
int main(){
char c1, c2;
char *p = &c1;
char *q = &c2;
memcpy(p+1, q+1, 0);
memcpy(NULL, q, 0);
}
Analyzing the code with -pointer-arguments-to-std-functions nonnull only raises alarms about null pointers:
int main(void)
{
int __retres;
char c1;
char c2;
char *p;
char *q;
p = & c1;
q = & c2;
memcpy((void *)(p + 1),(void const *)(q + 1),(size_t)0);
/*@ assert Value: unclassified: \warning("(void *)0 != (void*) 0"); */
memcpy((void *)0,(void const *)q,(size_t)0);
__retres = 0;
__tis_globfini();
return __retres;
}
The option value loose means that the analyzer will not check the validity of pointer arguments when the size n is zero; thus, no alarms will be raised about invalid pointer parameters when the size n is zero.
When the TIS_DETERMINISTIC_LIBC macro is set (use -cpp-extra-args="-DTIS_DETERMINISTIC_LIBC"), TrustInSoft Analyzer provides deterministic implementations of some libc and POSIX functions. This can prove useful when analyzing the behavior of a program on concrete values, for instance in interpreter mode, to keep the analyses deterministic.
POSIX.1-2001 defines four functions to convert between packed network bytes (big endian) and proper integer values on the host:
- htonl(): host to network, 32 bits (long)
- htons(): host to network, 16 bits (short)
- ntohl(): network to host, 32 bits (long)
- ntohs(): network to host, 16 bits (short)
TrustInSoft Analyzer assumes that the host uses a common representation of integers (byte-wise little or big endian), which means that htonl() is equivalent to ntohl() and that htons() is equivalent to ntohs(). This is generalized with the POSIX.1-202x header endian.h, which supports converting between host integer values and either little ("le") or big endian ("be").
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
#include <endian.h>
uint16_t htobe16(uint16_t host_16bits);
uint16_t htole16(uint16_t host_16bits);
uint16_t be16toh(uint16_t big_endian_16bits);
uint16_t le16toh(uint16_t little_endian_16bits);
uint32_t htobe32(uint32_t host_32bits);
uint32_t htole32(uint32_t host_32bits);
uint32_t be32toh(uint32_t big_endian_32bits);
uint32_t le32toh(uint32_t little_endian_32bits);
uint64_t htobe64(uint64_t host_64bits);
uint64_t htole64(uint64_t host_64bits);
uint64_t be64toh(uint64_t big_endian_64bits);
uint64_t le64toh(uint64_t little_endian_64bits);
The user may also provide their own implementation for any of these functions, which will override the one from TrustInSoft Analyzer.
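As an illustration (a sketch, not taken from the documentation), this equivalence can be checked on a round trip:
#include <arpa/inet.h>
#include <stdint.h>
#include <assert.h>

int main(void) {
    uint32_t host = 0x11223344u;
    uint32_t net = htonl(host);         /* big-endian wire representation */
    assert(ntohl(net) == host);         /* the round trip always holds */
    assert(htonl(host) == ntohl(host)); /* the equivalence assumed by the analyzer */
    return 0;
}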
Note
htonl(), htons(), ntohl() and ntohs() are equivalent to htobe32(), htobe16(), be32toh() and be16toh() respectively.
The C89 standard defines rand(), which returns a pseudo-random number, and srand(), which sets the random seed to a user-provided value.
POSIX.1-2001 says that the period of rand() should be a value of at least 2^32, but we assume a weaker generator for stronger guarantees (e.g. a higher chance of collision). TrustInSoft's implementation of rand() and srand() is the one given in C89 with 32768 for INT_MAX.
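For reference, the portable sample generator given in the C89 standard looks like the following sketch (renamed here; TrustInSoft's actual implementation may differ in details):
static unsigned long int next = 1;

/* the C89 sample generator, with RAND_MAX == 32767 */
int c89_rand(void) {
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

void c89_srand(unsigned int seed) {
    next = seed;
}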
The POSIX standard adds the reentrant function rand_r(), where a pointer to the seed is passed as an argument and modified.
The POSIX functions random(), srandom(), initstate() and setstate() give much stronger guarantees and refer to a specific generator. The standard talks of a "non-linear additive feedback random-number generator", which actually refers to a lagged Fibonacci generator.
The user may also provide their own implementation for any of these functions, which will override the one from TrustInSoft Analyzer.
#include <stdlib.h>
int rand(void);
void srand(unsigned int seed);
int rand_r(unsigned *seedp);
char *initstate(unsigned int seed, char *state, size_t n);
long random(void);
char *setstate(char *state);
void srandom(unsigned int seed);
The C functions time() (defined in C89) and gettimeofday() (defined in POSIX.1-2001) provide information about the current time. When using these functions in the analyzer with the TIS_DETERMINISTIC_LIBC macro defined, time starts at TIS_INIT_TIME_SEC seconds [1] and TIS_INIT_TIME_USEC microseconds after the Unix epoch [2]. Every time either time() or gettimeofday() is called, the clock progresses forward by TIS_INCR_TIME_SEC seconds and TIS_INCR_TIME_USEC microseconds. The values of these macros can be overridden with the -cpp-extra-args option:
- TIS_INIT_TIME_SEC can be any valid value for long int;
- TIS_INIT_TIME_USEC can be any integer between 0 and 999999 (included);
- TIS_INCR_TIME_SEC should be a non-negative valid value for long int;
- TIS_INCR_TIME_USEC should be a non-negative valid value for long int (can be larger than 999999).
Warning
Calling time() or gettimeofday() twice in quick succession, or calling one right after the other, will still show a difference of TIS_INCR_TIME_SEC seconds and TIS_INCR_TIME_USEC microseconds.
The user may also provide their own implementation for either of these functions, which will override the one from TrustInSoft Analyzer.
#define TIS_INIT_TIME_SEC 1596453215L // 2020-08-03T13:13:35+02:00
#define TIS_INIT_TIME_USEC 306099L
#define TIS_INCR_TIME_SEC 1L
#define TIS_INCR_TIME_USEC 0L
#include <time.h>
time_t time(time_t* timer);
#include <sys/time.h>
struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};
struct timezone {
int tz_minuteswest; /* minutes west of Greenwich */
int tz_dsttime; /* type of DST correction */
};
int gettimeofday(struct timeval *tv, struct timezone *tz);
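For instance, a minimal sketch (assuming TIS_DETERMINISTIC_LIBC is defined and the default TIS_INCR_TIME_SEC of 1 second is used):
#include <time.h>
#include <assert.h>

int main(void) {
    time_t t1 = time(NULL);
    time_t t2 = time(NULL);
    /* each call advances the deterministic clock by TIS_INCR_TIME_SEC */
    assert(t2 - t1 == 1);
    return 0;
}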
Note
gettimeofday() is marked as obsolete by POSIX.1-2008. The behavior of gettimeofday() is not specified when passing a non-NULL value for the tz argument (TrustInSoft Analyzer zeroes the pointed structure).
[1] ignoring leap seconds
[2] 1 January 1970 at midnight UTC (1970-01-01T00:00:00Z)
This section gives some ideas about how to verify properties that come from the design stage of the application.
The results of the dependency analysis are useful to confirm that the C code, as written, modifies the variables expected by the design documents, and computes the new values of these variables from the inputs expected by the design documents.
The dependency analysis runs on top of an already-configured value analysis. The dependencies shown are valid for the inputs of the target function described by the analysis driver.
The dependency analysis is activated by adding the options -graphdeps -deps to a command line that already launches a value analysis.
Consider the function f, for which an analysis driver has already been written to perform a value analysis:
#include <tis_builtin.h>
int i, o;
void f(void) {
o = 2 * i;
}
// Analysis driver:
int main(void) {
// set up the inputs:
i = tis_interval(-100, 100);
// call the target function
f();
}
The command line to obtain the dependencies in the log is:
$ tis-analyzer -val -slevel 10 -graphdeps -deps direct.c
Inside the function f, the value of the variable i is used in arithmetic operations to compute the value of the output variable o. This is shown as:
...
[from] Function f:
o FROM direct: i
...
One way for a C function to not behave in a deterministic manner is to invoke Undefined Behavior. In this case, the code generated by the C compiler may behave erratically, and there are no guarantees about what can happen when the function is called. The value analysis is used to guarantee that Undefined Behavior does not happen for any of the inputs delimited by the analysis driver.
Beyond this basic guarantee, when the dependency analysis says that the above function f computes o FROM direct: i, it means that every time the function f is called, it places in the variable o a value that is computed from nothing else than the value of the variable i at the time of the call. If the function is called twice with the same value in i, then the same value will be put in o. Every other variable of the program could have a different value and the function f would still compute the same value in o.
If the developer had mistakenly written f in such a way that it had an internal state, and behaved correctly the first time it was called but not the second time, this internal state would appear in the dependencies computed by the dependency analysis. (Here "mistakenly" means "contrary to the specification documents written during the design phase". There is nothing wrong with some functions having an internal state, as long as they were designed so.)
Similarly, if the developer had mistakenly made f's behavior depend on some configuration option, this would appear in the computed dependencies.
While trying out the example above, you may have noticed that TrustInSoft Analyzer says that the function main sets the variables i and o, but that the values to which these variables are set depend on a mysterious tis_entropy_source variable:
...
[from] Function main:
tis_entropy_source FROM direct: tis_entropy_source (and SELF)
i FROM direct: tis_entropy_source
o FROM direct: tis_entropy_source
\result FROM \nothing
...
Generally speaking, it is not interesting to look at the results of the dependency analysis for analysis drivers, such as main here. But to explain what is shown here: the function main sets i to an arbitrary value between -100 and 100, by calling tis_interval(-100, 100), so that the analysis results will be valid for this range of inputs. It would be an unwanted assumption to assume that the function tis_interval always produces the same value when applied to the same input bounds. Instead, the function tis_interval is modeled as having an internal state, tis_entropy_source, that it updates at every call and from which its result is computed.
In the function main, the variable i receives the result of a call to tis_interval, so for the dependency analysis, it sets i to a value that depends on tis_entropy_source. The function main also calls f, which modifies o. Looking at the function f, the new value of o is computed deterministically from the value of i, but looking at main, the value of o is set to a value that depends on tis_entropy_source, since main calls f with a value in i that has been set to something depending on tis_entropy_source.
#include <tis_builtin.h>
int i, o;
void f(int c, int *p) {
if (c)
o = 2 * (*p);
else
o = 0;
}
// Analysis driver:
int main(void) {
// set up the inputs:
i = tis_interval(-100, 100);
int c = tis_interval(0, 1);
// call the target function
f(c, &i);
}
$ tis-analyzer -val -slevel 10 -graphdeps -deps indirect.c
...
[from] Function f:
o FROM indirect: c; p; direct: i
...
In this example, o still depends on i, but it does so through p (which points to i) and only if c is different from 0. In this case, p and c are said to be indirect dependencies because their values are involved in the value assigned to o, but the value of o is not computed directly from them.
Often, whether a dependency towards an input is direct or indirect is a matter of programming style. It is possible to tell TrustInSoft Analyzer to display the results of the dependency analysis without distinguishing direct and indirect dependencies with the option -no-show-indirect-deps. On the above example, using this option gives:
$ tis-analyzer -val -slevel 10 -graphdeps -deps indirect.c -no-show-indirect-deps
...
[from] Function f:
o FROM i; c; p
...
#include <tis_builtin.h>
int x, y, z;
void f(int c) {
if (c) x = y + z;
}
int main(void) {
y = tis_interval(0, 100);
z = tis_interval(-50, 50);
int c = tis_interval(0, 1);
f(c);
}
$ tis-analyzer -val -slevel 10 -graphdeps -deps self.c
...
[from] Function f:
x FROM indirect: c; direct: y; z (and SELF)
...
In this example, x has direct and indirect dependencies as before, but it is also said to have a (and SELF) dependency. This means that x may not be updated when executing f. In this example, this happens when the input value of c is 0.
#include <tis_builtin.h>
void do_peak_things (void);
void do_offpeak_things (void);
enum period { PEAK, OFFPEAK, NBPERIOD } current_period;
unsigned total;
unsigned total_during_period[NBPERIOD];
unsigned previous_t;
void update_totals(unsigned t, unsigned instant_pow) {
unsigned duration = t - previous_t;
unsigned energy = instant_pow * duration;
total = total + energy;
switch (current_period) {
case PEAK:
do_peak_things();
break;
case OFFPEAK:
do_offpeak_things();
break;
}
total_during_period[current_period] = total_during_period[current_period] + energy;
previous_t = t;
}
// Analysis driver
int main(void) {
// set up the inputs:
tis_make_unknown(&previous_t, sizeof previous_t);
current_period = tis_interval(0, NBPERIOD - 1);
unsigned t;
tis_make_unknown(&t, sizeof t);
unsigned instant_pow;
tis_make_unknown(&instant_pow, sizeof instant_pow);
// call the target function
update_totals(t, instant_pow);
}
The command line to obtain the dependencies in the log is still:
$ tis-analyzer -val -slevel 10 -graphdeps -deps update_totals.c
Notice that, since the dependency analysis is based on the results of the value analysis, options that make the value analysis more precise, such as -slevel, can also result in more precise dependency results.
The relevant results are:
...
[from] Function update_totals:
total FROM direct: total; previous_t; t; instant_pow
total_during_period[0]
FROM indirect: current_period; direct: total_during_period[0];
previous_t; t; instant_pow (and SELF)
[1]
FROM indirect: current_period; direct: total_during_period[1];
previous_t; t; instant_pow (and SELF)
previous_t FROM direct: t
...
When implementing an application, one may wish to check that some functional properties from the specification are indeed satisfied by the code.
The ACSL Properties chapter already explained that ACSL properties may be added to a project, and Prove Annotations with WP shows how the WP plug-in may be used to prove them.
Formal verification does not have to be complete to be interesting. The Checking the dependencies chapter already showed how input/output specifications can be checked, even without using WP. But functional properties may also be checked even if they represent partial specifications.
For instance, the partial specification of a function may just be:
//@ ensures f_ex: x >= 0 ==> \result >= 0;
int f (int x);
It does not say what the result is, only that it is positive when the input x is positive, and it says nothing about the case where the input is negative. Proving it may require adding some properties to the functions that are called by f, but only those that are strictly needed to prove the property.
Partial specifications may be used to check a specific goal. For instance, in our PolarSSL verification kit, we only wrote the specifications of the subcomponents that were necessary in order to prove that there were no Undefined Behaviors in the system as a whole.
Proving total specifications requires more work. The properties must then describe the complete behavior of each function.
The ACSL by Example case study is a nice example of a collection of C functions and data types whose behavior has been formally specified with ACSL and formally verified with WP.
The aim of this manual is to explain how to use the TrustInSoft Analyzer GUI.
Using the GUI is very similar to running the analyzer in batch mode as explained in TrustInSoft Analyzer Manual.
To launch the GUI, TrustInSoft Analyzer must be called as usual, adding the -gui option. For the TrustInSoft Kernel, the right option is -server:
$ tis-analyzer -gui <options..> <source and preprocessed files..>
or:
$ tis-kernel <options..> <source and preprocessed files..> -server
When the message "Ready <date> port 8080" appears, the GUI can be used by opening a browser at the address localhost:8080. The server port can be chosen with the option -server-port; by default the port is 8080. See -server-help for more information about options for this GUI.
Only one session of a GUI is allowed on a port. However, multiple TrustInSoft Analyzer instances can be launched with different ports and opened in different browser pages at the same time.
The supported browsers are:
By default, the TrustInSoft Analyzer GUI can only be accessed through a web browser running on the host running the server. There are two options to allow access from other hosts:
- If the analyzer is launched with the option -server-expose, then the GUI will be available at the address http://<ip_address>:<port>, where <ip_address> is an IP address of the computer where TrustInSoft Analyzer is running and <port> is the port used for the wanted TrustInSoft Analyzer instance (the one given to the option -server-port).
- A reverse-proxy can be installed in order to open access to the TrustInSoft Analyzer GUI through the network.
Here is an example of the installation and configuration of a reverse-proxy:
sudo apt-get install -y nginx
{ cat <<'EOF'
server {
listen 80;
location ~ /(\d+)$ {
return 301 http://$host$request_uri/;
}
location ~ /(?<tisport>\d+)/ {
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_read_timeout 600s;
rewrite /\d+/(.*) /$1 break;
proxy_pass http://127.0.0.1:$tisport;
}
}
EOF
} | sudo sh -c 'cat > /etc/nginx/sites-available/tis-analyzer.nginx'
Then the reverse-proxy can be enabled with:
sudo ln -s /etc/nginx/sites-available/tis-analyzer.nginx /etc/nginx/sites-enabled/
sudo nginx -s reload
After this configuration, the TrustInSoft Analyzer GUI will be available at the address http://<ip_address>/<port>, where <ip_address> is an IP address of the computer where TrustInSoft Analyzer is running and <port> is the port used for the wanted TrustInSoft Analyzer instance (the one given to the option -server-port).
The reverse-proxy can be disabled with the following commands:
sudo rm -f /etc/nginx/sites-enabled/tis-analyzer.nginx
sudo nginx -s reload
Note
With an advanced reverse-proxy configuration, the HTTPS protocol and authentication can be added to strengthen the security of access to the TrustInSoft Analyzer GUI.
All animations of the GUI can be disabled by adding the parameter graphics=low to the URL used to access the GUI, such as:
http://localhost:12345?graphics=low
Disabling animations can lower the CPU load with some browsers on low-capacity computers. This parameter can also be useful if the GUI is accessed through an SSH tunnel with X11 forwarding enabled.
After starting the server as explained before, the GUI can be opened in a browser.
The GUI is composed of several zones, as shown by the figure above:
The GUI has a lot of widgets and information to display. Modes provide pre-selected sets of widgets to focus on a particular task.
The different modes are:
- Pre-processing, focused on the parsing phase of the analysis.
- Value Analysis, focused on the Value Analysis settings and statistics.
- Exploration, focused on the exploration of the analyzed program with the analysis results, if any.
Note that any widget can be displayed in any mode with the Workspace menu.
The central Interactive Code Panel shows the normalized code of the program. The normalization is done by the analyzer and preserves the semantics of the program as well as the undefined behaviors existing in the source code.
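As a sketch of what this normalization can look like (the temporary variable name is illustrative), a statement with a nested call may be split into several statements with the same semantics:
/* original source statement */
int r = f (g (x) + 1);

/* a possible normalized form: the nested call is lifted into its
   own statement, preserving the semantics of the original code */
int tmp = g (x);
int r = f (tmp + 1);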
The normalized code is segmented into Globals. Each tab in the Interactive Code Panel is a global of the program.
The normalized code in each tab is also interactive and reacts to the user clicks:
Moreover, the margin at the left of the interactive code is used to put Status bullets.
Six kinds of elements can be clicked in the interactive code:
- expressions (x, y << 1 + 6, counter == 0, z++, …)
- statements (x = y << 1 + 6;, if (counter == 0) { ... }, z++;, …)
- declarations and definitions (int x;, void foo(int y) { ... }, …)
- function names (foo in the expression foo(5), …)
- type names (mystruct in any expression or declaration, …)
- ACSL properties (assert x == 0;, ensures \initialized(\result);, …)
Left-clicking on an element in this panel selects it and highlights it in blue.
If the selected element is an expression, then it is referred to as the current expression (also called the current term in some widgets).
At the same time, the selected element's wrapping statement is also highlighted with a rectangular blue mark at the left of the interactive code, near the left margin. This statement is referred to as the current program point.
Note that some elements have no wrapping statement (such as declarations or some types). In this case: … ensures clause)
The concepts of current expression and current program point are often used in widgets on the left side of the GUI (such as Function Context).
Caution
The current expression and/or the current program point can be automatically changed by performing actions in widgets.
The margin at the left of the interactive code holds bullets that represent the status of the ACSL properties.
The semantics of these bullets is detailed in the Properties section.
The colored vertical line on the left-hand side indicates statements where precision was lost or came close to being lost during the value analysis.
When the vertical line is yellow, no precision was lost but more than X% of the available slevel was consumed, where X is a percentage set in the Options (50% by default). In this case, the tooltip text shows Slevel counter: <used>/<slevel>, where <used> is the count of separated states used and <slevel> is the slevel.
When the vertical line is red, precision was lost because some separated states were joined. In this case, the tooltip text shows <reasons>: <used>(+<widened>)/<slevel>, where <reasons> are the reasons why states were joined, and <widened> is the number of widening states used.
See About Statements for more information about the reasons why some states may be joined, or about the other elements in the tooltip.
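As a sketch of why slevel gets consumed (the function below is illustrative), each two-branch conditional inside a loop can double the number of separated states the analyzer keeps, so code like this can require up to 2^8 states before states are joined and precision is lost:
int bit_balance (int n) {
    int odd = 0;
    for (int i = 0; i < 8; i++) {
        if ((n >> i) & 1)   /* two branches per iteration: states may double */
            odd++;
        else
            odd--;
    }
    return odd;
}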
The panel on the right side of the GUI shows the original source files.
Clicking on a line in this panel will change the current expression and current program point in the Interactive Code Panel to match with the associated line of the source code.
The associated line in the interactive code may not be shown for the following constructs:
When the cursor is in a source file, some shortcuts are available:
Tip
If a selected line in the source code has several different locations in the interactive code (for instance with C++ templates), a window will appear listing these locations; clicking one of them jumps to that location.
Editing files is allowed only in Edition mode, which can be enabled with the button in the title of the Source Code Panel.
Once the mode is enabled, files can be edited, restored to their original version (button Restore File) or saved (button Save File). When a file is saved, the old content is overwritten and the operation cannot be undone.
We highly recommend using an external Source Code Management tool in order to be able to revert these changes.
Caution
The Analyzer must be restarted to take the changes made to these files into account.
Two warning messages can appear above a source file:
This central panel displays the files located in the analyzer's working directory and sub-directories, together with their status. It also allows selecting (or deselecting) files to parse with the analyzer.
Tip
Warning
Actions performed in this panel (such as removing a file) will act on the file system where the analyzer runs and cannot be cancelled.
Warning
It is not recommended to run the analyzer from /, your $HOME, or any valuable directory. Keep your files safe with a version control system.
A widget is a panel which brings new features to the GUI.
By right-clicking on the widget title, each widget can be moved to the bottom or left side of the GUI, or placed in a floating window, called "detach" mode.
The widget list can be found in the Workspace entry of the Menu and allows showing or hiding widgets. Note that some widgets are displayed automatically.
Here is a list of available widgets in the GUI (experimental widgets may not appear in this list):
This widget allows changing more Value Analysis settings than the Analysis Settings widget.
Each setting has its own sub-panel, accessible by selecting its tab on the left of the widget.
This panel allows adding, changing or removing a slevel limit for a given function. It has the same behavior as the -slevel-function option.
This panel allows enabling or disabling the default merge strategy for all functions, and enabling or disabling the merge strategy for a subset of functions. It has the same behavior as the -val-slevel-merge-after-loop option.
This panel allows changing the global split strategy (applied by default to all functions) and adding, changing or removing a split strategy for a given function. It has the same behavior as the -val-split-return and -val-split-return-function options. A sketch of the kind of code that benefits from splitting on return values is shown below.
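As a sketch of why splitting on return values can help (all names below are illustrative), keeping the states where ret == 0 separate from the others lets the analyzer relate the return code with the initialization of *p:
static int slot;

int get_status (int cond, int **p) {
    if (cond)
        return -1;      /* failure path: *p is left unset */
    *p = &slot;         /* success path: *p is set */
    return 0;
}

void use (int cond) {
    int *p;
    int ret = get_status (cond, &p);
    if (ret == 0)
        *p = 1;         /* provable only if states are kept split on ret */
}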
This widget gathers more statistics about the Value Analysis than the Analysis Statistics widget.
It displays:
Note
These statistics are updated during the computation of the Value Analysis.
This widget gathers results about the performed analysis, allowing the user to make a precise review of the analysis.
Most of the reported problems, warnings or information shown in the Overview can be found in this widget with more details.
Note
This widget is not shown if there are no problems, warnings, or information to be displayed.
This widget uses a sub-panel for each category of reported problems, warnings or information.
Each sub-panel has its own help on its associated category. Some of the sub-panels also have hints and useful links to the documentation that can help the user to do the review of the associated category.
Also, the user can add a comment for each item in these sub-panels (with the comment icon) and mark an item as reviewed (with the review icon), allowing them to track their progress in the review. Note: only warnings and information items can be marked as reviewed; problems, such as the Potential undefined behaviors, cannot be ignored and should be resolved.
All the detailed information of the review, including user comments and review progress marks, can be exported and downloaded with the Export Analysis Results widget.
This sub-panel displays more details about each function definition conflict. This information contains the function name, the location of the function used by the analyzer (kept function), and the location of the function dropped by the analyzer (discarded function).
The Compare button opens a window that displays the two function definitions bodies (kept and discarded) side by side.
This widget allows changing the most common settings for tuning the Value Analysis. The settings are:
Entry point: the entry point to use for the analysis. See also the -main option.
Global slevel: the global value of the -slevel, used for all functions/statements if no other slevel directive is given for the function/statement.
Enable interpreter mode: when checked, the analyzer settings are updated to run the next Value Analysis in interpreter mode. When this option is checked, other settings may be overwritten. See also the interpreter profile of the -val-profile option.
Malloc plevel: the value of the -val-malloc-plevel option.
Stop analysis after finding X alarms: force the Value Analysis to abort after finding X alarms.
Note: the number given to this setting is not an exact count of alarms, but rather a heuristic to abort the analysis.
Stop analysis at first garbled mix of addresses: force the Value Analysis to abort after finding a garbled mix of addresses.
Save results by slevel: keep more precise results of the Value Analysis. It allows a more precise evaluation of values in widgets such as Values. However, this setting also increases the memory usage.
This widget gathers simple statistics with charts about the last computation of the Value Analysis, such as:
Note
These statistics are updated during the computation of the Value Analysis.
Warning
This widget belongs to the Experimental features. The documentation about this widget may be incomplete or out-of-date.
The feature described below may still contain bugs. We do not recommend using the Experimental mode for operational usage.
This widget offers an alternative view of the callstacks of the current function as it can be found in the Callstack or Callgraph widgets.
This widget displays the callstacks of the current function as a tree. When the current function changes, the previous callstacks are still displayed among the callstacks of the new current function and the callstacks of the new current function are highlighted.
Also, all call sites of a same function are displayed in the same order as the one displayed in the Interactive Code Panel.
By default, only the callstacks of the visited function are displayed. However, if among the displayed call sites a function has more call sites than the ones displayed in the widget, then a ... placeholder is displayed, which shows the hidden call sites when clicked.
This panel shows up when selecting the Callgraph tab.
A main purpose of the Callgraph widget is to provide an easy way to navigate through a big callgraph. To achieve this goal, only a few nodes are displayed in the first place and discovery of the graph is done interactively. A contextual menu is accessible by double-clicking on a node. This menu gives access to navigation features: folding and unfolding the children/parents of the selected node. A Show source code button is also available in the menu for opening the source code of the function in the usual source code panel.
Starting to use the Callgraph widget is as simple as choosing a root (function name) and pressing the Compute & Show button. You can then start to navigate through the graph using the contextual menu. The default callgraph mode is Value Analysis. In addition, several other modes exist and are detailed below.
The Properties widget gives information about the selected node, and each property can be used as a selector to color the graph. Clicking on the arrow icon on the right side of a property colors the currently displayed nodes, which makes it easy to find all nodes sharing the same value for this property. For instance, in the screenshot shown above, the selection has been made on the reached by value property.
Below node properties, you will find some useful actions:
Find a node:
Fill the find a node field with the function name you are looking for and press Enter. This is particularly useful when the displayed graph contains more nodes than you can deal with manually.
Reset zoom:
Clicking this button resizes the graph to make it fit in the display area.
Freeze nodes:
Usually the nodes are moving until the whole graph comes to a stable configuration. In some cases, it may be useful to freeze it beforehand. To do this, you can toggle the Freeze nodes button.
Hierarchical layout
Toggling this button allows you to switch between the physics-law layout and the hierarchical one. If the Vert. checkbox is checked when pressing the Hierarchical layout button, the graph is displayed in a top-to-bottom direction. Otherwise, a left-to-right direction is used.
Value analysis (default mode)
Choose a root node by typing a function name in the root1, root2, ..., rootn field. Then press the Compute & Show button to display the root node(s) of the graph.
All paths from A to B
This mode provides a view of all possible paths from a given function to another.
Choose the start and destination nodes of the paths by filling the corresponding fields.
The field root1, root2, ..., rootn must contain at least one ancestor of the start and destination nodes.
Then press the Compute & Show button to produce the graph together with a list of paths.
Select a path in the list to highlight it in the graph.
All paths from A to B in interactive mode
In cases where the list of paths is huge, you can use the Interactive mode by selecting the so-named checkbox before pressing the Compute & Show button.
You will end up with only one path displayed on a comb-shaped graph.
Each possible branching to another path is represented by an edge to a null-outdegree node.
Clicking on such edges will give you access to an easy navigation through all possible paths.
Source files
In this mode, functions defined in the same file are grouped in a single node whose name is the file name.
The checkbox Only directories allows you to group by directories instead of files. The checkbox Download indicates that you want to download the list of files (resp. directories) as a text file.
This widget shows information about the available callstacks of the current function shown in the Interactive Code Panel.
This widget also introduces the concept of focused callstack: by clicking on a callstack in the grid, the selected callstack will become the focused callstack.
All widgets that support the View per Callstack feature (including the Interactive Code Panel) are called callstack-wise widgets. These callstack-wise widgets will filter their data according to the focused callstack. See the View per Callstack section for more details.
A focused callstack can be unfocused with the button Unfocus callstack.
Moreover, since the list of callstacks changes depending on the current function, a callstack can be pinned by clicking on the grey star icon in order to keep it in the list.
The widget also shows two more pieces of information:
Tip
If the field Data filtered is Yes, it is highly recommended to be careful about the interpretation of the results.
This widget allows exporting analysis results of the Analysis review widget (including the user comments and manual review statuses) in two different ways:
- … -tis-review: all user comments and manual review statuses will be imported into the new analysis if their attached object (statement, function, properties, …) still exists in the new analysis. All imported data that cannot be attached to an object in the new analysis are displayed in the Review Invalid data widget.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers information about the source files of the analyzed program.
The information shown in the columns is provided by the -info-files option of the Tis-info plug-in, so you can refer to the About Files section of its manual to get more information about them.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget is quite similar to the Occurrence action in the Right-click menu, except that only global variables are accepted and a program point is not required.
The grid gathers all the occurrences in reachable statements of the global variable given in the toolbar's Find global variable field.
This widget shows information about the current function seen in the Interactive Code Panel.
The information above the grid is the data of the Functions widget for the current function.
This widget also allows updating some of the Value Analysis options for the currently selected function without impacting the other functions.
The grid contains the list of callees and callers of the current function.
For the status column:
Note
The status can change according to the focused callstack (see View per Callstack)
The results show all the different callsites when a function calls another one several times. The different callsites can be hidden with the filter button Hide callsites, which will show only one line per function.
The list of callers/callees of the function context can be pinned by clicking on the pin icon; a copy of this list will be kept even if the current function is changed. A pinned function context can be unpinned by clicking on the same icon.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers information about all functions in the analyzed program, including the standard library and the analyzer built-ins.
The information shown in the first columns is provided by the -info-functions option of the Tis-info plug-in, so you can refer to the About Functions section of its manual to get more information about them.
The last columns show statistics about the analysis of these functions.
When sorting by Self time, you can find out in which functions the analysis spent time. There can be at least two reasons: … memexec barely hits, … If the reason is the second one, see the Statements widget.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers all program’s Globals.
A global is either a global variable, a type declaration or definition (typedef, struct, union, enum), a function declaration or definition, a global annotation, or a piece of assembly code.
This widget allows running a grep command on files of the analyzer's File System Panel.
The widget allows the usage of 4 different grep modes:
- the Basic Regexp mode
- the Extended Regexp mode
- the Perl Regexp mode
- the Fixed strings mode
The search is case-sensitive by default. The box Case insensitive can be checked to change this behavior.
The grep command searches in source files only by default. This behavior can be changed by unchecking the box Only in source files.
This widget eases the use of J³.
This widget allows both uploading and downloading source files to (resp. from) the file system displayed in the File System Panel.
The button Upload sources asks the user to select files from their computer to place directly at the root of the analyzer file system.
Archives, such as .tar, .tar.gz and .zip files, will be automatically extracted at the root of the analyzer file system.
Also, all uploaded files can be read and modified in the analyzer, but they will lose their execute permission flag if it was present on the original file.
The button Download sources allows downloading all the content of the analyzer file system, including all hidden files.
This widget displays the detailed error message of the last attempt of linking all parsed files.
Note
This widget is not shown if there is no error to display.
This widget gathers the current analysis state and results, and allows quick access to the two main actions: parsing files and running the Value Analysis.
When the Parsing (resp. the Value Analysis) button is highlighted (underlined with an orange color), it means the parsing (resp. the Value Analysis) is out-of-date: a file or settings have been changed since the last run. The parsing (resp. the Value Analysis) should be performed again, otherwise data displayed in some widgets (such as the Interactive Code Panel or Values) may not be correctly synchronized with the current settings of the analyzer.
The dedicated button allows setting optional parameters while parsing or running the Value Analysis.
This widget also allows quickly setting (or unsetting) a configuration file to load before running the parsing operation.
Then, the global status of the analysis and its detailed results (when available) are displayed.
The possible analysis statuses are: … files field.
The detailed results can contain the following items:
the warnings in the configuration file: this item notifies about warnings encountered when the current configuration file has been loaded. By clicking on this item, the list of warnings will be displayed in a new window. It is recommended to address these warnings first because they can lead to an unexpected analysis setup.
This item is only shown when the loaded configuration file has warnings.
the errors in the configuration file: this item notifies about errors encountered when the current configuration file has been loaded. By clicking on this item, the list of errors will be displayed in a new window. Files cannot be parsed until all errors of the configuration to load are resolved.
This item is only shown when the selected configuration file cannot be loaded.
the parsing results:
the link results: this item displays whether the files have been successfully linked. In the failure case, the item can be clicked to show the Linking Result widget to display the detailed error message.
This item is only shown when the link fails or when the global status is Compilation succeeded: if the Value Analysis has been run, it implies the link has been successful.
the entry point information: this item is displayed if:
the potential undefined behaviors found: this item displays the number of potential undefined behaviors found during the Value Analysis. If this number is not zero, clicking on this item allows to jump in the Interactive Code Panel to the first potential undefined behavior found.
This item is shown only if the Value Analysis is successfully computed.
the reason for the premature stop of the analysis: this item displays why the Value Analysis was stopped prematurely. For instance, the Value Analysis can be stopped prematurely with the option -val-stop-at-nth-alarm, …
This item is shown only if the Value Analysis is successfully computed.
the list of unknown called functions: this item displays the number of unknown called functions encountered during the Value Analysis. Clicking on this item allows to jump in the Interactive Code Panel to one of these unknown called functions. Calling an unknown called function can lead to severe imprecision in the analysis.
This item is shown only if the Value Analysis is successfully computed and if the number is not zero.
the list of non-terminating statements: this item displays the list of non-terminating statements encountered during the Value Analysis. Clicking on this item allows to display the associated panel in the Analysis review widget with more detailed results.
This item is shown only if the Value Analysis is successfully computed, without any potential undefined behavior found, and if the number is not zero.
the list of unproven properties: this item displays the list of unproven properties after computing the Value Analysis. Clicking on this item allows to display the associated panel in the Analysis review widget with more detailed results.
This item is shown only if the Value Analysis is successfully computed, without any potential undefined behavior found.
the list of undefined functions with a specification: this item displays the list of undefined functions whose specification is used by the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.
The specification of such functions should be carefully written.
This item is shown only if the Value Analysis is successfully computed and if the number is not zero.
the list of defined functions whose specification is used: this item displays the list of defined functions whose specification is used by the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.
The body of the function should be used instead of the specification, unless the specification is used for performance reasons. In that case, the specification of such functions should be carefully written.
This item is shown only if the Value Analysis is successfully computed and if the number is not zero.
the list of used stdlib functions: this item displays the list of used stdlib functions encountered during the Value Analysis. Clicking on this item allows to display the associated panel in the Analysis review widget with more detailed results.
This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed and if the number is not zero.
the list of unused parsed files: this item displays the list of parsed files that have not been used by the Value Analysis. Clicking on this item allows to display the associated panel in the Analysis review widget with more detailed results.
Most of the time, these files can be removed from the list of files to parse without modifying the behavior or results of the Value Analysis.
This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed and if the number is not zero.
the coverage: this item displays the function and statement coverage of the Value Analysis.
This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed.
the target architecture used: this item displays the Target Architecture used for this analysis. This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed.
Function definition conflicts warning: this warning item displays the total number of function definition conflicts that exist in the source code. More details about this item are shown by clicking on this warning message or opening the Analysis review.
Function definition conflict warnings occur when the program has multiple function definitions with the same name and different bodies, as in the sketch below.
Function definition duplications warning: this warning item displays the total number of function definition duplications that exist in the source code. More details about this item are shown by clicking on this warning message or opening the Analysis review.
Function definition duplication warnings occur when there are multiple function definitions with the same name and identical bodies.
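A minimal sketch of a conflict (file and function names are illustrative): giving both files below to the analyzer triggers a function definition conflict, since helper has the same name but different bodies; one definition is kept and the other is discarded.
/* in a.c */
int helper (int x) { return x + 1; }

/* in b.c */
int helper (int x) { return x - 1; }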
This widget displays the detailed parsing error message of the last attempt of the parsing operation.
It also displays the command line used to parse the file.
Note
This widget is not shown if there is no error to display.
This widget allows adding, changing or removing compilation options used when parsing every selected file.
Only the following relevant compilation options are kept: -I, -D and -U. All other options will be discarded.
This widget is similar to the compilation_cmd field of the Configuration files.
Note
Currently, it is not possible to give a set of compilation options for a single file.
Warning
This widget belongs to the Experimental features. The documentation about this widget may be incomplete or out-of-date.
The feature described below may still contain bugs. We do not recommend using the Experimental mode for operational usage.
This widget is related to Whole Program Graph computed during the Value Analysis to keep the finest possible information about all possible execution paths in the program.
This feature can be activated only in Experimental mode, with the options -whole-program-graph and -graphdeps.
The Show Defs action in the Interactive code panel’s Right-click menu will use this Whole Program Graph feature when possible (only in Experimental mode), which is more precise than the old computation way.
Also, some button actions under alarms (such as the Uninitialized alarms) in the Interactive Code Panel can use the Whole Program Graph when possible.
This widget shows the current TrustInSoft Analyzer internal project and also allows changing the current project to another existing project.
Some TrustInSoft Analyzer features can create projects, such as exporting a slicing request in Slicing or using the Mthread plug-in.
This widget is a grid: see Grids and filters for general information about how to use it.
The information shown in the columns is provided by the -info-properties option of the Tis-info plug-in, so you can refer to the About Properties section of its manual to get more information about them.
The status column shows a colored bullet instead of the status name.
The correspondence is:
- never_tried: no status is available for the property.
- considered_valid: there is no possible way to prove the property (e.g., the post-condition of an external function). We assume this property will be validated by external means.
- surely_valid: the status is True and the dependencies have the consolidated status surely_valid or considered_valid.
- unknown: the status is Maybe.
- surely_invalid: the status is False, and the dependencies have the consolidated status surely_valid.
- valid_under_hyp: the local status is True but at least one of the dependencies has a consolidated status unknown. This is typical of proofs in progress.
- invalid_under_hyp: the local status is False, but at least one of the dependencies has status unknown. This is a sign of a dead code property, or of an erroneous annotation.
- valid_but_dead: the status is True, but the property is in a dead or incoherent branch.
- invalid_but_dead: the local status is False, but the property is in a dead or incoherent branch.
- unknown_but_dead: the local status is Maybe, but the property is in a dead or incoherent branch.
- inconsistent: there exist two conflicting consolidated statuses for the same property, for instance with values surely_valid and surely_invalid. This case may also arise when an invalid cyclic proof is detected. This is symptomatic of an incoherent axiomatization.
Notice the Emission rank column, which is very useful when studying the alarms. First of all, its filter button enables showing Only alarms. Moreover, one can sort the alarms using the order of detection during the analysis and, as mentioned in the Study the Alarms section of the user guide, it is a good idea to explore them in this order. The meanings of the generated alarms are detailed in Value Analysis Alarms.
This widget gathers all user comments and manual review statuses set in the Analysis review widget that can no longer be attached to an object (function, statement, properties, …) in the current analysis.
Most of the time, the source code of the program has been slightly changed and the analyzer could not find the new associated object for a user comment, for example. In that case, the user comment is kept in this widget so that the user can copy/paste it to the new associated line in the Analysis review widget.
This widget is shown automatically when a slicing action has been done from the Right-click menu in the Interactive Code Panel.
This widget allows switching between different slicing requests, exporting them into a project (available from the Menu), and downloading the sliced program.
This widget gathers information about the statement of the current program point.
All data above the grid are information extracted from the Statements widget for the current statement.
The grid shows the previous and next statements of the current statement selected in the Interactive Code Panel.
A statement can be added to the tracked statements with the dedicated button. A tracked statement will be kept in another grid and will not be removed when another statement is selected. A tracked statement can be removed from the tracked statement grid with the corresponding button.
The Kind column can be:
- Prev for the previous statement of the currently selected statement in the Interactive Code Panel
- Next for the next statement (like Prev)
- Occurrence for the result of the Occurrence action in the Right-click menu
- Defs for the result of the Show Defs action in the Right-click menu
- Impactif for the result of the Show impacting conditionnals action in the Right-click menu
Note
Each list of statements resulting from actions like Occurrence is kept in another grid, but the tracking feature has the same behavior
The Info column can take several values depending on the Kind column:
- Occurrence kind: Read when the targeted left-value is read in this statement; Written when the targeted left-value is written in this statement.
- Defs kind: Direct when the statement modifies the targeted left-value; Indirect when the statement is a call of a function where the targeted left-value is modified.
Clicking on a statement in this grid will send you to the selected statement in the Interactive Code Panel.
This widget is a grid: see Grids and filters for general information about how to use it.
The information shown in the first columns is provided by the -info-statements option of the Tis-info plug-in, so you can refer to the About Statements section of its manual to get more information about them.
To investigate where the analysis spent time, begin with the Functions widget. If more information is needed, filter the problematic function using the Function column and sort by Self time (this column may be hidden by default). Again, there are at least two reasons for a statement to take time:
This widget displays all messages that would have been printed to stdout (with functions such as printf or fprintf) if the analyzed program were executed.
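For instance, when analyzing the sketch below (illustrative), the message is not printed to the real stdout; it is collected and displayed in this widget instead:
#include <stdio.h>

int main (void) {
    printf ("value computed: %d\n", 42);  /* shown in the Stdout widget */
    return 0;
}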
Note
This widget is not shown if there are no messages to display.
This widget displays all messages that would have been printed to stderr (with functions such as fprintf) if the analyzed program were executed.
Note
This widget is not shown if there are no messages to display.
This widget allows changing the value of the Target Architecture, also known as the -machdep option. Once changed, files should be parsed again to take this change into account.
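As a sketch of why this matters (the function is illustrative), the value the analyzer computes for the expression below depends directly on the selected target architecture, since sizeof(long) is typically 4 on 32-bit targets and 8 on 64-bit ones:
#include <stddef.h>

size_t long_width (void) {
    return sizeof (long);   /* result depends on the -machdep option */
}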
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers information about all types in the analyzed program.
The information shown in the first columns is provided by the -info-types option of the Tis-info plug-in, so you can refer to the About Types section of its manual to get more information about them.
This widget gathers the list of unresolved array lengths detected in the analyzed program.
An unresolved array length is an error that can happen when an array is defined with an imprecise length.
The corresponding error message is shown in the Overview widget.
By clicking on the error message, the unresolved array lengths widget is displayed with the list of all unresolved array lengths detected in the program.
Using such patterns is not recommended since related undefined behaviors are not detected by TrustInSoft Analyzer.
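A minimal sketch of such a pattern (the declaration is illustrative, assuming runtime_len has a value the analyzer cannot resolve precisely):
extern unsigned int runtime_len;   /* hypothetical, imprecise value */

void f (void) {
    char buf[runtime_len];         /* array defined with an imprecise length */
    buf[0] = 0;
}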
This panel’s feature is only available if the -value-messages
option
is given to the analyzer.
This widget gathers all messages emitted by the Value Analysis. Some messages are emitted several times (for example: an alarm in a loop). Duplicated messages can be shown by clicking on the filter button of the column Emission Rank.
The Severity gives an idea of the message's importance:
- 0 for an alarm with a status Valid in this state
- 2 for an alarm with a status Invalid in this state
- 1 for an alarm with a status Unknown and for any other messages
Note
The same alarm can have several statuses in different states. The status seen in the Interactive Code Panel and Properties is the consolidated status of all the states where the alarm is raised.
For example, in a loop, an alarm can be Unknown in the first step and then always Valid. Its consolidated status is Unknown.
Detailed value message panel
The Detailed value message panel can be shown by clicking on a More actions button in the last unnamed column.
This panel is quite similar to the Values widget and allows evaluating several terms in the associated state of the selected message.
Clicking on another More actions button changes the targeted message of the Detailed value message panel. The associated state where terms are evaluated is then also changed, while the list of terms remains intact.
When a value analysis has been done, this panel is very useful to explore the values that have been computed.
By default, it displays the currently selected expression in the Interactive Code Panel with its computed values before and after the selected statement. Moreover, more expressions can be added to this grid to help compare the values of several expressions.
Expressions can be tracked by:
Some columns of this grid are automatically hidden/shown according to the selected Precision in the toolbar.
Tip
If a grid cell is too small to fully display its text, you can click on that cell (anywhere but on a link) to expand/collapse the full text.
The list of columns is:
- The track (resp. untrack) icon: when clicked, the associated expression is tracked (resp. untracked).
- The focus icon: when clicked, changes the focused callstack to the one of the associated line (see also View per Callstack).
- Flags: displays icons to highlight distinctive characteristics of the expression, such as the volatile characteristic for a variable.
- Expression.
- Emission rank: the emission rank for a same expression is the order of evaluation of their values during the Value analysis, regardless of the callstack.
- Callstack: displays the callstack of the associated expression. Note that a gray color for a callstack indicates it is the same one as the callstack of the line above.
- Call rank: a call rank of N indicates that this evaluation is the Nth call of the same callstack. This can happen when a function call is inside a loop.
- Slevel rank: a slevel rank of N indicates that the Value analysis has visited the statement of the selected expression for the Nth time for the same call rank and the same callstack.
- Is widening: indicator to know if the associated evaluated values may have been over-approximated due to a lack of slevel for the selected statement or function.
- Before value: the evaluation of the expression before the selected statement. Note: a red background indicates the evaluation may have failed.
- Is equal: indicator to easily check if the before and after values are equal.
- After value: the evaluation of the expression after the selected statement, if available. Note: a red background indicates the evaluation may have failed.
The precision allows to display the evaluated values of expression in different ways. The precision can be changed in the widget toolbar with buttons next to the Values label:
All displays the consolidated value of expressions (i.e. the grid contains one line per expression). This precision can also introduce over-approximations since all states of the Value analysis are joined.
Per Callstacks displays values with one line per available callstack for each expression. It is more precise than the All precision.
Per Call displays values with one line per function call of the currently selected statement, for each expression. It is more precise than the Per Callstacks precision if functions of the callstack are called inside a loop.
This precision is available only if results by slevel are saved during the Value analysis with the option -val-slevel-results.
Per Path displays values, one line per path (i.e. per available state in the Value analysis) for each expression. It is the most precise way to display values.
This precision is available only if results by slevel are saved during the Value analysis with the option -val-slevel-results.
For precisions other than All, when several expressions are displayed, comparing values on different lines can be tiresome.
There are two different ways to group lines of the grid, available next to the Group by label in the toolbar:
Then, lines can be sorted by clicking on the Emission rank column. Note that the current expression is always displayed first; only tracked expressions are sorted.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers information about all variables in the analyzed program.
The information shown in the first columns is provided by the -info-variables option of the Tis-info plug-in, so you can refer to the About Variables section of its manual to get more information about them.
This widget allows the usage of the tis-mkfs tool from the Graphical User Interface.
The Virtual File System allows to simulate a File System for the analyzed program during the Value Analysis. The architecture of this virtual File System is displayed in the right panel of this widget.
However, the Virtual File System needs to be generated into a C file in order to be used by the Value Analysis. Every time the Virtual File System is changed, the Generate file system button should be used to regenerate the associated C file.
Note
Once the C file for the Virtual File System is generated, the parsing operation should be run again
In order to use the Virtual File System, generating the associated C file is not the only requirement: the File system: Enabled button, located at the top-right corner of the widget, should be toggled to allow the analyzer to correctly use the generated C file. This has the same behavior as giving the option -fs on the command line.
Also, by default, the No Error Mode is disabled. When this mode is changed, files should be reparsed to correctly use this mode.
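As a usage sketch (the file name is illustrative), once a virtual file system containing input/data.txt has been generated and enabled, the Value Analysis can follow an access such as:
#include <stdio.h>

int read_first_byte (void) {
    FILE *f = fopen ("input/data.txt", "r");  /* resolved in the virtual file system */
    if (f == NULL)
        return -1;
    int c = fgetc (f);
    fclose (f);
    return c;
}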
All quoted options in this section refer to the tis-mkfs options explained in this section.
Files or directories can be added to the file system by:
-local
.-add-dir
.Tip
The option -generic is not yet supported by this widget.
Note
Adding a directory from the File System Panel will prevent modifying (moving/renaming/deleting) its content.
Then, every added file or directory can be either renamed or deleted by right-clicking on it in the Virtual File System.
File and directories can also be moved to another directory by drag’n’drop.
The Save configuration button allows exporting the currently built Virtual File System so that it can be built again automatically.
The two suggested solutions are:
- fill the filesystem field of Configuration files with the generated value.
- use the tis-mkfs tool with the generated command line.
This widget is a grid: see Grids and filters for general information about how to use it.
This widget gathers all goals computed by the WP Plugin with their status.
For more information about the WP Plugin, see the Prove Annotations with WP section.
Qed and Alt-Ergo are two provers. The WP Plugin first tries with Qed and, if that does not succeed, it tries with Alt-Ergo.
The WP Plugin can be configured with options on the command line (see tis-analyzer -wp-help). Three of these options can be modified directly in the GUI in the widget's toolbar:
- -wp-timeout
- -wp-split
- -wp-model
The icon in the last column allows displaying the whole goal and proof obligation of the selected goal.
Note
All goals persist in the grid even after computing another goal. Click on the Clear all goals button in the toolbar in order to clear the grid.
Many widgets of the GUI are composed of configurable columns. This section gives some help about how to use them.
First of all, when many columns are available, one can choose which ones are shown. The menu to select them is visible at the right of each column name.
In the same menu as the column selection, one can choose to sort the list according to the content of the column. When the list is sorted, a small arrow appears near the column title.
Filters can be used to select a subset of the available lines. An active filter is highlighted with a red border as a reminder that some lines are hidden according to that criteria. Moreover, the number of shown lines over the total number is also printed.
Filters in several columns can be used at the same time. All the filters can be reset with the Reset filters button. A combination of filters can be saved with the Save this filter button and recalled with the Load a filter button.
Some of the columns have predefined possible values. It is the case for the Kind columns, for instance. The filter then proposes the list of the possible values that can be selected or not.
For the columns showing names (file, functions, etc), the filter is a Search field. The input string is interpreted as a regular expression.
The following constructs are recognized:
- . Matches any character except newline
- * (postfix) Matches the preceding expression zero, one or several times
- + (postfix) Matches the preceding expression one or several times
- ? (postfix) Matches the preceding expression once or not at all
- [..] Character set. Ranges are denoted with -, as in [a-z]. An initial ^, as in [^0-9], complements the set. To include a ] character in a set, make it the first character of the set. To include a - character in a set, make it the first or the last character of the set.
- ^ Matches at beginning of line (either at the beginning of the matched string, or just after a newline character).
- $ Matches at end of line (either at the end of the matched string, or just before a newline character).
- | (infix) Alternative between two expressions.
- \b Matches word boundaries.
- \ Quotes special characters. The special characters are $^\.*+?[].
The button at the left side of the input string allows choosing whether the filter has to match or mismatch the given regular expression.
The information shown in the grid can be saved to a CSV file with the Download button. Only the filtered data are downloaded, so in order to download all the data in the grid, the Reset filters button should be used before pressing the Download button.
Note
CSV separators can be customized with the Options panel in the Menu
By default, once the Value Analysis is computed, the GUI offers a view on the consolidated state of all execution paths in the program. Some widgets, such as Values, can show more precise data by using the per-callstack states of the Value Analysis.
The list of available callstacks of the current function is shown with the Callstack widget.
It is also possible to focus on one of these callstacks in order to have a view only on data relative to this callstack. Focusing on a callstack can be convenient for an investigation when an alarm occurs only in one callstack in a function called several times in different execution paths.
Note
Some widgets, such as the Functions widget, are not impacted by the focused callstack. The list below indicates which widgets support the view per callstack feature.
Most of the time, the widget's documentation notifies when the widget supports the feature, and/or a graphical notification (icon, text, …) indicates if the view per callstack filters data in the widget.
You can focus on a callstack:
Caution
Once a callstack is focused, some data in the current function of the Interactive Code Panel can be hidden by this focused callstack (as well as for all widgets supporting the view per callstack feature).
For instance, going to a function that is not related at all to the focused callstack will display the function body as dead code.
Once this feature is used, it is highly recommended to keep the Callstack widget in sight: it shows whether some data are filtered by the focused callstack.
A callstack can be unfocused in the Callstack widget.
To navigate more easily in the code with a focused callstack, some shortcuts have been added:
Clicking on a callsite in the Focused callstack (in the Callstack widget) will change the current program point to the selected callsite and also reduce the focused callstack.
For example, with g@24 <- f@12 <- main@5 as the focused callstack, clicking on f@12 will change the program point to the statement of the function f where g is called for this callstack. This will also reduce the callstack to main@5.
In the Interactive Code Panel, when the current function is the function called by the left-most callsite of the focused callstack, two new items appear in the Right-click menu:
For example, let’s have f@12 <- main@5
as the focused callstack,
g
the function called at f@12
, and h
a function called by g
.
These two menu items appears only in the g
function.
- Go to caller f & reduce focused callstack will reduce the focused callstack to main@5 and jump to the function f where g is called.
- Go to definition of h & expand focused callstack will expand the callstack to g@24 <- f@12 <- main@5 and jump to the definition of h.
Any action that focuses or changes the focused callstack is registered in the history (see Backward and Forward buttons) and can easily be undone or redone.
Here is the list of widgets that support the view per callstack feature:
The impacting conditionnals feature syntactically determines an over-approximation of the conditional guards that impact a left-value among all possible paths at a given program point.
This feature helps the user find all conditionnals that can bring additional information about the selected left-value, in order to determine whether it has been over-approximated by the analyzer.
In the following example, an alarm is raised on the access of msg[s1 + s2] in spite of the conditional if (s1 + s2 >= LEN) above.
#define LEN 42
unsigned int total_len = LEN;
unsigned char msg[LEN];
int main(unsigned int s1, unsigned int s2)
{
if (s1 + s2 >= LEN)
return -1;
return msg[s1 + s2];
}
By selecting the left-value s1 where the problem occurs and computing the impacting conditionnals, the conditional if (s1 + s2 >= LEN) is highlighted.
The current expression is the element currently selected in the Interactive Code Panel.
There are 6 kinds of clickable elements in the interactive code:
- expressions (x, y << 1 + 6, counter == 0, z++, …)
- statements (x = y << 1 + 6;, if (counter == 0) { ... }, z++;, …)
- declarations and definitions (int x;, void foo(int y) { ... }, …)
- function names (foo in the expression foo(5), …)
- type names (mystruct in any expression or declaration, …)
- ACSL properties (assert x == 0;, ensures \initialized(\result);, …)
A global is either a global variable, a type declaration or definition (typedef, struct, union, enum), a function declaration or definition, a global annotation, or a piece of assembly code in the program.
The list of globals is available in the Globals widget.
Note that each tab in the Interactive Code Panel is a global.
Caution
The tis-aggregate tool is not available in this version.
The purpose of the tis-aggregate tool is to collect and aggregate several results generated by TrustInSoft Analyzer.
To know how to use TrustInSoft Analyzer, refer to TrustInSoft Analyzer Manual.
The Basic usage section below assumes that you already know how to build tis-analyzer command lines to run analyses and get Value analysis results, as explained in Prepare the Analysis.
The next Advanced usage section presents more tis-aggregate commands and options.
Let’s first give an example of using tis-aggregate
to get the coverage
from several value analysis results.
As explained in the Check the Coverage section of the TrustInSoft Analyzer Manual,
the coverage measures the percentage of the code that has been
reached by the value analyses, and it helps to detect dead code.
Let’s consider a small application composed of two source files and a header file:
convert.c holding some library functions:
#include "convert.h"
int convert_atoi (char * arg, uint * p) {
unsigned int x = 0;
for (char* p = arg; *p != '\0'; p++) {
char c = *p;
if ('0' <= c && c <= '9')
x = 10 * x + c - '0';
else
return 1;
}
*p = x;
return 0;
}
static int nb_digit (uint x, uint base) {
if (x == 0)
return 1;
else {
int nb;
for (nb = 0; x; x /= base) nb ++;
return nb;
}
}
char digit_name (int x) {
return x < 10 ? '0' + x : 'a' + x - 10;
}
char * cv_to_base (uint x, uint base, char * buf) {
if (2 <= base && base <= 16) {
int nb = nb_digit (x, base);
buf[nb--] = 0;
while (nb >= 0) {
int d = x%base;
char c = d <= 9 ? '0' + d : digit_name (d);
buf[nb--] = c;
x /= base;
}
}
else
return NULL;
return buf;
}
char * convert (char * str_x, char * str_base, char * buf) {
unsigned int x, base;
int err;
err = convert_atoi (str_x, &x);
if (err) return NULL;
err = convert_atoi (str_base, &base);
if (err) return NULL;
return cv_to_base (x, base, buf);
}
convert.h that declares the library functions:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
int convert_atoi (char * arg, uint * p);
char * cv_to_base (uint x, uint base, char * buf);
char * convert (char * str_x, char * str_base, char * buf);
main.c holding the test entry point:
#include "convert.h"
int main (int argc, char * argv[]) {
if (argc != 3) {
printf ("Error: 2 arguments expects, got %d\n", argc);
exit (EXIT_FAILURE);
}
char buf[100];
char * x = argv[1];
char * base = argv[2];
char * p = convert (x, base, buf);
if (p) {
printf ("Result: convert %s in base %s = %s\n", x, base, p);
return EXIT_SUCCESS;
}
else {
printf ("Error: unable to convert %s in base %s\n", x, base);
return EXIT_FAILURE;
}
}
This application can for instance be analyzed with the following command line:
$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10'
It gives results for the analysis of the application as if it were run with the two arguments 1234 and 10. Of course, it has to be run for other values to get more analysis results.
The Information about the Coverage section explains that the coverage is computed from csv files that are generated by tis-analyzer when using -info-csv-xxx options. The simplest way to generate those files is to just add the -info-csv-all option, and a name for the results, to the command line of each analysis.
For instance, to generate all the csv files in an existing results/ directory, the command line above becomes:
$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10' \
-info-csv-all results/test-1234-10
The argument of the option must be unique for each analysis
since it is used as a base name to store the analysis result files.
Those names are provided to tis-aggregate through what is called the INDEX file. Each line of this file holds the base name of one analysis's results. The goal of the tis-aggregate tool is to collect the information from the analyses listed in this index file.
Before creating the index, let’s create another analysis with a different input:
$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 2' \
-info-csv-all results/test-1234-2
So, the index file to create for the two analyses above would look like:
convert.aggreg.index
results/test-1234-10
results/test-1234-2
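When several inputs have to be covered, the analyses and the index file can be scripted. Here is a minimal shell sketch that reproduces the two analyses and the index file above (the list of input pairs is the only assumption):

for args in '1234 10' '1234 2'; do
  # build a base name such as results/test-1234-10
  name="results/test-$(echo $args | tr ' ' '-')"
  # note the leading space in -val-args: the space is the argument separator
  tis-analyzer -val -slevel 100 convert.c main.c \
      -val-args " $args" -info-csv-all "$name"
  echo "$name" >> convert.aggreg.index
done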
Now the coverage can be computed with:
$ tis-aggregate coverage convert.aggreg.index
It gives the following result:
tis-aggregate coverage convert.aggreg.index
File Function #reached #statements Ratio Comment
main.c __tis_globfini 2 2 100% total
convert.c convert 7 11 64% partial
convert.c convert_atoi 15 19 79% partial
convert.c cv_to_base 22 27 81% partial
convert.c digit_name 0 5 0% not reached
main.c main 10 15 67% partial
convert.c nb_digit 12 14 86% partial
TOTAL (functions) 6 7 86%
TOTAL (all statements) 68 93 73%
TOTAL (stmts in reachable functions) 68 88 77%
It shows the list of the functions with the number of reachable statements and the total number of statements.
The last three lines, labelled TOTAL, summarize the coverage:
- functions: number of analyzed functions over the number of functions.
- statements: number of reachable statements over the number of statements.
- stmts in reachable functions: number of reachable statements over the number of statements in the reachable functions.
We can for instance see that the digit_name function is not reached.
This is because it is only used for a base greater than 10.
So let’s run other analyses (with other inputs)
and add them into the index file to enhance the coverage:
convert2.aggreg.index
results/test-1234-10
results/test-1234-2
results/test-1234-16
results/test-0-10
results/test-a2b-10
The coverage becomes:
tis-aggregate coverage convert2.aggreg.index
File Function #reached #statements Ratio Comment
main.c __tis_globfini 2 2 100% total
convert.c convert 9 11 82% partial
convert.c convert_atoi 19 19 100% total
convert.c cv_to_base 23 27 85% partial
convert.c digit_name 4 5 80% partial
main.c main 13 15 87% partial
convert.c nb_digit 14 14 100% total
TOTAL (functions) 7 7 100%
TOTAL (all statements) 84 93 90%
TOTAL (stmts in reachable functions) 84 93 90%
Notice that, since the function coverage is now 100%, the all statements coverage and the stmts in reachable functions coverage are the same.
To change the output format to csv (below) or html,
the --format
option can be used:
tis-aggregate coverage convert2.aggreg.index --format=csv
File,Function,#reached,#statements,Ratio,Comment
main.c,__tis_globfini,2,2,100%,total
convert.c,convert,9,11,82%,partial
convert.c,convert_atoi,19,19,100%,total
convert.c,cv_to_base,23,27,85%,partial
convert.c,digit_name,4,5,80%,partial
main.c,main,13,15,87%,partial
convert.c,nb_digit,14,14,100%,total
TOTAL (functions),,7,7,100%,
TOTAL (all statements),,84,93,90%,
TOTAL (stmts in reachable functions),,84,93,90%,
The csv output can be opened in a spreadsheet tool to easily select elements.
When using --format=html
, the coverage is represented
with a decomposition based on the file paths.
It makes it easier to browse the results.
The presentation is interactive, making it possible to fold/unfold the information.
The default is to show only the summary of the coverage
on the files and functions.
It is also possible to see the detailed coverage of each statement
by using the --html-stmts dir
option, provided the analysis was run beforehand with:
$ tis-analyzer -aggreg-html-stmts dir [options]
It generates an HTML file for each function in the specified dir
directory.
So for instance, with these commands, the HTML detailed coverage
is available from the results/html-cov/index.html
file:
$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10' \
-info-csv-all results/test-1234-10 -aggreg-html-stmts results/html-cov
$ tis-aggregate coverage convert.aggreg.index \
--format=html --html-stmts results/html-cov
The -aggreg-html-stmts
option may be used several times:
previously existing files are overwritten to make sure the most recent version
is available.
Notice that it is not mandatory to use -aggreg-html-stmts
on each analysis, but the detailed coverage is then available
only on the functions that have been translated,
i.e. on those which have been seen in the analyses
where -aggreg-html-stmts
has been used.
Moreover, whatever the format is,
the output file may be specified with the --output-file
option
(except when using --html-stmts dir
where the output file
is automatically dir/index.html
).
To know more about the other features of the coverage
command,
read the Tis-aggregate coverage section below, or use:
$ tis-aggregate coverage --help
The tis-aggregate
tool also provides other commands.
The help
command shows the available commands and the list of the options
that are common to all of them:
$ tis-aggregate help
The following Advanced usage section presents some of them.
The Basic usage section above already explained
how to use the coverage
command, but some more details are given below.
To better understand the coverage,
the dead code (i.e. the unreachable statements) needs to be analyzed.
The easiest way to do that is to use the GUI,
where the dead statements are highlighted in red,
but it can be difficult to make a synthesis of the dead code over several
analyses. So tis-aggregate
can be used to generate
the list of the statements that are not reached by any analysis.
With the --statements
option, the list of the statements
is filtered to keep only the unreachable statements.
A path/base_statements.dead.xxxxxx.csv
file
(where xxxxxx is a random string) is generated.
It holds information about these statements
that are never reached by any analysis.
For the example above, it gives:
tis-aggregate coverage convert2.aggreg.index --statements
File Function #reached #statements Ratio Comment
main.c __tis_globfini 2 2 100% total
convert.c convert 9 11 82% partial
convert.c convert_atoi 19 19 100% total
convert.c cv_to_base 23 27 85% partial
convert.c digit_name 4 5 80% partial
main.c main 13 15 87% partial
convert.c nb_digit 14 14 100% total
TOTAL (functions) 7 7 100%
TOTAL (all statements) 84 93 90%
TOTAL (stmts in reachable functions) 84 93 90%
File, Line, Function, Number
convert.c, 28, digit_name, 2
convert.c, 31, cv_to_base, 22
convert.c, 42, cv_to_base, 23
convert.c, 42, cv_to_base, 24
convert.c, 42, cv_to_base, 25
convert.c, 51, convert, 7
convert.c, 51, convert, 8
main.c, 27, main, 2
main.c, 28, main, 3
Some options are provided to decide which functions should appear in the coverage results.
For instance, the coverage of the libc functions is usually not interesting to
have, so the default is to ignore them, but they can be included
when the --libc
option is set.
The default behavior is to show the unreachable functions,
but they can be skipped by setting the --skip-unreachable-fun
option.
It is especially useful when using the --libc
option.
This is for instance the coverage computed on the first convert.aggreg.index
index file shown above, without the unreachable functions:
tis-aggregate coverage convert.aggreg.index --skip-unreachable-fun
File Function #reached #statements Ratio Comment
main.c __tis_globfini 2 2 100% total
convert.c convert 7 11 64% partial
convert.c convert_atoi 15 19 79% partial
convert.c cv_to_base 22 27 81% partial
main.c main 10 15 67% partial
convert.c nb_digit 12 14 86% partial
TOTAL (functions) 6 6 100%
TOTAL (all statements) 68 88 77%
TOTAL (stmts in reachable functions) 68 88 77%
Notice that the function coverage is now 100% since the unreachable functions are filtered out, but it might not always be the case, for instance when using the -val-use-spec option.
The --skip-path
, --skip-file
and --skip-function
options
take a regular expression and specify the files or functions that should be
skipped in the results. For example:
tis-aggregate coverage convert.aggreg.index --skip-function main
File Function #reached #statements Ratio Comment
main.c __tis_globfini 2 2 100% total
convert.c convert 7 11 64% partial
convert.c convert_atoi 15 19 79% partial
convert.c cv_to_base 22 27 81% partial
convert.c digit_name 0 5 0% not reached
convert.c nb_digit 12 14 86% partial
TOTAL (functions) 5 6 83%
TOTAL (all statements) 58 78 74%
TOTAL (stmts in reachable functions) 58 73 79%
The --skip-file
option may be used to exclude
all the functions that come from some files. For instance:
--skip-file '^lib'
The --skip-path
option may be used to exclude
all the functions that come from some files in a specified directory,
and may also include the file name prefix. For instance:
--skip-path some/path/ --skip-path another/path/xxx_
Warning
Because the coverage relies on the statement numbers inside each function, it is relevant only for functions that have exactly the same statements in all the analyses; so, if different preprocessing options are used, they must not change the source code of those functions.
However, some analyses may hold only part of the source code: the statements of the functions that are absent from some analysis results are considered as if they were present but dead.
Despite the warning above, if some functions really need different code in some analyses, the advice presented below should be followed to nonetheless get usable coverage results.
When adding a conditional statement, a #else
part with the same number of
statements should be added so that the other statements of the function keep
the same number across the analyses. For instance:
#ifdef TEST_1
tis_variable_split (&x, sizeof(x), 10);
#else
;
#endif
<statement>
It is sometimes the case that two versions of the same function are needed for some analyses. In this case, instead of redefining the function like this:
#ifdef CONFIG_1
int f (...) {
<code 1>
}
#else
int f (...) {
<code 2>
}
#endif
It is better to define two functions to make them visible in all the analyses. The conditional code is only used to call one or the other:
int f_config_1 (...) {
<code 1>
}
int f_other_config (...) {
<code 2>
}
int f (...) {
#ifdef CONFIG_1
return f_config_1 (...);
#else
return f_other_config (...);
#endif
}
Optionally, the --justif
option can be used to provide a file
with some justifications for partially covered functions.
Each line of this file must be formatted as:
function_name, number of justified dead statements, justification
The free text in justification
can be anything,
but cannot contain a newline nor a comma
(in order to produce a well-formed CSV file).
This file is useful when the verification is done several times, to check whether the number of unreachable statements has changed since the results were last reviewed.
For instance, for the example presented above, the justification file may look like:
convert.justif
holding some dead code justifications:
main, 2, no test with wrong number of arguments.
convert, 2, no test with invalid char in second input.
digit_name, 1, always called with `x>=10`.
And then the coverage result becomes:
tis-aggregate coverage convert.aggreg.index --justif convert.justif
File Function #reached #statements Ratio #justified Ratio Comment
main.c __tis_globfini 2 2 100% 0 100% total
convert.c convert 9 11 82% 2 100% no test with invalid char in second input.
convert.c convert_atoi 19 19 100% 0 100% total
convert.c cv_to_base 23 27 85% 0 85% partial
convert.c digit_name 4 5 80% 1 100% always called with `x>=10`.
main.c main 13 15 87% 2 100% no test with wrong number of arguments.
convert.c nb_digit 14 14 100% 0 100% total
TOTAL (functions) 7 7 100% 0 100%
TOTAL (all statements) 84 93 90% 5 96%
TOTAL (stmts in reachable functions) 84 93 90% 5 96%
Besides the Prepare the Analysis section, which shows how to run tis-analyzer commands to get Value analysis results, the ACSL Properties section shows how to add properties to the code to solve some problems, and how some of them can be proved using the WP plugin, which relies on theorem proving techniques (see Prove Annotations with WP).
As pointed out by the Conclusion of the TrustInSoft Analyzer Manual,
all the results obtained by the analyses rely on hypotheses,
and the code annotations are a large part of them.
The property-status
command helps to detect the annotations that are not
verified by computing the status of each property
across several analyses.
To get an ok status, a property has to be valid in the value analysis results, to be proved by WP, or to be covered by a manual justification.
We have seen in Basic usage that the index file
holds the list of the base names of the analyses results.
The property-status
command also needs to know where the results
come from. So an optional second column has to be added
to specify the kind of analysis that gave the results.
It can be value, wp or other.
A missing kind is equivalent to other.
The property-status command only considers the value and wp results.
So the index file may look like:
path_1/proj_1
path_2/proj_2, value
path_2/proj_3, wp
...
path_n/proj_n
With the enhanced index file, the command may be run with:
$ tis-aggregate property-status appli.aggreg.index
To know more about the available property-status
options,
see the sections below or run:
$ tis-aggregate property-status --help
Some properties may be difficult to prove with the tools,
so the user may prefer to justify them manually.
The format of the file holding the intellectual review
is not constrained, but it has to be a text file
where the string [#justif-yyy:xxx]
specifies
that the property named xxx
from the yyy function has been justified.
The file is provided to the tool with the --reviews
option.
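For instance, a review file (say review.txt; the function and property names below are hypothetical) could contain free-form text such as:

The division in cv_to_base is protected by the check on base,
so the property always holds. [#justif-cv_to_base:base_ok_rv]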
Optionally, if the user follows the recommendations given in
Naming the Annotations, i.e. takes care of adding a suffix to the property names,
the --check-suffix
option may be used to make
the tool check whether the computed status matches the given suffix.
The default suffixes are:
- _wp for properties that should be proved with WP (use the --wp-suffix option to override it),
- _val for properties that should be checked by the Value analysis (use the --val-suffix option to override it),
- _rv for properties that should be covered by the intellectual reviews (use the --review-suffix option to override it).
The --ignore-suffix option can be used to specify a suffix to ignore, like _todo for instance.
The default behavior of this command is to print the list of the properties that are not verified (according to the criteria listed above).
However, with the --all
option,
all the properties will be listed with information about their verification
(and optionally information about the suffixes).
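Putting these options together, a complete run over the example index might look as follows (review.txt is the hypothetical review file shown above):

$ tis-aggregate property-status appli.aggreg.index \
    --reviews review.txt --check-suffix --all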
At the end of a study, it might be the case that some useless elements remain.
For instance, if a property is valid in every analysis,
it is useless to take some time proving it with WP.
And if a property is valid in every analysis,
or if it has been proved by WP,
it is useless to keep a manual justification for it.
When the --show-useless
option is set,
some warnings are generated for such useless elements
and may help to clean the final results.
As seen in the tis-info Manual, the .csv
files generated
when using the -info-csv-all option
can be used to explore the results of an analysis.
But if you are familiar with SQL, it might be easier to use it
to make more elaborate requests over several .csv
files.
The sql-export
command generates an sqlite database:
$ tis-aggregate sql-export appli.aggreg.index name
The appli.aggreg.index
file is the index file as usual,
and name
is one of the analyses of the index.
This second argument is needed because the database
is generated only for one analysis at the moment.
Use the --help
option to get more information about this command,
and to see examples.
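Once generated, the database can be explored with the standard sqlite3 shell; the file name name.db below is an assumption, as the actual name depends on the tool:

$ sqlite3 name.db
sqlite> .tables
sqlite> .schema

The .tables and .schema commands list the available tables and their columns, from which SQL queries over the analysis data can then be built.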
Caution
The tis-info
plug-in is not available in this version.
This plug-in extracts a lot of useful information from the current project.
The information about each element is printed on a single line of comma-separated fields,
which means that the result can be saved as a .csv file.
These files can be opened in a spreadsheet tool for easy selection.
Moreover, it makes it easy to select some of them with grep
.
And if you are familiar with SQL, you can even build
a sqlite database
to make more elaborate requests over several .csv
files
(see Tis-aggregate sql-export).
To generate all the .csv
files from a project,
use the -info-csv-all
option.
Some pieces of information are syntactic, and some others are semantic and are only available when the project holds value analysis results.
When the -info-functions
option is used, some information is printed
for each function of the current project.
The following list explains which information can be found in each column:
Function: function name.
Is defined: tells whether the function has a body or if it is an external function. The possible values are:
- defined: if the function has a defined body
- undefined: if the function is external
Is static: tells whether the function is static or has a global scope. The possible values are:
- static: if the function is static
- global: if the function has a global scope
Addr used: tells whether the address of the function is used. The possible values are:
- yes
- no
Is called: tells if the function is syntactically called somewhere in the application. Notice that the entry point (main function) is considered as called. The possible values are:
- called: if the function is syntactically called
- not called: otherwise
Is reachable: tells if the function is semantically reachable from the entry point. Notice that this information is only available if the project contains value analysis results. The possible values are:
- NA: if no value analysis results are available
- reachable: if the function has been reached during analysis
- unreachable: otherwise
Aliases: tells whether the function is aliased to other function declarations from the original source that may have disappeared from the merged file.
Clone origin: name of the source function if the function is a clone, ‘NA’ otherwise.
Spec: tells if the function has an ACSL specification. The possible values are:
- user: if the user gave a specification
- generated: if there is no user specification, but a specification has been generated by the tool
- none: otherwise
This information is mainly interesting for undefined functions.
val-use: tells how the function is considered by the analyses. The possible values are:
- built-in: if a built-in is used for this function, or if it is a built-in function
- specification: if the function is not defined or if the -val-use-spec option is used to cut the analysis on this function
- body: otherwise
File: gives the file in which the function is defined (or declared).
Starting line: line of the file where the function begins. The values are integers
Ending line: line of the file where the function ends. The values are integers
#reached: tells how many statements of the function are semantically reachable. Notice that this information is only available if the project contains value analysis results to provide reachable information. The values are integers, where -1 means not available.
#total: gives the number of statements of the function. The values are integers
coverage: gives the coverage percentage if the reachable information is available. The value is
NA
otherwise.#called: represents how many times this function has been called during the analysis. The values are integers, where -1 means not available.
Overhead time: represents the time analyzing a function minus the time analyzing its statements or the associated built-in. The values are floating point numbers or 0.0 when not available.
Self time: represents the time analyzing a function minus the time analyzing its call statements. The values are floating point numbers or 0.0 when not available.
Total time: represents the time analyzing a function. The values are floating point numbers or 0.0 when not available.
Memexec time: time taken by memexec to check if we can reuse a previous call and to store a new call. The values are floating point numbers or 0.0 when not available.
Memexec hits: represents how many times previous analysis results of this function have been reused by memexec. The values are integers, where -1 means not available.
Is libc: tells whether the function is provided by the standard C library. The possible values are:
- libc:no
- libc:yes
Cyclomatic complexity: McCabe’s cyclomatic complexity. The values are integers
The -info-csv-functions filename
option exports the same information into the given filename.
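Standard text tools can then be used on this export; for instance, to list the functions that were never reached (project_functions.csv is a hypothetical output file name):

$ grep ',unreachable,' project_functions.csv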
When the -info-statements
option is used, some information is printed
for each statement of the current project.
The following list explains which information can be found in each column:
Sid: internal unique statement identifier. The values are integers
File: file where the statement is.
Line: line of the file where the statement is. The values are integers
Function: name of the enclosing function.
Number: statement number inside the function. This is the number used when printing REACHED information. The values are integers
Reachable: tells if the statement is reachable. Notice that this information is only available if the project contains value analysis results. The possible values are:
- NA: if no value analysis results are available
- reachable: if the statement has been reached during analysis
- unreachable: otherwise
- never-terminating: if the value analysis of this statement never terminated
- non-terminating: if the value analysis of this statement did not terminate for some paths
- unpropagated: if the results for that statement are incomplete due to a value analysis degeneration
- degenerated: if the value analysis has degenerated here
Dead code entry: flags the statements that are dead code entry points. A dead code entry point is a dead statement that either has no predecessor (it is the first statement of an unreachable function, or it is syntactically an unreachable statement) or has at least one reachable predecessor. The possible values are:
- yes
- no
Notice that this information is only available if the project contains value analysis results, otherwise no statement is flagged.
Dead code exit: flags the statements that are dead code exit points. A dead code exit point is a dead statement that either has no successor (it is the last statement of an unreachable function for instance) or has at least one reachable successor. The possible values are:
- yes
- no
Notice that this information is only available if the project contains value analysis results, otherwise no statement is flagged.
Dead Id: gives a unique id per function for each contiguous statement/block not reachable by the Value Analysis (with an exception for the values 0 and -1). If the Value Analysis results are not available, all dead ids are set to -1. This allows referring unambiguously to pieces of dead code in functions. If the function is not reachable at all by the Value Analysis, then all the dead ids of this function’s statements are 0. If a statement is reachable by the Value Analysis, its dead id is -1.
Kind: kind of the statement. The possible values are:
- test: a test (if). See next column for more information
- function call: a function call. See next column for more information
- instruction: a simple instruction, such as an assignment for instance
- inline assembly: for assembly statements
- loop
- return
- jump: for statements like goto, break, continue
- switch
- block
- unspecified sequence: for a sequence of instructions that comes from the decomposition of an instruction of the original program, and for which the order of evaluation is unspecified.
- other statement
More: gives more information on the statement depending on its kind. The possible values are:
- (else branch dead | then branch dead | no reduction): for a test, when the reachability has been computed, indicates if one of the two branches is alive and the other one is dead. Moreover, if the -info-stmt-no-reduction option is used, and both branches are alive, and the results are available, indicates whether the states are the same in both branches, which means that the effect of the conditional has definitely been over-approximated. Nothing is printed otherwise.
- direct call: for a direct call
- indirect call: for an indirect call, and gives moreover:
- the list of the possibly indirectly called functions if available,
- NA otherwise (that is: indirect but not analyzed).
- loop n°%d in the function: loop number in the function. This information is useful when importing ACSL loop properties for instance
Join reason: represents why the analysis joined several propagated states. The possible values are:
- <none>: when no joins occurred on this statement
- merged loop: when a loop iteration analysis starts or ends with more than one propagated state (to disable this behavior, use -val-slevel-merge-after-loop=-@all)
- reached slevel: when the slevel for this statement has been reached
- widened: when the analyzer started to over approximate a loop (should only happen when the slevel is reached for the loop)
Notice that several values may be used for one statement.
SlevelCounter: is the maximum slevel reached for this statement for all calls. In other words, it represents how many separated states went through this statement. This is an over-approximation and gets cut once it reaches the slevel. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.
SlevelLimit: is the statement slevel limit. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.
Slevel ratio: is (counter * 100 / slevel). The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.
Widening: represents how many widening states went through this statement. There may be more than one widening state if we are in a loop and we do not find the fixpoint with the first widening iteration. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.
Self time: is the time spent analyzing this statement excluding potential calls. The values are floating point numbers or 0.0 when not available.
Total time: is the time spent analyzing this statement including potential calls. The values are floating point numbers or 0.0 when not available.
Is libc: tells whether the statement is provided by the standard C library. The possible values are:
- libc:no
- libc:yes
The -info-csv-statements filename
option exports the same information into the given filename.
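For instance, the unreachable statements can be extracted from this export and grouped by the Function column (project_statements.csv is a hypothetical output file name):

$ grep ',unreachable,' project_statements.csv | sort -t, -k4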
When the -info-files
option is used, some information is printed
for each file of the current project.
The following list explains which information can be found in each column:
File: file name.
#functions: number of functions defined in this file. The values are integers
Toplevel: tells whether the file is a toplevel one or if it has been included in another one. The possible values are:
- yes: if the file is a toplevel one
- no: if the file is not a toplevel one, i.e. it has been included in another one
- maybe: if a preprocessed file has been provided to the analyzer, and the file has the same basename, it can probably be a toplevel file.
Is libc: tells whether the file is from the tool libc or not. The possible values are:
- no
- yes
The -info-csv-files filename
option exports the same information into the given filename.
When the -info-types
option is used, some information is printed
for each type of the current project.
Notice that the unused types are not listed since they are automatically removed from the internal representation.
The following list explains which information can be found in each column:
File: gives the file of the type definition/declaration.
Line: gives the line of the type definition/declaration.
Kind: gives the type kind. The possible values are:
- typedef
- enum
- struct
- union
Source name: gives the name found in the source file of the type for typedef, or of the tag otherwise. Prints <empty> when the tag is not specified.
Internal name: gives the internal name if different from the source name. The value is <same> otherwise.
Defined: tells whether the enum/struct/union is defined, or only declared. For typedef, tells whether the underlying type is abstract (top only). The possible values are:
- defined
- declared
Sub-type: meaningful for typedef only: shows the underlying type. The value is N/A otherwise.
Size (#bytes): gives the size of the type, in bytes, when available. The values are integers, where -1 means not available.
Is libc: tells whether the type is provided by the standard C library. The possible values are:
- libc:no
- libc:yes
The -info-csv-types filename
option exports the same information into the given filename.
When the -info-variables
option is used, some information is printed
for each variable of the current project.
Notice that function variables are not listed here.
The following list explains which information can be found in each column:
Name(s): gives the name of the variable. Also gives the internal name if the variable has been renamed to avoid ambiguities.
File: gives the file of the variable definition/declaration. The value is no valid location otherwise.
Line: gives the line of the variable definition/declaration.
Type: gives the type kind of the variable (sub-types are not developed).
Function: gives the function of the variable if any. The value is NA otherwise.
Kind: gives the kind of the variable. The possible values are:
- global variable
- local variable
- formal variable: for function formal parameters
Warning: static local variables are marked as ‘global’ here.
Storage: tells if the variable is defined or just declared, and also if it is a static variable. The possible values are:
- static
- declared: only global variables might be only declared
- defined
Initialized: tells whether the variable has an initializer or not. The possible values are:
- no
- yes
Notice that this is only meaningful for global variables since local initializations are transformed into statements.
Volatile: tells whether the variable is volatile or not. The possible values are:
- no
- yes
Notice that the qualification is detected anywhere in the type.
Const: tells whether the variable is const or not. The possible values are:
- no
- yes
Temporary: tells whether the variable is a temporary variable generated by the tool or a variable coming directly from the user input. The possible values are:
- no
- yes
Is libc: tells whether the variable is provided by the standard C library. The possible values are:
- libc:no
- libc:yes
The -info-csv-variables filename
option exports the same information into the given filename.
When the -info-properties
option is used, some information is printed
for each property of the current project.
The following list explains which information can be found in each column:
File: tells the location file of:
- the statement of the property for a local annotation,
- the statement of the call for an instance of a precondition,
- the function for a function contract property,
- the property itself otherwise.
Line: gives the line in the file
Function: gives the name of the function in which the property is if any. The value is NA otherwise.
StmtId: for local properties, gives the number of the statement in the function. The values are integers, where -1 means not available.
Kind: gives the kind of property. The possible values are:
- differing blocks
- division by zero
- float to int
- index out of bound
- invalid shift
- is NaN or infinite
- logic memory access
- memory access
- not separated
- overflow
- overlap
- pointer comparison
- uninitialized
- valid string
- valid wstring
- comparable char blocks
- comparable wchar blocks
- dangling contents
- division overflow
- function pointer
- index in address
- pointer arithmetic
- unclassified
- bad libc call
- UB assert
- UB precondition
- assert
- loop invariant
- invariant
- requires
- assumes
- ensures
- exits
- breaks
- continues
- returns
- terminates
- allocates
- assigns
- behavior: not really a property but rather a collection of properties (see ACSL manual)
- complete behaviors
- decrease
- disjoint behaviors
- from
- axiom
- axiomatic: not really a property but rather a collection of properties (see ACSL manual)
- lemma
- instance of requires: for instances of requires properties at the call statements
- reachability: for properties about the reachability of functions and statements
- other: for other properties
Name: gives the name of the property if any. The value is <empty string> otherwise. When adding properties, it is highly recommended to give them names to identify them easily.
Status: gives the verification status of the property. The possible values are:
- surely valid: the status is True and the dependencies have the consolidated status surely_valid or considered_valid.
- considered valid: there is no possible way to prove the property (e.g., the post-condition of an external function). We assume this property will be validated by external means.
- valid under hypothesis: the local status is True but at least one of the dependencies has a consolidated status unknown. This is typical of proofs in progress.
- valid but dead: the status is True, but the property is in a dead or incoherent branch.
- unknown: the status is Maybe.
- unknown but dead: the local status is Maybe, but the property is in a dead or incoherent branch.
- surely invalid: the status is False, and the dependencies have the consolidated status surely_valid.
- invalid under hypothesis: the local status is False, but at least one of the dependencies has status unknown. This is a sign of a dead code property, or of an erroneous annotation.
- invalid but dead: the local status is False, but the property is in a dead or incoherent branch.
- inconsistent: there exist two conflicting consolidated statuses for the same property, for instance with values surely_valid and surely_invalid. This case may also arise when an invalid cyclic proof is detected. This is symptomatic of an incoherent axiomatization.
- never tried: no status is available for the property.
See the user manual for more information about the status.
Alarm: tells whether the property is a generated alarm or not. The possible values are:
- yes
- no
Alarm rank: gives the rank of the emitted alarm if the property is a Value analysis alarm. The values are integers, where -1 means not available. It is advisable to study the alarms in the order in which they have been emitted since an alarm may alter the states after it.
MoreInfo: gives some more information about the property. Deprecated: rather use other tables to get more information. The possible values are:
- global
- in unreachable function: similar to _but_dead status suffix.
- inside -val-use-spec function: annotations are not reached in use-spec functions.
- ensures of -val-use-spec function: post-condition of use-spec functions cannot be checked, they are used as hypotheses.
- at unreachable statement: local annotation attached to an unreachable statement.
- instance of builtin requires: instances of built-in pre-conditions can be safely ignored since the built-ins raise alarms in case of problems.
- <empty string>: No additional information
Is libc: tells whether the property is provided by the standard C library. The possible values are:
- libc:no
- libc:yes
The -info-csv-properties filename
options export the same information into the given filename.
Let’s examine project_properties.csv:
On a project with value analysis results, it could be interesting to find the unreachable properties that are not part of the library with:
$ grep never_tried project_properties.csv | grep -v TIS_KERNEL_SHARE
To find invalid properties:
$ grep invalid project_properties.csv
To find all the alarms generated by Value, sorted by their kind:
$ grep -v 'alarm:(no|NA)' project_properties.csv | sort -k5
To find the unverified user properties, sorted by function name:
$ cat project_properties.csv | grep 'alarm:(no|NA)' | grep -v ', valid,' \
| grep -v 'TIS_KERNEL_SHARE.*never_tried' | sort -k2
The -info-results
option
may be used to print some information about the result of a value analysis.
The default is to print all the sections, but a selection can be done by using
the -info-msg-key results:xxx
option. To know all the possible values for
xxx
, run:
$ tis-analyzer -info-msg-key help
There is no option at the moment to print all the sections except some of them, but because each line is prefixed with its category, it is quite easy to filter out some of them. For instance:
$ tis-analyzer ... -info-results \
| grep "^\[info:results" | grep -v "info:results:coverage"
Some -info-results-skip-xxx
options are provided as well
to specify elements that may be skipped in the report.
For instance -info-results-skip-abort e1,..en
specifies the functions
that are known to exit (they do not terminate) and that do not need to be
reported as elements to check. The specified expressions e
may be simple
function names, but also regular expressions.
The other -info-results-skip-xxx
options work similarly.
All of them may take a list or may be used several times on the command line.
See the All Options section for more information.
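For instance, a run that ignores two known non-terminating functions (the function names below are hypothetical) could be:

$ tis-analyzer ... -info-results \
    -info-results-skip-abort 'fatal_error,assert_fail_.*'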
The -info-json-results
option may provide an output file name
to which the results are exported using the JSON format presented below.
The result sections presented below that are marked with a VERIF_ERROR tag in the results are the most important ones.
All the problems listed in these sections must be solved before going further.
This section reports the existence of some reachable functions with neither code nor specification.
This MUST be fixed either by providing source files with the body of these functions, or by adding them some specifications (with at least ‘assigns’ properties). When functions are listed in this section, the obtained results are probably meaningless.
Degenerations are points where the value analysis stopped. Results are not correct for the lines of code that can be reached from a degeneration point, so this MUST be fixed.
This section reports the other non-terminating statements.
Some abort-like functions may appear here: this is normal.
The application functions which are known to never terminate may be excluded
using the -info-results-skip-abort
option.
Other cases must be examined carefully.
This section reports some messages from the tool.
Some of them are purely informative, but all of them have to be carefully reviewed manually.
This section reports the generated alarms.
The alarms should be reviewed carefully, and removed if possible. Starting from the first one is a good idea, since the next ones may be due to related reasons.
The following sections do not report errors per se, but pieces of information that should be checked to see if everything seems normal about them.
This section lists the hypotheses to verify.
Beside the alarms, these properties must be proved or justified.
Some of them may be proved with WP, or manually justified.
If named carefully (as explained in Naming the Annotations), for instance with the _wp and _rv suffixes, you can exclude them from this list using the -info-results-skip-props _wp,_rv option.
This section lists the functions with no code, but specifications.
These should be library functions.
Some of them may be ignored by using the -info-results-skip-lib
option.
The specified expression may also match the filename. For instance,
with -info-results-skip-lib external_headers
all the functions from files in the external_headers
directory are ignored.
This section lists the functions with code, but which are nevertheless analyzed using their specification only.
These functions have a body, but the value analysis has been asked to
ignore them and to use their specifications instead (with the -val-use-spec
option). They are listed here as a reminder that the analysis has been done
under these hypotheses, and that at the end,
these specifications MUST be checked by analyzing these functions alone.
This section lists the functions from tis-analyzer
libc and built-ins
that have been used. These functions, which come from the tool, are only listed here to show which ones are used.
This section lists the source files that contain no reachable functions.
In most cases these files can be safely removed from the analysis, unless they hold the definition (and the initialisation) of a used global variable. It is harmless to keep them though, and it can even be useful for measuring the coverage over the whole application.
The following sections provide more information that can be interesting to look at.
This section gives an overview of the analysis coverage.
This information may be interesting to analyze at the end of the study, but can be skipped during the elaboration of the analysis when these numbers can change a lot, which can be annoying.
To examine more precisely the coverage (by function),
rather use the tis-aggregate coverage
command
(see Tis-aggregate coverage).
The -info-precision
option is used to specify the level of information
printed about the loss of precision during the analysis.
This option takes an integer level.
For a positive level, it sets -info-msg-key results:precision, and if -info-msg-key results:precision is set and the precision level is not specified, it is set to 1.
Precision may be lost when:
- the slevel is reached (see the -slevel option documentation).
This section shows information about the exit status of the analyzed application. It provides:
- the exit calls reached during the analysis,
- the possible values returned through these exit calls (if any).
You could also be interested in the Termination information on statements to know whether some more non-terminating statements can be reached.
The file generated when using the -info-json-results
option
holds the same information as presented above
and respects the following format.
Warning
This is an experimental feature, and the format might change a little in the next release.
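To give an idea of the overall shape before the detailed field lists, here is a minimal hand-written sketch of a report (the selection of sections and the exact layout are assumptions; only the field names described below are normative):

{
  "computation": { "type": "error", "status": "OK" },
  "alarms": { "type": "error", "status": "OK" },
  "coverage": {
    "type": "informative",
    "function_coverage": "100%",
    "total_statements_coverage": "90%",
    "semantic_statements_coverage": "90%"
  }
}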
computation
(progress of preparing and computing the value analysis)
- type: string = error (must be solved before going further)
- status: string, one of {OK, NOT OK} (NOT OK if a problem blocking the analysis or making its results inaccessible has been encountered, OK otherwise)
  - if NOT OK then the error field must be present
- error: optional string, one of:
  - no input file
  - config error
  - pre-action failed (a required action before the parsing has failed)
  - parsing error
  - missing file
  - link error
  - no value analysis computation (the value analysis was not demanded, thus it was not computed at all)
  - no entry point
  - incorrect number of arguments
  - invalid user input
  - unexpected error
- message: optional string (more information about the encountered problem, if available)
  - if error is config error or missing file, message always holds the name of the concerned file
  - if error is parsing error, message holds information about the parsing error that occurred with its location
  - if error is no entry point, message always holds the name of the expected entry point
  - if error is no input file, incorrect number of arguments, no value analysis computation, or no value analysis results, there is no message field
  - if error is link error or unexpected error, message is optional; if present, it holds some additional information about the error
undefined_functions
(reachable functions with neither code nor specification)
- type: string = error (must be solved before going further)
- status: string, one of {OK, NOT OK} (NOT OK if any reachable function with neither code nor specification has been encountered, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of objects (locations of the declarations of each reachable undefined function):
  - file: string (path to the file)
  - line: integer (line number in the file)
  - function: string (name of the function)
degeneration
(point in the program where the value analysis stopped)
- type: string = error (must be solved before going further)
- status: string, one of {OK, NOT OK} (NOT OK if a degeneration has occurred, OK otherwise)
  - if NOT OK then the reason field must be present
- reason: optional string (degeneration reason), one of:
  - user interruption
  - timeout
  - out of memory
  - stop at nth alarm
  - stop at nth garbled
  - stop at first tis check included
  - entry point with unexpected args
  - pointer library function
  - throwing library function
  - recursive call
  - recursive clone depth has reached the limit
  - writing at unknown address
  - assign with extern function
  - fail to call builtin
  - invalid call to builtin
  - cannot resolve format string
  - format string not supported
  - watchpoint
  - interpreter not a singleton
  - interpreter invalid property
  - interpreter format string percent p
  - interpreter assign in extern function
  - interpreter encounters a limitation
- additional fields depending on reason:
  - if reason is invalid call to builtin, entry point with unexpected args, or interpreter invalid property:
    - message: string (more information about the problem)
  - if reason is format string not supported:
    - may_be_invalid: boolean
  - if reason is interpreter not a singleton:
    - expression: string (the expression that cannot be evaluated to a single value)
    - imprecision: array of one object among:
      - variable: object:
        - name: string (name of the variable involved in the expression)
        - type: string (type of the variable)
        - volatile: boolean
        - undefined: boolean
      - volatile_lvalue: string (volatile sub-expression)
      - imprecise_lvalue: string (imprecise sub-expression)
      - imprecise_pointed_value: object:
        - pointer: string (pointer expression)
        - pointer_type: string (type of the pointer)
        - pointer_value: string (pointed expression)
      - imprecise_operation: object:
        - expression: string (imprecise expression)
        - type: string (type of the expression)
        - value: string (value of the expression)
        - sub-expressions: array of objects:
          - expression: string (sub-expression)
          - type: string (type of the sub-expression)
          - value: string (value of the sub-expression)
- program_point: optional object (point in the program where the degeneration occurred):
  - file: string (path to the file)
  - line: integer (line number in the file)
  - function: string (name of the function)
  - function_start_line: integer (the line number where function begins in file)
  - function_end_line: integer (the line number where function ends in file)
  - statement_kind: string, one of: test, function call, instruction, inline assembly, loop, return, jump, switch, block, unspecified sequence, other statement
  - additional fields depending on statement_kind (similar to the More column of info-statements):
    - if statement_kind is loop:
      - loop_number: integer (number of the loop in the function)
    - if statement_kind is function call:
      - direct_call: boolean (is the call direct or indirect)
      - if direct_call is true:
        - called_function: string (name of the function which is called directly)
      - if direct_call is false:
        - called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
non_terminating
(other non-terminating statements)
- type: string = to verify (not an error, but needs to be checked)
- status: string, one of {OK, NOT OK} (NOT OK if any non-terminating statement has been encountered, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- skipped: optional array of strings (names of functions specified using the -info-results-skip-abort option, known to not terminate and thus ignored here; these functions have been excluded from the list array below)
- list: optional array of objects (details of each non-terminating statement):
  - never_terminating: boolean (true if the statement never terminates, false if it may terminate for some callstacks)
  - program_point: object (point in the program where the non-terminating statement is located):
    - file: string (path to the file)
    - line: integer (line number in the file)
    - function: string (name of the function)
    - function_start_line: integer (the line number where function begins in file)
    - function_end_line: integer (the line number where function ends in file)
    - statement_kind: string, one of: test, function call, instruction, inline assembly, loop, return, jump, switch, block, unspecified sequence, other statement
    - additional fields depending on statement_kind (similar to the More column of info-statements):
      - if statement_kind is loop:
        - loop_number: integer (number of the loop in the function)
      - if statement_kind is function call:
        - direct_call: boolean (is the call direct or indirect)
        - if direct_call is true:
          - called_function: string (name of the function which is called directly)
        - if direct_call is false:
          - called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
  - call_stacks: optional array of arrays of objects (list of call stacks; each call stack is described by a list of objects representing the call sites). A call site has the following fields:
    - function: string (function of the call site)
    - file: string (file of the call site)
    - line: integer (line in file of the call site)
    Note: an empty array means the program point is located in the entry point of the program.
messages
(selected log file messages)
- type: string = to verify (not an error, but needs to be checked)
- status: string, one of {OK, NOT OK} (NOT OK if some important messages have been emitted such as warnings or errors, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- skipped: optional array of strings (messages specified using the -info-results-skip-log option, known to be normal and thus ignored here; messages matching these regular expressions and messages concerning plugins with names matching these regular expressions have been excluded from the list array below)
- list: optional array of objects (details of each message):
  - message: string (text of the message)
  - plugin: string (name of the plugin that emitted the message)
  - kind: string, one of:
    - warning
    - user error
    - failure
  - location: optional object (location of the problem in the source file, if any):
    - file: string (path to the file)
    - line: integer (line number in the file)
alarms
(generated alarms)
- type: string = error (must be solved before going further)
- status: string, one of {OK, NOT OK} (NOT OK if any alarm has been generated, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of objects (details of each generated alarm):
  - kind: string, one of:
    - differing blocks
    - division by zero
    - float to int
    - index out of bound
    - invalid shift
    - is NaN or infinite
    - logic memory access
    - memory access
    - not separated
    - overflow
    - overlap
    - pointer comparison
    - uninitialized
    - valid string
    - valid wstring
    - comparable char blocks
    - comparable wchar blocks
    - dangling contents
    - division overflow
    - function pointer
    - index in address
    - pointer arithmetic
    - unclassified
    - bad libc call
    - UB assert
    - UB precondition
  - additional fields depending on kind:
    - if kind is unclassified:
      - description: string (description of the alarm)
    - if kind is not unclassified:
      - predicate: string (alarm ACSL predicate)
    - if kind is unclassified or function pointer:
      - details: optional string (details about the alarm when available)
  - rank: optional integer (emission rank of the alarm when it is emitted by the value analysis)
  - status: string (status of the property), one of:
    - surely valid
    - considered valid
    - valid under hypothesis
    - valid but dead
    - unknown
    - unknown but dead
    - surely invalid
    - invalid under hypothesis
    - invalid but dead
    - inconsistent
    - never tried
  - program_point: object (point in the program where the alarm has been generated):
    - file: string (path to the file)
    - line: integer (line number in the file)
    - function: string (name of the function)
    - function_start_line: integer (the line number where function begins in file)
    - function_end_line: integer (the line number where function ends in file)
    - statement_kind: string, one of: test, function call, instruction, inline assembly, loop, return, jump, switch, block, unspecified sequence, other statement
    - additional fields depending on statement_kind (similar to the More column of info-statements):
      - if statement_kind is loop:
        - loop_number: integer (number of the loop in the function)
      - if statement_kind is function call:
        - direct_call: boolean (is the call direct or indirect)
        - if direct_call is true:
          - called_function: string (name of the function which is called directly)
        - if direct_call is false:
          - called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
  - call_stacks: optional array of arrays of objects (list of call stacks; each call stack is described by a list of objects representing the call sites). A call site has the following fields:
    - function: string (function of the call site)
    - file: string (file of the call site)
    - line: integer (line in file of the call site)
    Note: an empty array means the program point is located in the entry point of the program.
  - values: optional array of objects:
    - lvalue: lvalue that appears in the predicate
    - values: possible values of the lvalue
hypotheses
(hypotheses to verify)
- type: string = to verify (not an error, but needs to be checked)
- status: string, one of {OK, NOT OK} (NOT OK if there are some hypotheses to verify, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- skipped: optional array of strings (names of properties specified using the -info-results-skip-props option, known to be verified elsewhere and thus ignored here; properties with matching substrings have been excluded from the list array below)
- list: optional array of objects (details of each property which has been used as a hypothesis):
  - kind: string, one of:
    - assert
    - loop invariant
    - invariant
    - ensures
    - exits
    - breaks
    - continues
    - returns
    - terminates
    - allocates
    - assigns
    - complete behaviors
    - decrease
    - disjoint behaviors
    - from
    - axiom
    - axiomatic
    - lemma
    - instance of requires
    - other
  - name: optional string (name of the property, if any)
  - status: string (status of the property), one of:
    - surely valid
    - considered valid
    - valid under hypothesis
    - valid but dead
    - unknown
    - unknown but dead
    - surely invalid
    - invalid under hypothesis
    - invalid but dead
    - inconsistent
    - never tried
  - more_information: optional string, one of:
    - global
    - in unreachable function
    - inside -val-use-spec function
    - ensures of -val-use-spec function
    - at unreachable statement
    - instance of builtin requires
  - file: string (path to the file)
  - line: integer (line number in the file)
  - function: optional string (name of the function)
no_body_but_spec
(functions with no code, but with specifications)
- type: string = to verify (not an error, but needs to be checked)
- status: string, one of {OK, NOT OK} (NOT OK if any function with no body but with specifications has been encountered, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- skipped: optional array of strings (names of functions specified using the -info-results-skip-lib option, known to be library functions and thus ignored here; these functions have been excluded from the list array below)
- list: optional array of strings (names of functions)
body_but_use_spec
(functions with bodies, but which are analyzed using their specifications only)
- type: string = informative (information that may be interesting to look at)
- status: string, one of {OK, NOT OK} (NOT OK if any function which has a body has been analyzed using its specifications only, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of strings (names of functions)
external_variables
(used external variables)
- type: string = informative (information that may be interesting to look at)
- status: string, one of {OK, NOT OK} (NOT OK if the value of any undefined (i.e. declared external global) variable has been used, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of objects (used external variables with their declaration location):
  - name: string (name of the variable)
  - type: string (type of the variable)
  - file: string (path to the file)
  - line: integer (line number in the file)
assembly
(reachable assembly statements)
- type: string = informative (information that may be interesting to look at)
- status: string, one of {OK, NOT OK} (NOT OK if any reachable assembly statement has been encountered, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of objects (locations of the assembly statements):
  - file: string (path to the file)
  - line: integer (line number in the file)
  - function: string (name of the function)
builtins
(used functions from the tis-analyzer libc and built-ins)
- type: string = informative (information that may be interesting to look at)
- status: string, one of {OK, NOT OK} (NOT OK if any function from the tis-analyzer libc or a built-in has been used, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of strings (names of functions or built-ins)
useless_files
(source files that contain no reachable functions)
- type: string = to verify (not an error, but needs to be checked)
- status: string, one of {OK, NOT OK} (NOT OK if any source file containing no reachable functions has been encountered, OK otherwise)
  - if NOT OK then the list field must be present and not empty
- list: optional array of strings (names of useless files)
coverage
(overview of the analysis coverage)
type
: string = informative
(information that may be interesting to look at)function_coverage
: string, format X%
(percentage of reachable functions)total_statements_coverage
: string, format X%
(percentage of reachable statements)semantic_statements_coverage
: string, format X%
(percentage of reachable statements in the reachable functions)precision
(information about the loss of precision during the analysis and its reasons, and also about the value of some level counters)
type
: string = informative
(information that may be interesting to look at)status
: string, one of {OK
, NOT OK
} (NOT OK
if any precision has been lost, OK
otherwise)NOT OK
then at least one of the reaching_slevel_stmts
, reaching_malloc_plevel_stmts
, widened_loops
, or merged_loops
fields must be presentmax_slevel
: non-negative integer (maximum slevel
that has been reached)max_malloc_plevel
: optional positive integer (maximum malloc-plevel
that has been reached if any)max_clone_depth
: optional positive integer (maximum depth of the cloned recursive functions if any)reaching_slevel_stmts
: optional positive integer (number of statements where the slevel
has been reached)reaching_malloc_plevel_stmts
: optional positive integer (number of statements where the malloc-plevel
has been reached)widened_loops
: optional positive integer (number of loops where a widening has been performed)merged_loops
: optional positive integer (number of loops where states were merged)time
(optional information about the time taken by the analyzer. Available only if the -time option is set.)
type
: string = informative
(information that may be interesting to look at)parsing_time
: string representation of a positive float (time taken by the parsing in seconds)cxx_parsing_time
: optional string representation of a positive float (time taken by the C++ parsing in seconds)java_parsing_time
: optional string representation of a positive float (time taken by the Java parsing in seconds)value_analysis_time
: string representation of a positive float (time taken by the parsing in seconds)total_time
: string representation of a positive float (total time in seconds)target
(environment of the analysis: target machine, analysis entry point, program arguments)
type
: string = informative
(information that may be interesting to look at)entry_point
: string (name of the entry point function)arguments
: optional string (arguments given to the entry point with the -val-args
option, if any)architecture
: string (target architecture, corresponding to machdep
)endianness
: string, either little endian
or big endian
entry_point_return
: (information about the return value of the entry point function)
type
: string = informative
(information that may be interesting to look at)reachable
: boolean (reachability of the entry point function’s return statement)value
: optional string (the value returned by the entry point function, if any; present only if the return statement is reachable the entry point function does not return void, and the -no-results option has not been used for that function)exit_function
(information about calls to the _exit
function)
type
: string = informative
(information that may be interesting to look at)reachable
: boolean (reachability of the _exit
function’s return statement)exit_return_value
: optional string (possible values returned through the _exit
function, only available when any call to _exit
is reachable)value_results
(indicates whether the Value Analysis results were available to compute all other information. true
if the -no-results option was not given, false
otherwise.)
memory_usage
(the peak virtual memory usage of the analyzer)
words
: integer (size in words)bytes
: integer (size in bytes)kbytes
: integer (size in kilobytes)Mbytes
: integer (size in megabytes)Gbytes
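As a minimal illustration, an excerpt of such a JSON file could look like the following (the values and file names are hypothetical, and it is assumed here that the sections appear as top-level fields; the exact set of fields depends on the analysis):

{
  "useless_files": {
    "type": "to verify",
    "status": "NOT OK",
    "list": ["legacy_io.c"]
  },
  "coverage": {
    "type": "informative",
    "function_coverage": "92%",
    "total_statements_coverage": "85%",
    "semantic_statements_coverage": "97%"
  },
  "precision": {
    "type": "informative",
    "status": "NOT OK",
    "max_slevel": 100,
    "widened_loops": 2
  },
  "memory_usage": {
    "words": 1048576,
    "bytes": 8388608,
    "kbytes": 8192,
    "Mbytes": 8,
    "Gbytes": 0
  }
}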
The statement information already holds information about the dead blocks
(see About Statements), but some additional information
can be computed by setting some -dead-xxx
options
(see the DEAD CODE
section in All Options).
Any of these options triggers the computation, and the generated and printed information depends on the combination of their values, as explained below.
The main information computed about the dead code determines, for each function, whether:
- it is reachable (and may have dead blocks),
- or it is unreachable, in which case either:
  - it cannot be called at all, because there are no syntactic calls to it and its address is not taken,
  - or it may be called (and more details are given about how).
The default is to examine all the functions of the analyzed application (excluding the libc functions), but the -dead-functions option may be used to compute information for a subset of functions only.
More details on the dead blocks of the reachable functions are given depending on the setting of the -dead-xxx options.
Note
What is called a dead block here is not necessarily a C block. It is a set of contiguous statements that are not reachable by the Value Analysis. See the documentation of the Dead Id, Dead code entry and Dead code exit columns in About Statements.
Some of these options are presented below.
Most of the time, the entry points of a dead block are tests that are always
evaluated to the same value (either always true or always false).
With the -dead-bk-with-tests
option,
the expression and evaluation of these tests are printed.
Moreover, with the -dead-bk-with-subexp option, the possible values of each sub-expression are also printed, and if the -dead-bk-with-defs option is set, the origin of the definitions of the involved variables is also computed (this last option requires the -whole-program-graph option to be set). All these elements may help to understand why a dead block is unreachable.
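For instance, in the made-up snippet below, the test always evaluates to false, so the analyzer reports the guarded statements as a dead block:

void g(void) {
    int x = 3;
    if (x > 5) {   /* test always evaluated to false: entry of the dead block */
        x = 0;     /* unreachable statement */
    }              /* exit of the dead block */
}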
__tis_dead__ attribute

The __tis_dead__ block attribute may be used to mark some blocks.
The -dead-bk-with-tis-dead and -dead-bk-skip-tis-dead options may then be used to report about those blocks, or to ignore them if the attribute is used to mark expected dead blocks.
For instance, in this example, the attribute may mean that we do not care if this block is dead, since it normally should be:
if (p == NULL) {
    __blockattribute__(__tis_dead__);
    abort();
}
With the -dead-compute
option, the results are printed to stdout.
With the -dead-json
option, the same results are exported in JSON format
to the specified file.
With the -dead-bk-summary
option, only the list of the dead blocks
(in reachable functions) is printed, with some details about each of them.
When some functions cannot be called at all,
a justification file can be generated by setting the -dead-justification
option. This file can then be provided to tis-aggregate
to justify that those functions cannot be reached.
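For example, a run combining some of these options could look like this (the source file name is illustrative):

$ tis-analyzer main.c -val -dead-compute -dead-bk-with-tests -dead-json dead.json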
The tis-info-show-xxx options provide ways to make queries in order to print specific information. The following list gives the provided options:
-info-show-val: takes a list of comma-separated queries, and prints the value computed for each of them. A query is composed of two parts separated by a space:
  - the first part specifies the program point; there are three ways to specify it:
    - by its internal number sid,
    - by f@n, where f is a function name and n is the number of the statement inside it,
    - or by filename:line_number.
  - the second part specifies the term to evaluate. The term is specified using the ACSL syntax, but is usually just a variable name.
Example: -info-show-val "f@1 x","g@3 y" prints the value of x before the first statement of f, and the value of y before the third statement of g.
Notice that values are available only if the value analysis computed them.
-info-show-state: prints the value of the computed state at the specified program point. The syntax to specify the program point is the same as for -info-show-val.
-info-show-statements: prints the selected statement for the queries that involve program points.
-info-show-with-callstack: shows the detailed values for each call stack.
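For example, the following command (with hypothetical source file and variable names) prints the value of x before the second statement of main, together with the selected statement:

$ tis-analyzer main.c -val -info-show-val "main@2 x" -info-show-statements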
This is the output of the -info-help option, which gives the list of all the available options:
Plug-in name: info
Plug-in shortname: info
Description: extract many information from the current project.
Most options of the form '-info-option-name' and without any parameter
have an opposite with the name '-info-no-option-name'.
Most options of the form '-option-name' and without any parameter
have an opposite with the name '-no-option-name'.
Options taking a string as argument should preferably be written
-option-name="argument".
***** LIST OF AVAILABLE OPTIONS:
-info-exit-status print information about the application exit status, and
exit points. Needs value analysis results. (opposite
option is -info-no-exit-status)
-info-files print information about files from the current project.
(opposite option is -info-no-files)
-info-functions print information about functions from the current
project. (opposite option is -info-no-functions)
-info-precision <n> give information about loss of precision: 0 for none, 1
or more for more (defaults to 0)
-info-properties print information about properties from the current
project. (opposite option is -info-no-properties)
-info-reachable deprecated. Print messages about semantically reached
statements. This is useful to collect information to
compute the coverage on several analyses of the same
functions. (opposite option is -info-no-reachable)
-info-shared-variables print information about the shared variables after a
Mthread analysis (opposite option is
-info-no-shared-variables)
-info-statements print information about statements (opposite option is
-info-no-statements)
-info-types print information about types (opposite option is
-info-no-types)
-info-variables print information about global variables (opposite option
is -info-no-variables)
*** CSV GENERATION
-info-csv-all <basename> export all the computed information in csv files
with filenames based on the specified basename
-info-csv-files <filename> export information about files in the specified
csv file
-info-csv-functions <filename> export information about functions in the
specified csv file
-info-csv-properties <filename> export information about properties in the
specified csv file
-info-csv-shared-variables <filename> export information about shared
variables in the specified csv file
-info-csv-statements <filename> export information about statements in the
specified csv file
-info-csv-types <filename> export information about types in the specified
csv file
-info-csv-variables <filename> export information about global variables in
the specified csv file
*** DEAD CODE
-dead-bk-skip-tis-dead If all the statements in a dead block are in blocks
marked with `__blockattribute__(__tis_dead__)`, do not
report this dead block in the results. (opposite option
is -no-dead-bk-skip-tis-dead)
-dead-bk-summary Print a summary about dead code blocks in reachable
functions in CSV format. (opposite option is
-no-dead-bk-summary)
-dead-bk-with-defs Print the origin of the sub-expressions of the tests
(only if -dead-bk-with-tests and also needs the
-whole-program-graph option). (opposite option is
-no-dead-bk-with-defs)
-dead-bk-with-subexp Print the values of the sub-expressions of the tests
(only if -dead-bk-with-tests). (opposite option is
-no-dead-bk-with-subexp)
-dead-bk-with-tests Print the test expressions of the dead blocks. (opposite
option is -no-dead-bk-with-tests)
-dead-bk-with-tis-dead Print information about the blocks marked with
`__blockattribute__(__tis_dead__)`. (opposite option is
-no-dead-bk-with-tis-dead)
-dead-compute Print information about dead code. See other '-dead-xxx'
options with -info-help. to tune the details of the
computation. (opposite option is -no-dead-compute)
-dead-functions <f1,...,fn> Name of the functions to examine. Examine all of
them by default.
-dead-json <filename> Compute the same results than the -dead-compute
option, and print them to the given file using the JSON
format.
-dead-justification <filename> generate a justification file to be used with
tis-aggregate. Not available when using the
-dead-functions option.
*** GETTING INFORMATION
-info-help help of plug-in info
-info-h alias for option -info-help
-info-help-files print help about files information. (opposite option is
-info-no-help-files)
-info-help-functions print help about functions information. (opposite
option is -info-no-help-functions)
-info-help-json-results print help about the JSON format used in the file
generated when using the '-info-json-results' option.
(opposite option is -info-no-help-json-results)
-info-help-properties print help about properties information. (opposite
option is -info-no-help-properties)
-info-help-reachable print help about getting information about reachable
statements. (opposite option is -info-no-help-reachable)
-info-help-shared-variables print help about getting information about
shared variables. (opposite option is
-info-no-help-shared-variables)
-info-help-show print more help about the 'show' queries. (opposite
option is -info-no-help-show)
-info-help-statements print help about getting information about statements.
(opposite option is -info-no-help-statements)
-info-help-types print help about getting information about types.
(opposite option is -info-no-help-types)
-info-help-variables print help about getting information about global
variables. (opposite option is -info-no-help-variables)
*** OUTPUT MESSAGES
-info-debug <n> level of debug for plug-in info (defaults to 0)
-info-msg-key <k1[,...,kn]> enables message display for categories
<k1>,...,<kn>. Use -info-msg-key help to get a list of
available categories, and * to enable all categories
-info-verbose <n> level of verbosity for plug-in info (defaults to 1)
*** RESULTS
-info-json-results <filename> export the -info-results results to the given
file using the JSON format.
-info-results print an overview of the most important results of an
analysis. (opposite option is -info-no-results)
-info-results-skip-abort <f1,..,fn> specify a list of functions that are
known to not terminate (such as 'abort' functions). They
will be ignored in the `-info-msg-key
results:non-terminating` output.
-info-results-skip-lib <f1,..,fn> specify a list of functions that are known
to be library functions (i.e. with no body but a
specification). They will be ignored in the
`-info-msg-key results:no-body-but-spec` output.
-info-results-skip-log <re_1,..,re_n> specify a list of regular expressions
that match messages to be considered as normal. The
regular expressions are also compared to the plugin name
to be able to ignore all the messages from a given
plugin. They will be ignored in the `-info-msg-key
results:messages-terminating` output.
-info-results-skip-props <p1,..,pn> specify a list of properties that can be
ignored in the `-info-msg-key results:hypotheses` output
because they are verified elsewhere. For instance, use
'-info-results-skip-props _wp,_rv' if the properties
verified by WP have the '_wp' suffix, and the properties
manually reviewed have the '_rv' suffix.
-tis-report generate all files needed by tis-report in directory
_results. This may be changed with the option
-tis-report-directory (opposite option is -no-tis-report)
-tis-report-directory <basedir> generate tis-report files to directory
basedir (defaults to _results)
*** SHOW QUERIES
-info-show-state <program_point> print the value of the computed state at
the specified program point. The syntax to specify the
program point is the same than for -info-show-val.
-info-show-statements print the selected statement for the queries that
involve program points. (opposite option is
-info-no-show-statements)
-info-show-val <"program_point term"> print the value of the given term at
the specified program point. More details about the
syntax with the -info-help-show option.
-info-show-with-callstack show the detailed values for each call stack.
(opposite option is -info-no-show-with-callstack)
*** TUNING
-info-compute do not output anything, but compute all the internal
tables. This is useful for a saved project to save time
later on when loading it. (opposite option is
-info-no-compute)
-info-stmt-no-reduction in the 'more' column of the 'test' statements
information, add 'no reduction' if the states are the
same in both branches. Not enabled by default because its
computation is expensive. (opposite option is
-info-no-stmt-no-reduction)
-info-with-stat show the columns that hold the value analysis timing
statistics in the results (default). It is useful to
unset this option when the goal is to compare the results
between several analyses because the timing may vary even
when the other results remain the same. (set by default,
opposite option is -info-without-stat)
-info-without-stat opposite of option "-info-with-stat"
Caution
The tis-mk-main
tool is not available in this version.
The tis-mk-main
tool
generates a C file containing a new entry point for a program expecting
command line arguments. More precisely, the generated file contains a
function tis_main
calling the main function of the original program with
the specified arguments.
The command line to call the tool is composed of two parts separated
by a double dash --
and so it looks like:
$ tis-mk-main <options> -- <arguments>
The options to change the default behavior are:
-o <file>: name of the generated file (default is tis-main-dummy.c);
-prog-name <name>: program name to use as argv[0] (default is UNKNOWN);
-main <function-name>: name of the generated entry point function (default is tis_main);
-f: overwrite the output file if it already exists;
-with-env: make the generated function pass the environment to main as a third parameter char** env;
-help: print the list of available options.
The arguments can be listed as they would be in the command line
to call the original program. Arguments beginning with a - have to be preceded by a - alone (see the example below).
Two more options are provided
to specify some indeterminate arguments:
-tis-var <size>: add an argument of at most <size> indeterminate characters (its content is unknown, so its actual length may be shorter);
-tis-cst <size>: add an argument of exactly <size> indeterminate non-null characters.
For instance, the following command generates a function my_main in a file new.c that calls the main function of the application to analyze with five arguments:
$ tis-mk-main -f -main my_main -o new.c -prog-name PROG
-- arg1 -tis-var 2 - -opt1 -tis-cst 2 - --opt2
The generated file is:
#include "tis_builtin.h"
char* argv[6];
char argv_0[] = "PROG";
char argv_1[] = "arg1";
char argv_2[3];
char argv_3[] = "-opt1";
char argv_4[3];
char argv_5[] = "--opt2";
extern int main(int argc, char* argv[]);
void my_main() {
tis_make_unknown(argv_2, 2);
argv_2[2] = 0;
if (tis_interval(0,1)) {;
argv_4[0] = tis_char_interval(-0x7F,-1);
} else {
argv_4[0] = tis_char_interval(1,0x7F);
}
argv_4[2] = 0;
if (tis_interval(0,1)) {;
argv_4[1] = tis_char_interval(-0x7F,-1);
} else {
argv_4[1] = tis_char_interval(1,0x7F);
}
argv_4[2] = 0;
argv[0] = argv_0;
argv[1] = argv_1;
argv[2] = argv_2;
argv[3] = argv_3;
argv[4] = argv_4;
argv[5] = argv_5;
main(6, argv);
}
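The generated file can then be added to the analysis together with the original sources, using the new function as the entry point; for instance (assuming the original program is in a file orig.c):

$ tis-analyzer new.c orig.c -main my_main -val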
Instead of using tis-mk-main
, the Context From Precondition plug-in (CFP
for short) can be used to generate an entry point from the preconditions of the
function to analyze. This is useful when one wants to be sure that the analysis
context is not smaller than the context specified by those preconditions (see
Pre-conditions for more information).
To generate a context for a function f
, the command line looks like this:
tis-analyzer <options and c files> -cfp -cfp-input f
For instance, the preconditions for f
could be:
/*@ requires 0 <= size <= 100;
requires \valid_read (in + (0..(size - 1)));
requires \initialized (in + (0..(size - 1)));
requires out == \null || \valid (out + (0..(size - 1)));
*/
int f (int * out, int const * in, int size);
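To give an intuition of what such a generated context does, here is a hand-written sketch of an entry point satisfying these preconditions (the code actually generated by CFP may differ; the function name context_main is illustrative):

#include "tis_builtin.h"

int f(int *out, int const *in, int size);

void context_main(void) {
    int in[100], out[100];
    int size = tis_interval(0, 100);   /* requires 0 <= size <= 100 */
    /* make the input buffer valid and initialized with unknown values */
    tis_make_unknown(in, sizeof(in));
    if (tis_interval(0, 1))
        f(out, in, size);              /* case: out is valid */
    else
        f((int *)0, in, size);         /* case: out == \null */
}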
In most cases, the files __fc_builtin.h
and stdlib.h
from the tool's standard library are needed in order to make the CFP plug-in work correctly, so
they should be included in the project.
Since CFP generates the code in a new project, the -then-last
option must be
used to switch to this project and be able to, for instance, run the value
analysis on it:
tis-analyzer <options and c files> -cfp -cfp-input f -then-last -val
More CFP options are available, see the plug-in’s help (using the
-cfp-help
option) for more information.
Caution
The tis-mkfs
tool is not available in this version.
The aim of the tis-mkfs
tool is to generate C source code that represents a
virtual file system. It is useful when analyzing an application that has some
interactions with the file system, such as reading and writing files
(including special character devices), pipes and sockets.
The generated code, added to the analysis, makes it possible
to analyze the behavior of an application accessing files,
or using other operations related to the file system.
Without tis-mkfs
, the default behavior of the analyzer is to consider
that the filesystem only contains /dev/stdin, /dev/stdout and /dev/stderr.
This manual explains how to use the tis-mkfs
tool.
It also gives the tool limitations and provides workaround solutions.
Some internal details are provided to help customize the virtual file system content and/or behavior if needed.
Starting from a file hello.txt
that holds the string Hello!,
the following command generates the mkfs_f