Tip
Use your browser's search function (such as Ctrl+F) to search the entire documentation.
Caution
The documentation describes all the features and tools available in the commercial version of TrustInSoft Analyzer, some of which are not available through on-line tools.
The sections of the documentation describing features and tools that are only available in the commercial version are explicitly marked.
TrustInSoft Analyzer is an award-winning static C and C++ source code analyzer that ensures the safety and security of source code by providing mathematical (formal) guarantees about program properties.
TrustInSoft Analyzer takes advantage of state-of-the-art technology to provide different sets of tools to facilitate the analysis of C and C++ programs.
Structure of the Documentation
Get Started
Start with the Tutorials to learn how to use TrustInSoft Analyzer, browse the source code and the alarms in the GUI, and eventually obtain the guarantee of the absence of Undefined Behavior.
TrustInSoft Analyzer: non-technical overview
This section provides step-by-step tutorials to help you get started with TrustInSoft Analyzer.
The goal of TrustInSoft Analyzer is to prevent runtime errors by analyzing all of the possible values that the variables can take at any given point in the program, in order to prove that none of the execution paths leads to a problem (such as an undefined behavior or a forbidden operation).
This verification is called the value analysis.
Note
Unlike testing or binary analysis, the value analysis provided by TrustInSoft Analyzer is exhaustive: the guarantees provided apply to all the concrete executions of the program. Even the tests with the best coverage will only test a few execution paths in a program, whereas binary analysis is strongly dependent on the compilation and execution environments. This is why static value analysis gives stronger guarantees than both testing and binary analysis.
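To make the contrast concrete, here is a hypothetical sketch (the function and its constant are invented for illustration): the division below is undefined for exactly one of the roughly four billion possible inputs, a needle that even a well-covered test suite is very unlikely to find, but that an exhaustive value analysis cannot miss.

```c
/* Hypothetical example: the division is undefined for exactly one
   of the 2^32 possible int inputs (x == 12345). Tests are very
   unlikely to hit that value; an analysis that covers all inputs
   flags it with certainty. */
int fragile(int x)
{
    return 1000 / (x - 12345);
}
```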
The value analysis tries to prove all the properties needed for the code to be correct. If a property cannot be automatically proved, the analyzer emits an alarm, such as:
/*@ assert Value: division_by_zero: b ≢ 0; */
It means that, in order for the program to be correct, the value of b needs to be non-zero at the execution point indicated by the analyzer.
At this point there are two possibilities:
- There is an execution path for which b = 0 that leads to an error or an undefined behavior. This means there is a bug in the program that must be corrected to ensure that the property holds.
- There is no execution path for which b = 0, but the analyzer was not able to prove the validity of the property. This means that the analyzer over-approximates the possible values of b; to make the alarm disappear, it is necessary to guide the analyzer to be more precise about the possible values of b, and then run the analysis again.
Tip
An alarm is a property, expressed in a logic language, that needs to hold at a given point in the program in order for the program to be correct.
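Both situations can be reproduced on a minimal, hypothetical sketch (the functions below are not part of the tutorial's examples): in the first function the alarm is a true bug; in the second, the divisor can never be zero, so the alarm would be a false alarm to be discharged by making the analysis more precise.

```c
/* Case 1: a genuine bug -- some execution path reaches b == 0. */
unsigned true_bug(unsigned a)
{
    unsigned b = a % 3u;   /* b is 0, 1 or 2 */
    return a / b;          /* division_by_zero alarm: a true positive */
}

/* Case 2: b is never 0, but an analysis that merely knows b <= 3
   cannot prove it; guiding the analyzer to a more precise view of b
   makes the alarm disappear. */
unsigned never_zero(unsigned a)
{
    unsigned b = a % 3u + 1u;   /* b is 1, 2 or 3: never 0 */
    return a / b;
}
```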
The following examples show how to use TrustInSoft Analyzer on test snippets to eventually guarantee the absence of undefined behavior for the input values of the test.
Tip
When used to verify tests, TrustInSoft Analyzer can be run with the option --interpreter, so that no other special tuning is required. This keeps the analyzer precise, and thus each alarm is a true bug that needs to be fixed.
The example array.c is located in the directory /home/tis/1.45.1/C_Examples/value_tutorials/getting_started.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i <= 5; i++)
printf("array[%d] = %d\n", i, array[i]);
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
Open the GUI after starting the analysis with the following command:
$ tis-analyzer-gui --interpreter /home/tis/1.45.1/C_Examples/value_tutorials/getting_started/array.c
TrustInSoft Analyzer emits the following alarm:
/*@ assert Value: index_bound: i < 5; */
Tip
Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].
To see the values of both i and array, right-click on i and select Track this term in the statement printf("array[%d] = %d\n", i, array[i]); highlighted in orange, then click on array in that same statement. To see the values at each iteration of the loop, click on Per Path and scroll down to the last row.
The value of i is successively equal to:
- 0 at the first iteration of the loop,
- 1 at the second iteration,
- 2 at the third iteration,
- 3 at the fourth iteration,
- 4 at the fifth iteration,
- 5 at the sixth iteration.
array is an array local var. (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.
In order to continue the analysis, the program must be fixed and the analysis run again.
Note
Indeed, not all statements are analyzed: see the statements highlighted in red.
The statements after the loop are highlighted in red because they are unreachable with respect to the input values of the program. This means that none of the computed values allows the execution to continue past the loop in a well-defined way.
Understanding the Root Cause to Fix the Program
Tip
Now that we know that when i is equal to 5 there is an access out of the bounds of array, we will ask the analyzer where this value comes from.
Click on the button Inspect i to see the last write to the l-value i relative to the current statement, and the value of i at this statement.
Tip
The button Inspect i is equivalent to the following actions: right-click on i and select Show Defs, then click on i to see the values. In the Interactive Code, the two statements highlighted in green show the last writes to i; moreover, the Show Defs panel shows that these two write locations are the only ones. The location i = 0 corresponds to the declaration and initialization, while the location i++ is where the value of i eventually becomes 5.
We fix the program by changing the loop exit condition.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i < 5; i++)
printf("array[%d] = %d\n", i, array[i]);
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
Kill the current analysis, fix the source code, and then run the analysis again.
Tip
When analyzing a real-life project in which the source code must be fixed, it is recommended to follow the process below:
However, for a simple use case such as this one, everything can be done from the GUI, following this process:
TrustInSoft Analyzer emits the same alarm, but at a different location:
/*@ assert Value: index_bound: i < 5; */
Tip
Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].
To see the values of both i and array, right-click on i and select Track this term, then click on array. The value of i is equal to 5, and array is an array local var. (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.
Once again, we fix the program so that it prints the last element of the array, kill the current analysis, and then run the analysis again.
// tis-analyzer-gui --interpreter array.c
#include <stdio.h>
int main(void)
{
int array[5] = {0, 1, 2, 3, 4};
int i;
for (i = 0; i < 5; i++)
printf("array[%d] = %d\n", i, array[i]);
i--;
printf("array[%d] = %d\n", i, array[i]);
return 0;
}
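Why the i-- fix is correct: the loop exits with i one past the last valid index, so decrementing it yields the index of the last element. A minimal sketch of that invariant (last_index is a hypothetical helper, not part of the example):

```c
/* After the counting loop, i is one past the last valid index (5);
   decrementing it yields the index of the last element (4). */
int last_index(void)
{
    int i;
    for (i = 0; i < 5; i++)
        ;       /* same loop shape as in the example */
    i--;        /* 5 -> 4 */
    return i;
}
```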
Tip
TrustInSoft Analyzer guarantees the absence of undefined behavior for the input values of this test.
In this example we will study the program example1.c, located in the /home/tis/1.45.1/C_Examples/TutorialExamples directory. This program performs some operations on unsigned integers.
//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
int sum, prod, quot, diff;
sum = a + b;
diff = a - b;
prod = a * b;
quot = a / b;
}
Start the value analysis (option -val) with the following command:
$ tis-analyzer-gui -val /home/tis/1.45.1/C_Examples/TutorialExamples/example1.c
and launch the GUI (as explained in the Getting Started section).
Your browser window should display a division by zero alarm:
Selecting the alarm in the bottom panel will highlight the program point at which the alarm was raised, both in the Source Code Window (right panel) and in the Interactive Code Window, where you can also see the ACSL assertion generated by the analyzer:
/*@ assert Value: division_by_zero: b ≢ 0; */
The program takes two unsigned integer arguments a and b that are unknown at the time of the analysis. The value analysis must ensure that the program is correct for all possible values of a and b. The alarm was raised because, if b is equal to zero, there will be a division by zero, which leads to undefined behavior. This means that anything can happen at this point, making the program incorrect (and dangerous to use, as this will lead to a runtime error).
Let’s modify the program by adding an if (b != 0) statement in order to perform the division only if b is non-zero.
//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
int sum, prod, quot, diff;
sum = a + b;
diff = a - b;
prod = a * b;
if (b != 0)
quot = a / b;
}
and launch the analysis on the modified version example1_ok.c:
$ tis-analyzer-gui -val /home/tis/1.45.1/C_Examples/TutorialExamples/example1_ok.c
Congratulations! All the alarms have disappeared, so the program is guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer. This means that it will run safely, whatever the arguments provided.
You can now move to the next series of examples.
Compiling well defined, standard-compliant C or C++ source code should ensure that your program will behave as intended when executed, whatever the platform or compilation chain used.
TrustInSoft Analyzer’s static code analysis helps you produce high quality, standard-compliant code that is guaranteed to be free from a wide range of weaknesses and vulnerabilities.
The example we will examine shows how TrustInSoft Analyzer can detect dangling pointers (also known as CWE-562: Return of Stack Variable Address or CWE-416: Use After Free in the Common Weakness Enumeration).
This example originally appeared in a blog post by Julien Cretin and Pascal Cuoq on the TrustInSoft website. We encourage you to read the whole post for more technical details and some examples of production code carrying this kind of bug.
Let’s have a look at the program example3.c, located in the /home/tis/1.45.1/C_Examples/TutorialExamples directory:
/* TrustInSoft Analyzer Tutorial - Example 3 */
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
{
char a = 3;
p = &a;
pv = (uintptr_t)p;
}
{
char b = 4;
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
return 0;
}
This program prints out some text. The result should differ depending on the value of the expression p == q in line 24.
Start the value analysis with the following command:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c
and launch the GUI (as explained in the Getting Started section).
After selecting the Properties widget and the Only alarms button, you will notice that there is one alarm and some lines of dead code right after the alarm:
The assertion generated shows that the analyzer has found a dangling pointer (a pointer not pointing to a valid object):
/*@ assert Value: dangling_pointer: ¬\dangling(&p); */
Let’s use the analyzer to follow the variable p through the execution, and try to find the source of the problem:
The values of the variable p before and after the selected statement will be listed in the Values widget in the bottom panel. We can see that, after the initialization p = &a, the variable p holds the address of a, as expected.
Selecting the expression p == q inside the if will show the value of p at this point in the program. Before the evaluation of the p == q expression, p is shown as ESCAPINGADDR, meaning that it no longer holds the address of a valid object.
The reason is quite simple: p holds the address of the local variable a, whose lifetime is limited to the block in which the variable is defined:
{
char a = 3;
p = &a;
pv = (uintptr_t) p;
}
Note
As stated by clause 6.2.4:2 of the C11 standard,
If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
The pointer p refers to the variable a outside of its lifetime, resulting in undefined behavior. As every execution path results in an undefined behavior, the analysis stops.
When the behavior of a program is undefined, it means that anything can happen at execution time. Some programmers seem to think that, at least in some cases, this is not a big issue: they are certainly wrong. A program invoking undefined behavior is a time bomb.
A dangling pointer is an example of undefined behavior. Let’s illustrate its consequences on our example.
First we compile the program with gcc:
$ gcc -Wall -Wextra -Wpedantic /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c -o example3
and notice that, despite all of our efforts, gcc does not issue any warnings.
Running the code will display the following text:
$./example3
Roses are red,
Violets are blue,
This poem is lame,
It doesn't even rhyme.
meaning that the condition p == q evaluated to true. This can happen if the variables a and b were allocated at the same address by the compiler (which is possible since they are never in scope at the same time).
Using different compilation options should not affect the behavior of the program, but compiling with the -O2 switch and running the program results in a different output:
$ gcc -Wall -Wextra -Wpedantic -O2 /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c -o example3_opt
$./example3_opt
Roses are red,
Violets are blue,
0x7ffc9224f27e is different from 0x7ffc9224f27f
7ffc9224f27e is not the same as 7ffc9224f27f
This time the expression p == q evaluated to false because the variables a and b were allocated to different addresses. So changing the optimization level changed the behavior of our program.
In the aforementioned post you can see evidence of a third, very weird behavior, in which p == q evaluates to false even though a and b are allocated to the same address.
The conclusion is clear, and we will state it as a general warning:
Warning
If the behavior of your program is undefined, executing the compiled code will have unpredictable results and will very likely cause runtime errors. You should always ensure that your code is well-defined, using source code analysis or other techniques.
We have already seen how TrustInSoft Analyzer is able to find dangling pointers and other weaknesses. We will continue the analysis and correct the code in order to guarantee that there are no problems left.
To avoid the dangling pointer problem, we will define a outside the block, so that its storage and address are guaranteed by the standard throughout the whole main function:
//TrustInSoft Analyzer Tutorial - Example 3_1
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
char a = 3;
{
p = &a;
pv = (uintptr_t)p;
}
{
char b = 4;
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
}
and launch the value analysis again:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3_1.c
We notice that the dangling pointer alarm regarding p has been replaced by the same alarm about q. When evaluating the expression p == q, the analyzer noticed a problem with the term p and stopped the analysis, so it did not get a chance to issue the alarm about q. Now that we have corrected the first problem, the analyzer gets to the term q and raises the same kind of alarm.
Selecting the variables p and q in the Values widget tab and clicking on the p == q expression will show that, before evaluation, p holds the address of a whereas q has the value ESCAPINGADDR. This means that q is a dangling pointer, as it references the address of the out-of-scope variable b:
To correct the problem we will simply define b outside the block, as we did before with a:
//TrustInSoft Analyzer Tutorial - Example 3_ok
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
char *p, *q;
uintptr_t pv, qv;
char a = 3;
char b = 4;
{
p = &a;
pv = (uintptr_t)p;
}
{
q = &b;
qv = (uintptr_t)q;
}
printf("Roses are red,\nViolets are blue,\n");
if (p == q)
printf("This poem is lame,\nIt doesn't even rhyme.\n");
else {
printf("%p is different from %p\n", (void *)p, (void *)q);
printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
}
}
and launch the analyzer one more time to verify that there are no other alarms:
$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3_ok.c
Notice that the printf("This poem is lame,\nIt doesn't even rhyme.\n"); code is marked as dead. Indeed, the variables a and b must be allocated to different addresses, as they are in scope at the same time. As a consequence, the condition p == q always evaluates to false, so the first branch of the if statement will never be executed.
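The dead-code verdict follows from a guarantee of the standard: two objects that are in scope at the same time occupy distinct addresses. A hypothetical sketch of that fact:

```c
/* Two objects whose lifetimes overlap are guaranteed distinct
   addresses by the standard, so the comparison is always false
   and the matching branch is dead code. */
int same_address(void)
{
    char a = 3, b = 4;
    char *p = &a, *q = &b;
    return p == q;   /* always 0 */
}
```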
Congratulations! You have successfully corrected all the bugs in example3.c. The program is now guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer.
In this example we will analyze an implementation of Skein, a cryptographic hash algorithm that was a finalist in the NIST SHA-3 competition.
We will show how to use TrustInSoft Analyzer to:
Tutorial Overview
In the first part of this tutorial we will use TrustInSoft Analyzer to explore the code and then launch a value analysis on the Skein implementation.
The estimated time for this lesson is less than 20 minutes.
Unlike our previous examples, the Skein implementation is composed of multiple files. All the files needed for this tutorial are located in the /home/tis/1.45.1/C_Examples/skein_verification directory.
Let's start by listing all the files in the directory:
$ ls -l /home/tis/1.45.1/C_Examples/skein_verification
total 124
-rw-rw-r-- 1 tis tis 204 Oct 5 18:54 README
-rwxrwxr-x 1 tis tis 4984 Oct 5 18:54 SHA3api_ref.c
-rwxrwxr-x 1 tis tis 2001 Oct 5 18:54 SHA3api_ref.h
-rwxrwxr-x 1 tis tis 6141 Oct 5 18:54 brg_endian.h
-rwxrwxr-x 1 tis tis 6921 Oct 5 18:54 brg_types.h
-rw-rw-r-- 1 tis tis 524 Oct 5 18:54 main.c
-rwxrwxr-x 1 tis tis 34990 Oct 5 18:54 skein.c
-rwxrwxr-x 1 tis tis 16290 Oct 5 18:54 skein.h
-rwxrwxr-x 1 tis tis 18548 Oct 5 18:54 skein_block.c
-rwxrwxr-x 1 tis tis 7807 Oct 5 18:54 skein_debug.c
-rwxrwxr-x 1 tis tis 2646 Oct 5 18:54 skein_debug.h
-rwxrwxr-x 1 tis tis 1688 Oct 5 18:54 skein_port.h
Note
Please note that the README and main.c files are not part of the Skein implementation. They were added for use with this tutorial.
The skein.h file is a good starting point to explore the API. We will focus our attention on the following lines of code:
typedef struct
{
size_t hashBitLen; /* size of hash result, in bits */
size_t bCnt; /* current byte count in buffer b[] */
u64b_t T[SKEIN_MODIFIER_WORDS]; /* tweak words: T[0]=byte cnt, T[1]=flags */
} Skein_Ctxt_Hdr_t;
typedef struct /* 256-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN_256_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN_256_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein_256_Ctxt_t;
typedef struct /* 512-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN_512_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN_512_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein_512_Ctxt_t;
typedef struct /* 1024-bit Skein hash context structure */
{
Skein_Ctxt_Hdr_t h; /* common header context variables */
u64b_t X[SKEIN1024_STATE_WORDS]; /* chaining variables */
u08b_t b[SKEIN1024_BLOCK_BYTES]; /* partial block buffer (8-byte aligned) */
} Skein1024_Ctxt_t;
/* Skein APIs for (incremental) "straight hashing" */
int Skein_256_Init (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
int Skein_512_Init (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
int Skein1024_Init (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
int Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
int Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
It seems that, in order to hash a message, we'll need to:
- declare a context of type Skein_256_Ctxt_t;
- initialize it with Skein_256_Init;
- pass Skein_256_Update a representation of the string;
- call Skein_256_Final with the address of a buffer in order to write the hash value.
When confronted with the analysis of an application, the usual entry point for the analysis is its main function. Nonetheless, there are many contexts in which there will not be an obvious entry point, for example when dealing with a library. In those cases, you will need to write a driver to test the code or, better, leverage existing tests in order to exercise the code.
As the Skein implementation includes no tests, we provide the file main.c as a test driver. It implements the steps outlined above in order to hash the message "People of Earth, your attention, please":
/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"
#define HASHLEN (8)
u08b_t msg[80] = "People of Earth, your attention, please";
int main(void)
{
u08b_t hash[HASHLEN];
int i;
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&skein_context, HASHLEN);
Skein_256_Update(&skein_context, msg, 80);
Skein_256_Final(&skein_context, hash);
for (i = 0; i < HASHLEN; i++)
printf("%d\n", hash[i]);
return 0;
}
The driver also prints the contents of the hashed message in the for loop at the end of the file. Each of the printed values corresponds to the numerical value of the corresponding character in the hashed message.
We compile and run the code by executing the command:
$ gcc /home/tis/1.45.1/C_Examples/skein_verification/*.c && ./a.out
and get the following output:
215
78
61
246
0
0
0
0
Note
You might get different output because, as we will see, there is a bug in the test file.
We will start by launching a simple value analysis with the following command:
$ tis-analyzer-gui -64 -val /home/tis/1.45.1/C_Examples/skein_verification/*.c
Note
The -64 option specifies that we are using a 64 bit architecture. Other possible values are -16 and -32. If no value is specified, the default target architecture will be 32 bit.
Warning
For the following analysis it is extremely important to provide the correct architecture. Please make sure you are providing the correct value.
Next, we will launch the GUI by opening the link or pointing the browser to http://localhost:8080 (the port number may change if you have other analyses running at the same time).
Once you have opened the GUI, go to the Properties widget and click on the Only alarms button:
In the Properties widget we can see that there are about 20 alarms.
Note
If the analyzer finds that an error occurs for every possible execution path, then it will stop the analysis. If a problem occurs in one possible execution path, the analyzer will raise an alarm and continue the analysis on the other execution paths. This is why you can get multiple alarms.
As the analyzer is not precise enough, we will start by improving the precision of the analysis in order to reduce the number of alarms. We will show how to do this in the next part of this tutorial.
Please click here to continue the tutorial.
In this second part of the Skein tutorial, we will fine-tune the value analysis and investigate the alarms raised by the analyzer in order to identify and correct bugs.
The estimated time for this lesson is 20 minutes.
Note
As we have seen in previous examples, the number of different states that can be propagated at each program point is limited by the semantic unrolling level, which can be tuned with the -slevel option. The default configuration allows only one single state, so the analysis can quickly become imprecise. Activating the -slevel option allows the analyzer to unroll loops and separately propagate the branches of conditional statements, thus improving the precision of the analysis in most cases.
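A sketch of the kind of code where the unrolling level matters (fill is a hypothetical example, not part of the Skein sources): with a single abstract state, all loop iterations are merged and only an interval survives; with enough slevel, the analyzer can propagate one state per iteration and keep exact per-cell facts.

```c
/* With sufficient slevel the analyzer keeps one state per loop
   iteration and learns t[i] == i for each cell exactly; with the
   default single state, the cells are merged into the interval
   [0..9] and the per-cell facts are lost. */
void fill(int t[10])
{
    int i;
    for (i = 0; i < 10; i++)
        t[i] = i;
}
```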
A first step to improve the precision of the analysis is to activate the -slevel option. We will launch the analysis with an slevel value of 100 and see if there are any improvements:
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c
Tip
100 is a good default value for the slevel because most functions terminate with fewer than 100 disjoint states. If the slevel is not enough, you will be able to fine-tune it later; keep in mind that increasing the slevel can lead to slower analyses, so you will need to find a good compromise between precision and speed.
The GUI will give you information about the slevel consumed at different points in the program, making it easier to find the right value to fit your needs.
We open the GUI again and notice that there is only one alarm left, so all the other alarms were false alarms.
Note
Unlike many other static analysis tools, TrustInSoft Analyzer never remains silent when there is a potential risk for a runtime error. As we have seen before, when the analysis is not precise enough it can produce false alarms.
On the other hand, the absence of alarms means that you have mathematical guarantees over the absence of all the flaws covered by TrustInSoft Analyzer.
With only one alarm left, we can investigate its cause to see if it is a true positive or not.
Notice that there is some dead code that cannot be reached by any execution. In particular, this function never terminates because the return statement cannot be reached.
This means that the execution never leaves the for loop in line 19:
for (i = 0; i < HASHLEN; i++)
printf("%d\n", hash[i]);
Clicking on the alarm in the Properties widget, we see that the alarm indeed occurs inside the loop. The annotation generated by the analyzer is:
/*@ assert Value: initialization: \initialized(&hash[i]); */
meaning that we are trying to read a value that might not be properly initialized.
Note
The for loop in the main.c program has been transformed into a while loop by the analyzer. This is part of a normalization process that transforms the program into a semantically equivalent one in order to facilitate the analysis.
Let’s explore the values inside the hash array by going to the Values widget and clicking on the hash term in the Interactive Code Window. By clicking on the values in the before column of the Values widget, you will get a more readable output of the contents of the hash array:
[0] ∈ {215}
[1..7] ∈ UNINITIALIZED
This begins like: "�"
The analyzer is telling us that, before entering the loop, hash[0] contains the value 215 (which it tries to interpret as a character, hence the This begins like: "�") and all the other items in the array are UNINITIALIZED.
Reading an uninitialized value is an undefined behavior, and that is exactly what the program is doing when trying to read past the first element of the array. The analysis stops here as all the execution paths lead to undefined behavior.
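The shape of the bug reduces to a few lines (a hypothetical sketch, not the Skein code itself): a buffer that is only partially written, followed elsewhere by a read of the untouched cells.

```c
/* Hypothetical reduction of the defect: only the first n cells of
   an 8-byte buffer are written, so any later read of out[n..7] is
   a read of uninitialized memory -- the situation reported by the
   initialization alarm. */
void partial_fill(unsigned char out[8], int n)
{
    int i;
    for (i = 0; i < n && i < 8; i++)
        out[i] = (unsigned char)i;
}
```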
It seems that the alarm is indeed a true positive, so we cannot go further in the analysis without correcting the problem.
As we have uninitialized values in the array, let's have a look at the initialization function Skein_256_Init.
You can navigate to the definition of the function by right clicking on the function name:
This will update the Interactive Code Window with the code of the function:
After clicking on the hashBitLen variable, we can see in the Properties widget that its value is always 8. This value corresponds to the value of the HASHLEN macro that is passed to the Skein_256_Init function in line 15 of the original code:
Skein_256_Init(&skein_context, HASHLEN);
Note that in the Interactive Code Window this macro has been expanded to 8.
So the length of the hash is expressed in bits, and we were inadvertently asking for an 8-bit hash. This explains why only the first element of the array was initialized.
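The unit confusion in one line: the driver wants HASHLEN bytes, while Skein_256_Init expects a size in bits (bits_of is a hypothetical helper shown only for illustration):

```c
#define HASHLEN 8   /* desired hash size in bytes, as in main.c */

/* Convert a byte count to the bit count that Skein_256_Init
   expects: 8 bytes -> 64 bits. */
int bits_of(int bytes)
{
    return 8 * bytes;
}
```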
As we wanted an 8-byte hash, we can correct the problem by multiplying the value by 8 in the call to Skein_256_Init, modifying line 15 of the main.c file to look like this:
Skein_256_Init(&skein_context, 8 * HASHLEN);
After saving the changes we run TrustInSoft Analyzer again:
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c
and open the GUI:
Congratulations! There are no more alarms, so our program is guaranteed to be free from all the kinds of bugs covered by TrustInSoft Analyzer.
We can see the value of the hash in the shell in which we executed TrustInSoft Analyzer:
Let’s compile and run the program again to check that the results are the same:
$ gcc /home/tis/1.45.1/C_Examples/skein_verification/*.c && ./a.out
We get the following output, which is indeed the same as the one we get from the analyzer:
224
56
146
251
183
62
26
48
Note
The results we got when compiling the program in the first part of this tutorial were different because we were compiling an ill-defined program (as it was reading uninitialized variables, which is an undefined behavior). When the behavior of a program is undefined, you can get anything as a result (in our case it was most probably garbage read from the memory, as we were accessing uninitialized values).
As the program is now guaranteed to be correct, the output should be the same on any standard-compliant installation.
Our driver tested the Skein implementation on a single message of length 80. We could modify it by testing a number of different messages, and thus gain better coverage, as normal unit tests will do. But what if we could test it on all the messages of length 80?
This is certainly impossible to achieve by the execution of any compiled test, but it is in the scope of TrustInSoft Analyzer.
In the next part of this tutorial we will show you how to generalize the test to arbitrary messages of fixed length. This will give mathematical guarantees about the behavior of the implementation on any message of the fixed length.
Please click here to continue the tutorial.
The previous analysis allowed us to find a bug in the test driver. After correcting the program, the analyzer did not raise any alarms, which guaranteed the absence of a large class of bugs when hashing the 80-char message "People of Earth, your attention, please".
In this third part of the Skein tutorial, we will generalize the result obtained in the second part to all the messages of length 80.
The estimated time for this lesson is 20 minutes.
Testing a program on a given set of test cases can allow you to identify some bugs, but it cannot guarantee their absence. This is true even if you test your program on a huge number of cases, because it is very unlikely that you will be able to cover them all.
Even in a simple example like ours (hashing a message of length 80), there are way too many test cases to consider (see the note below to get some detailed calculations).
Furthermore, even if your cases pass the test, this gives you no guarantee about future behavior: for example, if a test case results in an undefined behavior this might pass unnoticed in the test environment but trigger a bug in your client’s machine.
Note
Although the number of distinct 80-char arrays is certainly finite, there are 2^8 = 256 possible values for a char variable, so there are 256^80 ≈ 4.6 × 10^192 different arrays of 80 chars. As the number of elementary particles in the visible universe is considered to be less than 10^97, we can conclude that the number of arrays of 80 chars is definitely out of scope for an exhaustive test.
The value analysis performed by TrustInSoft Analyzer uses abstract interpretation techniques in order to represent the values of the terms in the program. This allows for the representation of very large or even infinite sets of values in a way that makes sense to the analyzer. That is why we are going to be able to consider all the possible values for a given variable.
If the value analysis raises no alarms, this means that you have mathematical guarantees about the correctness of the program. These guarantees are not limited to one particular execution, but extend to any execution in a standard-compliant environment (which respects all the hypotheses made during the analysis).
Note
We will use some primitives that have a meaning in the abstract state of the analyzer, but that make no sense for a compiler. For this reason, we would not be able to compile or execute the driver anymore.
One such function is tis_interval. The assignment x = tis_interval(a, b) tells the analyzer that the variable x can hold any value between a and b (both included).
Our goal is to obtain a result that is valid for any given array of length 80. Let’s start by modifying the main.c program in order to tell the analyzer that each of the elements in the msg array can take any value between 0 and 255.
To accomplish this, it suffices to add a couple of lines to the main.c program
for(i = 0; i < 80; i++)
    msg[i] = tis_interval(0, 255);
to make it look like this:
/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"

#define HASHLEN (8)

u08b_t msg[80];

int main(void)
{
    u08b_t hash[HASHLEN];
    int i;

    for(i = 0; i < 80; i++)
        msg[i] = tis_interval(0, 255);

    Skein_256_Ctxt_t skein_context;
    Skein_256_Init(&skein_context, 8 * HASHLEN);
    Skein_256_Update(&skein_context, msg, 80);
    Skein_256_Final(&skein_context, hash);
    for (i = 0; i < HASHLEN; i++)
        printf("%d\n", hash[i]);
    return 0;
}
We then launch the value analysis again
$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c
and open the GUI:
There are no alarms in the GUI. This means that we have mathematical guarantees that the code is free from all the bugs covered by TrustInSoft Analyzer.
Congratulations! You have just proved that initializing, parsing, and hashing a message of length 80 by calling the three functions
Skein_256_Init(&skein_context, 8 * HASHLEN);
Skein_256_Update(&skein_context, msg, 80);
Skein_256_Final(&skein_context, hash);
will cause no runtime errors, whatever the contents of the 80-char array msg.
Note
As we saw before, this kind of result would be impossible to obtain with case testing, as this would amount to writing a program testing each one of the 256^80 possible input messages.
We will end this part of the tutorial with a few words on the representation of values by the analyzer. At each program point, the analyzer determines a variation domain for each variable or expression. The value of the variable or expression is always guaranteed to be contained in the variation domain but, due to the possible loss of precision, these domains can be over-approximations.
Note
The analyzer can represent the variation domain of an integer variable in three different ways:
- [l..u], with lower bound l and upper bound u.
- [l..u], r%m, that is the set of values between the lower bound l and the upper bound u whose remainder in the Euclidean division by m is equal to r.
- A -- represents the smallest value that fits within the type of a variable if it occurs in the lower bound, and the biggest value if it occurs in the upper bound. For instance, [--..--] will mean [0..255] for a variable of type unsigned char.
If you go to the Values widget in the GUI and click on the msg variable in the Interactive Code Window (as shown below), you will be able to see how the analyzer represents the integer values contained in the array msg of the program main.c:
The analyzer shows that, after the initialization loop, msg[0..79] ∈ [--..--]. This means that each of the values msg[0] to msg[79] can take an arbitrary value in the interval [--..--]. As pointed out before, for an unsigned char, [--..--] stands for any value that fits within the type, that is, between 0 and 255. This is exactly what was achieved by calling the function tis_interval(0, 255).
If the C++ front-end for TrustInSoft Analyzer is available, it can be used with the command tis-analyzer++. This command accepts the same arguments as tis-analyzer, as well as specific options that can be listed using the command tis-analyzer++ -cxx-h. An interpreter version is also available using the command tis-analyzer++ --interpreter.
This tutorial will show you some C++-specific features of TrustInSoft Analyzer in order to better understand its output and help track the origin of alarms. It should be read after the Getting Started section. The examples used in this tutorial can be found in the /home/tis/1.45.1/Cxx_Examples directory.
We will start by analyzing the program matrix.cpp, which is a test over a matrix manipulation library:
int
main(void) {
Matrix<2U, 2U> matrix_a {
2., 1.,
4., 2. };
auto id = identity<2>();
bool has_inverse = is_invertible(id);
std::cout << "identity is inversible: " << (has_inverse ? "yes\n" : "no\n");
Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
Matrix<2, 1> res = solve(matrix_b, { 6., 10. });
std::cout << "RESULT IS:\n" << res;
return 0;
(void) has_inverse;
}
Start the analysis with the following command:
$ tis-analyzer++ --interpreter -gui /home/tis/1.45.1/Cxx_Examples/matrix.cpp
and open the GUI. The Interactive Code Window should look like:
The first thing to notice is that some names contain characters that are forbidden in C, like Matrix<2, 2> or std::__1::ostream, and may be prefixed by .... The names of entities in tis-analyzer++ are actually mangled; the Interactive Code Window displays a demangled version of them to be clearer. The mangled version of names can be viewed by using the option -cxx-keep-mangling, and the mangling used is close enough to existing compiler practice to be demangled by external tools like c++filt.
When a name is long, a shortened version of it is displayed in the Interactive Code Window with ... as prefix. Clicking on this prefix will display the fully qualified name, or its mangled version if the option -cxx-keep-mangling is used.
The first statement that is not a declaration is a call to the function __tis_globinit(). This function represents the dynamic initialization phase of a C++ program [1]. It contains only calls to functions with names similar to X::Y::__tis_init_Z, which are used to initialize the non-local variables X::Y::Z. Looking at the definition of an X::Y::__tis_init_Z function will lead the Source Code Window to the body of the generated function initializing the variable X::Y::Z.
The first statement of the main function in the original code is:
Matrix<2U, 2U> matrix_a {
2., 1.,
4., 2. };
and corresponds in the normalized code to the line:
Matrix<2, 2>::Ctor<double, double, double, double>(& matrix_a,2.,1.,4.,2.);
Ctor is the generic name that TrustInSoft Analyzer assigns to C++ constructors. You can see that the first argument is & matrix_a. All method calls are translated to regular C function calls, and as such they receive an additional argument which stands for the this pointer. In the case of constructors, this is the address of the object being initialized.
When looking at the constructor definition, you can see that it is calling the inherited constructor Matrix_base<2, 2>::Ctor<double, double, double, double> with the same arguments, except that the this pointer is shifted to its __parent__Matrix_base<2, 2, Matrix>. The corresponding part of the original code is:
const Matrix_base<I, J, Parent> &m1,
const Matrix_base<J, K, Parent> &m2)
{
auto cross_product =
[&m1, &m2] (unsigned i, unsigned j) -> double
{
Matrix<N, M> inherits from Matrix_base<N, M, Matrix>, and its constructor only transfers its arguments to the constructor of the parent class. In tis-analyzer++, a class A inheriting from a class B is represented by a struct A containing a field struct B __parent__B. The initialization of the base B of A is translated into a call to the function B::Ctor(&A.__parent__B). This structure layout can be observed in the example by looking at the definition of the type struct Matrix<2, 2>.
The next statement of the main function in the original code is:
auto id = identity<2>();
and corresponds in the normalized code to the line:
identity<2>(& id);
The first thing to note here is that the id variable has an auto type in the original source but is declared in the normalized code as:
struct Matrix<2, 2> id;
tis-analyzer++ makes auto types explicit, in the same way it instantiates template parameters.
Another difference is that in the normalized code the identity<2> function takes an additional argument despite being a usual function and not a method. This is a consequence of the fact that, in C++, a non-POD [2] value returned by a function may not live inside the function but inside its caller. To model this, a function returning a non-POD type receives an additional parameter which contains the address of the initialized object.
The next statement of the main function in the original code is:
bool has_inverse = is_invertible(id);
which, when clicked on, corresponds in the normalized code to:
{
struct Matrix<2, 2> __tis_arg;
_Bool tmp;
{
{
Matrix<2, 2>::Ctor(& __tis_arg,(struct Matrix<2, 2> const *)(& id));
tmp = is_invertible<2>(& __tis_arg);
}
has_inverse = tmp;
}
}
In this case, one C++ statement is translated into a block containing multiple declarations and statements. The function is_invertible<2> takes its argument by copy, as seen in its declaration:
template<unsigned N>
bool
is_invertible(Matrix <N, N> m)
and so its parameter has to be initialized with a new object. This is the purpose of the __tis_arg_* family of variables. In the current case, __tis_arg is initialized by calling the copy constructor of Matrix<2, 2> with the address of id as source. Then, the address of the newly built __tis_arg variable is given to the function is_invertible<2>, and the block around it delimits the lifetime of __tis_arg. This is the semantics of passing arguments by copy [3].
This transformation does not happen when calling the copy constructor of Matrix<2, 2>, because its argument is a reference. References are converted to pointers, so taking a reference to an object is taking its address, and accessing a reference simply dereferences the pointer.
The next interesting statement is the one at line 37 of the original source:
Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
which is translated in the normalized code as:
{
struct Matrix<2, 2> __tis_tmp_61;
{
{
operator^(& __tis_tmp_61,(double)5,
(struct Matrix<2, 2> const *)(& id));
}
;
;
}
operator+(& matrix_b,
(struct Matrix_base<2, 2, Matrix> const *)(& matrix_a.__parent__Matrix_base<2, 2, Matrix>),
(struct Matrix_base<2, 2, Matrix> const *)(& __tis_tmp_61.__parent__Matrix_base<2, 2, Matrix>));
}
Again, this statement is decomposed into a block containing multiple statements, but this time declaring a variable called __tis_tmp_61. The __tis_tmp_* family of variables corresponds to the temporary objects [4] that can be introduced by complex expressions. This temporary object is declared inside the block, as its lifetime is that of the full expression, and it has to be destroyed at its end if needed.
[1] as stated in [basic.start.init].
[2] POD in the sense of [class]p10.
[3] See [expr.call]p4 and in particular: “The initialization and destruction of each parameter occurs within the context of the calling function.”
[4] as defined in [class.temporary].
This section describes how to use TrustInSoft tools.
The aim of this document is to help the user to study a C or C++ application with TrustInSoft Analyzer.
The main goal of this kind of study is to verify that none of the detected software weaknesses listed in the CWE-subset are present in the application. This document explains step by step how to achieve that goal.
The definitions of some of the words and expressions used in this document are given in the Glossary.
The main steps of an analysis are:
Here and there, extracting information, either to understand the results or to produce a report, is also useful. This can be done by combining options and scripts. How to do it is also explained in Get the Information.
Although the tool can be used purely from the command line interface, it also provides a GUI (see the TIS GUI Manual) that is very convenient for exploring the computed results.
The purpose of this chapter is to explain how to prepare the source code of an application before starting to analyze it. The main steps to perform are:
Finding out the preprocessing options.
This step can either be manual (Manual Preparation) or automatic (Automatic Preparation).
The manual preparation is the easiest way to start with if you already know the commands necessary to compile the source files. Otherwise, start instead with the automatic preparation.
Dealing with the external libraries.
In the simplest cases, all the source files need the same preprocessing command.
The default preprocessing command of tis-analyzer is:
clang -C -E -nostdinc -isystem TIS_KERNEL_SHARE/libc
More options can be added to this command with the -cpp-extra-args option. The whole command can also be specified directly with the -cpp-command option, for instance in order to use another preprocessor.
The -I and -isystem options (to add include paths), -D (to add macro definitions), and -U (to remove macro definitions) are provided as shortcuts to the -cpp-extra-args option. For example, the following command can be used to run the analyzer on the f1.c, f2.c, and f3.c source files, taking the included files from the incl_dir directory:
$ tis-analyzer -I incl_dir f1.c f2.c f3.c
A specific preprocessing command can be given to a set of specific files with the option -cpp-command-file "f1.c:clang -C -E,f2.c:gcc -C -E" (or -cpp-command-file "f1.c:clang -C -E" -cpp-command-file "f2.c:gcc -C -E"). More options can be added to a preprocessing command for a set of files in the same way with the option -cpp-extra-args-file "f1.c:-Idir/". Any file not listed in -cpp-command-file (resp. -cpp-extra-args-file) will use the global command (resp. additional options) of the -cpp-command option (resp. -cpp-extra-args option).
If most of the source files need to have a specific preprocessing command, it is recommended to use the Automatic Preparation.
The exact preprocessing command in use can be shown by adding the command line option -kernel-msg-key pp when running the analyzer.
In some applications, the source code is split in modules that require different preprocessing commands.
Warning
First of all, an important recommendation is to tackle the software in as small-sized chunks as possible. This makes most of the preprocessing problems go away.
If a particular source file needs a different preprocessing command, it is better to preprocess it first. The resulting file has to be named with a .i or .ci extension so the analyzer knows that it does not need to preprocess it. The difference between the two extensions is that the .i files are not preprocessed at all by the tool, whereas the macro definitions are expanded in the annotations of the .ci files, which is most of the time the intended behavior. So, except in some special cases, the .ci extension is to be preferred.
Source files and preprocessed files can be mixed in the command line. For instance, if the f3.c file needs some special options, f3.ci can be generated beforehand, and then used in the command line:
$ tis-analyzer -I incl_dir f1.c f2.c f3.ci
This will give the same result as the previous command, provided that f3.c has already been preprocessed into f3.ci.
Here is a synthetic example with two files h1.c and h2.c that use the same macro M, which needs to have a different definition in each file.
h1.c:
int x = M;
extern int y;
int main(void) {
    return x + y;
}
h2.c:
int y = M;
If M is supposed to be 1 in h1.c and 2 in h2.c, the recommended command lines for this example are:
$ clang -C -E -nostdinc -DM=1 -o h1.tis.ci h1.c
$ clang -C -E -nostdinc -DM=2 -o h2.tis.ci h2.c
Then, the generated files can be provided to the analyzer:
$ tis-analyzer -val h1.tis.ci h2.tis.ci
And the obtained result shows that M has been correctly expanded:
...
[value] Values at end of function main:
__retres ∈ {3}
...
In more complex cases, it is better to use the Automatic Preparation.
Most applications use some libraries, at least the standard libc. The analyzer needs to have information about the functions that are used by the application, at least the ones that are called in the part of it which is being studied.
For the libc library, some header files come with the tool and provide specifications for many of its functions. These header files are included by default when preprocessing source files. However, if the preprocessing is done beforehand, the following option has to be employed in order to find the instrumented files:
-I$(tis-analyzer -print-share-path)/libc
The tool also provides implementations of some libc functions that are automatically loaded. They are either C source code or internal built-in functions. The -no-tis-libc option may be used to completely ignore the tool’s library functions and header files. It can be useful when analyzing code with custom libc functions, for instance.
Another intermediate solution is to use the --custom-libc <file> option. In that case, the given source file is analyzed before the tool runtime files. It gives the opportunity to overload some of the provided C implementations. The built-in functions cannot be individually overloaded at the moment.
To overload some header files in case something is missing, the --custom-isystem <path> option can be used. Then the given include path is used before the tool ones. In that case, the custom headers xxx.h may include the tool headers with:
<path>/xxx.h:
#include <tis-kernel/libc/xxx.h>
// some more declarations and/or specification for <xxx.h>
If other external functions are used, one has to provide some properties concerning each of them: at the minimum to specify which pieces of data can be modified by them. See Check the External Functions to know which functions have to be specified and Write a Specification to learn how to do it.
At this point, the source files and the preprocessing commands should have been retrieved. It is time to try the tool for the first time, for instance by running:
tis-analyzer -metrics <..source and preprocessed files..> <..preprocessing options>
The preprocessing options are only used when source files are provided. In complex cases, it can be easier to analyze only the already preprocessed files.
This section describes how to automatically produce a compile_commands.json file that contains instructions on how to replay the compilation process independently of the build system.
A compilation database is a JSON file, which consists of an array of “command objects”, where each command object specifies one way a translation unit is compiled in the project.
Each command object contains the translation unit’s main file, the working directory where the compiler ran and the actual compile command.
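A minimal compilation database might look like the following; the paths and compiler flags are purely illustrative:

```json
[
  {
    "directory": "/home/user/project/build",
    "file": "../src/main.c",
    "command": "cc -I../include -DNDEBUG -c ../src/main.c"
  }
]
```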
See the online documentation for more information:
Generating compile_commands.json
CMake (since 2.8.5) supports generation of compilation databases for Unix Makefile builds with the option CMAKE_EXPORT_COMPILE_COMMANDS.
Usage:
cmake <options> -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <path-to-source>
For projects on Linux, there is an alternative: intercepting compiler calls with a more generic tool called bear.
Usage:
bear <compilation_command>
Note: Starting with Ubuntu 22.04, you must use bear -- <compilation_command>. The double dash (--) indicates the end of the options specific to bear; all that follows is part of the actual build command.
Tip
It is recommended to use bear. It can be installed with the package manager, typically:
sudo apt install bear
Using compile_commands.json
In order to use the produced compilation database, run TrustInSoft Analyzer with the following command:
tis-analyzer -compilation-database path/to/compile_commands.json ...
Also, if a directory is given to the -compilation-database option, it will scan and use every compile_commands.json file located in the given directory and its sub-directories.
tis-analyzer -compilation-database path/to/project ...
It is also possible to use compilation databases in a tis.config file for the analysis. A possible generic template for the tis.config file is given below (see Configuration files for more information about tis.config files).
{
"compilation_database":
[
"path/to/compile_commands.json"
],
"files":
[
"path/to/file_1",
"path/to/file_2",
"path/to/file_N"
],
"machdep": "gcc_x86_64",
"main": "main",
"val": true,
"slevel-function":
{
"function_name": 10
}
}
To use the tis.config file, run TrustInSoft Analyzer with the following command:
tis-analyzer -tis-config-load tis.config
Note
The tis.config file uses a strict syntax for JSON. A typical mistake would be to leave a comma after the last element of an object or array, e.g. after the line "path/to/file_N", which would lead to an error.
At this point, whatever method was chosen for the preparation step, you should, for instance, be able to execute:
tis-analyzer -metrics <... arguments...>
with the appropriate arguments, and the analyzer should run with no errors. Using the command tis-analyzer-gui with the same arguments starts the GUI, which lets you browse through the source code, but not yet see the analysis results, since nothing has been computed so far.
It is often useful to save the results of an analysis with:
tis-analyzer ... -save project.state > project.log
This command puts all the messages in the file project.log and saves the state of the project itself to the file project.state, so that it can be loaded later on. For instance, we can load it now in the GUI by executing:
tis-analyzer-gui -load project.state
In case the application includes some special features (assembler code, etc.) and/or requires to be studied for a specific hardware target and/or with specific compiler options, please refer to Dealing with Special Features.
This chapter explains how to specify which part of the source code of an application will be studied and in which context. Moreover, it also shows how the overall goal can be split into several separate analyses if needed. The main objective is to be able to run the value analysis, implemented by the Value plug-in, in order to obtain the alarms concerning the software weaknesses listed in the CWE-subset.
The study perimeter could be the whole program, or only some functions of a library, or a single use case scenario. Explaining how to decide which part of the source code should be studied is very difficult, since it depends a lot on the particular application, the amount of time available, and mostly on how one looks at the problem… Adopt an incremental approach: begin with a small study, in order to understand how to apply the tools in the given situation, and then enlarge the perimeter later on.
In order to run a value analysis, an entry point to the program has to be provided. The body of the entry point function defines the studied perimeter. It is usually the main function which establishes the context verified by the analysis, but other functions can be used to this end as well.
Sometimes, the main function of an application can be used directly. However, if it takes options and arguments, it still has to be called from an entry point that builds values for them. The tis-mk-main utility can help in doing so (see tis-mk-main Manual). Be aware, though, that if main is a complex function that parses options and needs many string manipulations, it is probably a better idea to write a smaller entry point from scratch in order to define a more precise context of analysis.
It is important to mention here the difference between dynamic test execution and static value analysis. As the code is not executed in the latter, each of the inputs provided to the analyzed function does not need to have a single value. It means that a function taking a single integer parameter x can, for instance, be analyzed for all the possible input values, or for all the values from a given set (e.g. 3 < x < 150). So when we mention “a value” here, we do not actually mean “a single concrete value”, but rather “a set of abstract values”.
Basically, the entry point function has to call the functions to analyze, providing them with appropriate input values (i.e. function arguments) that correspond to the studied perimeter. Some built-in functions are available to build these input values:
for an integer interval:
x = tis_interval(l, u);
It guarantees that the analyzer will produce warnings for any bad behavior that could result from any value between l and u (inclusive) being returned. Several other functions are also provided for other types, like for instance tis_double_interval(l, u) for floating-point values, and tis_unsigned_long_long_interval(l, u) for wide integers, which behave the same way for the types double and unsigned long long.
to initialize addr[0 .. len-1]
:
tis_make_unknown (addr, len);
It guarantees that the analyzer will produce warnings for any bad behavior that could result from having any arbitrary len bytes in memory starting from addr.
The tis_make_unknown function is also useful to initialize a simple variable:
tis_make_unknown (&x, sizeof (x));
This is equivalent to x = tis_interval(l, u); where l is the minimum value of the type and u is the maximum value of the type.
for a non-deterministic choice between two integers:
x = tis_nondet (a, b);
It guarantees that the analyzer will produce warnings for any bad behavior that could result from the value of x being a or b. Only these two cases are considered, but they combine with the other possibilities resulting from the calls to the other built-in functions.
for a non-deterministic choice between two pointers:
p = tis_nondet_ptr (&x, &y);
This one is similar to the previous one, but for pointers.
Example: the main function below shows a valid entry point to test a compute function that takes a buffer, its length, and a pointer to store a result:
#include <stdio.h>
#include <tis_builtin.h>
int compute (char * buf, size_t len, char * result);
int main (void) {
char buf[100];
tis_make_unknown (buf, 100);
size_t len = tis_interval (0, 100);
char x;
char * result = tis_nondet_ptr (NULL, &x);
int r = compute (buf, len, result);
}
The builtin tis_init_type can be used to initialize a simple pointer, such as int * p, or a pointer to a recursive data structure, such as struct list * p. It takes five arguments:
tis_init_type(str_type, ptr, depth, width, valid)
- const char * str_type should be a string representing a valid type of the memory to initialize.
- void * ptr should be a pointer to the memory area to initialize.
- unsigned long depth should be an integer that exactly mirrors the behavior of the option -context-depth during the initialization.
- unsigned long width should be an integer that exactly mirrors the behavior of the option -context-width during the initialization.
- valid should mirror the behavior of the option -context-valid-pointers during the initialization.
Example:
#include<tis_builtin.h>
struct list {
int data;
struct list * next;
};
int main(){
int *p0, *p1, *p2;
struct list * p3;
tis_init_type("int *", &p0, 1, 1, 1);
tis_init_type("int *", &p1, 1, 10, 1);
tis_init_type("int *", &p2, 1, 1, 0);
tis_init_type("struct list *", &p3, 3, 1, 1);
tis_dump_each();
}
The code above calls tis_init_type to initialize the pointers p0, p1, p2 and p3. More specifically:
- tis_init_type("int *", &p0, 1, 1, 1) allocates an array of size 1 (given by the width argument), initializes the array element to any possible integer: S_p0[0] ∈ [--..--], and then assigns the array address to the pointer p0: p0 ∈ {{ &S_p0[0] }}.
- tis_init_type("int *", &p1, 1, 10, 1) allocates an array of size 10: S_p1[0..9] ∈ [--..--] and assigns its address to the pointer p1: p1 ∈ {{ &S_p1[0] }}.
- tis_init_type("int *", &p2, 1, 1, 0) sets the last argument to 0, which allows p2 to possibly be a NULL pointer: p2 ∈ {{ NULL ; &S_p2[0] }}.
- tis_init_type("struct list *", &p3, 3, 1, 1) allocates a list of length 3 (the list length corresponds to the depth argument), and assigns the list head address to the pointer p3.
The call to tis_dump_each then shows the following state:
p0 ∈ {{ &S_p0[0] }}
S_p0[0] ∈ [--..--]
p1 ∈ {{ &S_p1[0] }}
S_p1[0..9] ∈ [--..--]
p2 ∈ {{ NULL ; &S_p2[0] }}
S_p2[0] ∈ [--..--]
p3 ∈ {{ &S_p3[0] }}
S_p3[0].data ∈ [--..--]
[0].next ∈ {{ &S_next_0_S_p3[0] }}
S_next_0_S_p3[0].data ∈ [--..--]
[0].next ∈ {{ &S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
[0].next ∈ {0}
In order to obtain more details about the available functions which allow building imprecise values, refer to the Abstract values section, or browse the file:
more $(tis-analyzer -print-share-path)/tis_builtin.h
Some tools are also available and may help to build the entry point for specific situations (see tis-mk-main Manual).
Now, when the main entry point is ready, it is time to run the value analysis for the first time using the -val option.
An important thing to check is the nature of external functions. More precisely, to look for this message in the log file:
$ grep Neither proj.log
[kernel] warning: Neither code nor specification for function ...
This message indicates that the given function is undefined. In order to progress with the value analysis, it MUST be defined, either by writing a C stub for it or by providing an ACSL specification.
The libc library functions should not appear in these messages, since most of them are already specified in the provided library files (see About Libraries).
Writing C stubs for functions for which no code is available is the recommended way to go. The standard functions and the builtins presented above (see Write an Entry Point) may be used to abstract the implementation details.
To illustrate how to write stubs using standard functions and analyzer builtins, suppose that the code we want to analyze to find errors is the main function below, and we do not have the code for the function mystrdup.
char *mystrdup(char *s);
int main(void) {
char c, *p;
int x;
p = mystrdup("abc");
if (p)
c = p[0];
x = c - '0';
}
There is currently no good way to write a specification indicating that mystrdup allocates a new block and makes it contain a 0-terminated string; instead, the recommended method is to abstract it with a stub that may look as follows:
#include <string.h>
#include <stdlib.h>
#include <tis_builtin.h>

char *mystrdup(char *s) {
    size_t l = strlen(s);
    char *p = malloc(l + 1);
    if (p) {
        tis_make_unknown(p, l);
        p[l] = 0;
    }
    return p;
}
The files can be analyzed with:
$ tis-analyzer -val -slevel 10 main.c mystrdup.c
As shown in the trace, the analyzer correctly detects that the main function may use c uninitialized:
tests/val_examples/stub_main.c:13:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
tests/val_examples/stub_main.c:13:[kernel] warning: completely indeterminate value in c.
When specifying an external function with ACSL properties, only the assigns properties are mandatory: they give the tool an over-approximation of what can be modified. However, also providing the function's post-conditions can help the analyzer and yield more precise results (see Write a Specification).
Performing value analysis with no additional options (like in all the cases above) makes it run with a rather low precision. It should not take too long to get the results that indicate where the alarms were found. When using the GUI, the list of alarms can be selected using the Kind filter of the Properties panel, and a summary of the number of alarms can be found in the Dashboard panel.
The global precision can be changed using the -slevel n option. The greater n is, the more precise the analysis is (see About the Value Analysis for more details). The alarms that can be formally verified by increasing the precision in this way will disappear. Those which remain are the difficult part: they require further attention.
The value analysis takes longer as the precision increases. Thus it can be profitable to fine-tune the precision locally on certain functions, in order to benefit from a higher precision level where it is advantageous (so that more alarms are formally verified) while keeping it lower where it matters less (so that the analysis runs faster).
For the same reason (fast analysis to find bugs earlier) it can also be useful to reduce (temporarily) the size of the arrays (when the source code is structured to allow this easily).
The final analysis information can be found in the Dashboard panel.
Note that at this point the goal is not to study the alarms precisely, but rather to get a rough idea of the amount of work needed in order to be able to decide which part to study.
Experience suggests that if the size of the analyzed source code is large and/or if there are many alarms, it is usually worthwhile to split the study into smaller, more manageable sub-components. The idea here is to write a precise specification for every sub-component, and then analyze each of them independently toward its particular specification. Afterwards, the main component can be studied using those specifications instead of directly using the sub-components' source code.
It is quite easy to decide which part should be split if some main features are identifiable and clearly match a given function. Otherwise, a first overview of the number of alarms may help to isolate a part that seems difficult for the analyzer. However, as the separated function must be specified, it is much easier if it has a small and clear interface (in order to study a function, it must be called in the intended context, and this context might be difficult to build if it corresponds to a large and complex data structure).
To split the analysis one must write:
- a main function for the main component,
- a main function and an ACSL specification for each of the API functions which is supposed to be studied independently.

Then, when performing the analysis for the main component, the -val-use-spec option should be used in order to provide the list of the specified API functions. For each of the functions from this list, the value analysis will use the function's ACSL specification instead of the function's body.
For instance, the commands below can be used to split the study into the main analysis with two sub-components corresponding to the f1 and f2 functions:
$ tis-analyzer -val $SRC main.c -val-use-spec f1,f2 \
-acsl-import f1.acsl,f2.acsl \
-save project.state
$ tis-analyzer -val $SRC main_f1.c -acsl-import f1.acsl -save project_f1.state
$ tis-analyzer -val $SRC main_f2.c -acsl-import f2.acsl -save project_f2.state
In the commands above:
- main.c, main_f1.c and main_f2.c should hold the entry points for the main component and for the f1 and f2 functions respectively;
- f1.acsl and f2.acsl should hold the ACSL specifications of the f1 and f2 functions respectively (see Write a Specification to learn how to write a specification).

There is another case where studying an entry point may require several separate analyses: when there is a parameter in the program that has to be attributed a value (e.g. using a macro) and when it is difficult to give it an arbitrary value beforehand (e.g. it is a parameter defining the size of an array). In such situations it is better to write a loop in an external script that attributes different values to that parameter and runs as many analyses as necessary.
The following script runs an analysis for every N being a multiple of 8 from 16 to 128:
#!/bin/bash
for N in $(seq 16 8 128) ; do
tis-analyzer -D N=$N -val $SRC main.c
done
Of course, this supposes that N is used somewhere in main.c.
Writing a specification for a function is useful in two cases:
When some splitting is done.
A certain function can be studied independently, as a separate sub-component of the analysis. The function is verified toward a certain specification and then that specification can be used (instead of using directly the function’s body when analyzing function calls) in the main component’s verification (To Split or Not to Split).
When using some unspecified external functions.
If an external function that is not part of the subset of the libc library functions provided with the tool (which are already specified) is used in the program, then it needs an explicit specification. The provided specification has to indicate at least which of the concerned data may possibly be modified by the function. Pre- and postconditions are not mandatory in that case.
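As a sketch, assuming a hypothetical external function get_status that writes a status code through a pointer, such a minimal specification could look as follows (only the assigns part is mandatory; the ensures clause is an optional refinement, and all names here are illustrative):

```c
/*@ assigns *status \from indirect:dev;
    assigns \result \from indirect:dev;
    ensures \result == 0 || \result == -1;
*/
int get_status(int dev, int *status);
```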
The specification is written in ACSL and is mainly composed of:
- the preconditions (requires properties),
- the modified data (assigns properties),
- the data dependencies (the \from part of assigns properties),
- the post-conditions (ensures properties).

The ACSL properties can be either inlined directly in the source files or written in separate files and loaded (as explained in ACSL Properties).
An analysis will use the specification instead of the function's body to process a function call when either the body is not provided or an appropriate -val-use-spec option has been set in the command line.
When analyzing a function call using the specification, the tool checks that the preconditions hold in the calling context, and then computes the state after the call from the assigns, \from and ensures properties.
In the specification of an external function the preconditions are not mandatory. If some are provided though, the analyzer checks whether they are satisfied at each call. Therefore adding preconditions in that case makes the verification stronger.
When a defined function is analyzed separately from the main application, its preconditions define the context in which it is studied. This does not have to be the most general context imaginable, but it has to include at least all the possible usage contexts that can be found in the application.
For example, suppose that the function f has been selected to be studied independently, and that it takes a single parameter x. If f is always called from the application with positive values of x, it is possible to study it only in that context. This property must then be specified explicitly by the precondition:
requires r_x_pos: x > 0;
Also, the corresponding main_f function - i.e. the one that is written and used as an entry point in order to study f individually - must call f with all the positive values for x. If the preconditions specify a context smaller than the context defined implicitly by the main function, it will be detected by the analysis, since some of the preconditions will then be invalid. But the opposite case (i.e. if the specified context is larger than the studied input context) would not be detected automatically.
In other words, in the example above, if main_f calls f with (x >= 0), it will be detected, since (x == 0) does not satisfy the precondition. However, if it calls f only with (0 < x < 10), the precondition will be formally satisfied, but the function behavior for (x >= 10) will not be studied. If, for instance, f is then called in the application with (x == 20), the problem will not be detected since the precondition is valid for this value.
Warning
When verifying a function toward a specification that is then used to verify another component, it is very important to make sure that the context defined by the specified preconditions and the studied input context represent the same thing.
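For instance, an entry point covering the whole x > 0 context could be sketched as follows. This assumes the tis_interval builtin described in Write an Entry Point; INT_MAX stands in for whatever upper bound actually applies in the application, and the fragment is meant to be run under the analyzer only:

```c
#include <limits.h>
#include <tis_builtin.h>

int f(int x); /* the function under study */

int main(void) {
    /* all positive values of x, matching the r_x_pos precondition */
    int x = tis_interval(1, INT_MAX);
    return f(x);
}
```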
Note that:
- when the function reads some data, the corresponding \initialized precondition should be included in the specification;
- the pointers that the function dereferences must be \valid (meaning that they point to an allocated memory zone).

The assigns properties are composed of two parts, which specify the modified data and its dependencies:
//@ assigns <left part> \from <right part>;
Each assigns property specifies the modified data on the left side of the \from keyword.
The union of the left parts of all the assigns properties in a given function's specification is an over-approximation of the data modified by this function. Hence the data that is not in this set (i.e. the set defined by the union of the left parts) is expected to have the same value in the pre-state and the post-state.
The information about the modified data is used by the analyzer whenever a function call is processed using the specification instead of the body.
Each assigns property specifies the data dependencies on the right side of the \from keyword.
The output value of the modified data is expected to depend only on the value of its data dependencies. In other words, if the value of the dependencies is equal in two input states, then the value of the modified data should be equal in the two output states.
There are two kinds of dependencies: direct and indirect. The indirect dependencies have to be explicitly marked with an indirect: label; all the other dependencies are considered direct.
Here are some examples of correctly defined dependencies:
//@ assigns \result \from a, b, indirect:c;
int f (int a, int b, int c) { return c ? a : b; }
int t[10];
//@ requires 0 <= i < 10; assigns t[..] \from t[..], a, indirect:i;
void set_array_element (int i, int a) { t[i] = a; }
The dependency information is:
- not used by the WP plug-in;
- very important for many analysis techniques that require knowledge about the data dependencies (such as the Show Defs feature in the GUI, slicing, etc.), but only when the function body is not used, since if the body is available the dependencies can be computed by the From plug-in;
- employed in the value analysis of the pointers: the output value of the modified pointers can only be among the specified direct dependencies.
Note that an intermediate pointer is needed when a pointer is assigned to the address of a variable. This property is not valid:
assigns p \from &x; // NOT valid.
One must rather declare T * const px = &x; in the code and then write the correct property:
It means exactly that the output value of p may be based on &x and on no other existing variables.
Remember that the assigns properties specify an over-approximation of the modified data. For instance, the following properties only say that nothing except B is modified by the function:
assigns B \from \nothing;
assigns \result \from B;
In order to specify that B is surely initialized after the function call, one has to add a post-condition:
ensures \initialized(&B);
When the function result is used to return an error status, it is often the case that the post-condition rather looks like:
ensures (\result == 0 && \initialized(&B)) || (\result < 0);
It is not mandatory to specify ensures properties, neither when splitting a defined function nor when specifying an external function. However, some information about the values in the returned state might be needed in the analysis of the caller function.
In the specification of an external function, the provided post-conditions cannot be checked since the source code is not available. Hence they are used as hypotheses by the analysis and cannot be formally verified themselves.
Warning
As the post-conditions of external functions cannot be verified by the tool, they must be checked with extra care!
If ensures properties are specified, it is usually good to keep them as simple as possible. They have to be verified during the analysis of the function's body, and over-specification only increases the amount of work necessary to achieve that.
Before going any further, it is often advantageous to check the code coverage in order to verify if all the dead code which exists in the application (i.e. the parts of code composed of the functions and branches that are not reachable by the analysis) is indeed intended. To learn how to do that, see Information about the Coverage.
Dead code can be spotted in the GUI by looking for statements with the red background. If some dead code seems strange, it can be explored and investigated using the value analysis results. Clicking on variables and expressions allows to inspect their computed values.
As long as the analysis did not degenerate, the code classified as dead by the tool is a conservative approximation of the actual dead code. It is guaranteed that, in the context defined by the input values, the concerned statements cannot be reached whatever happens. When relying on this guarantee, one should however keep in mind the two important assumptions it depends on: it applies only if the analysis did not degenerate, and the dead code is always considered in the context defined by the input values. Indeed, most of the time, if some code has been marked as dead when it should not have been, the reason is that the context of analysis was defined too restrictively (i.e. it does not include all the input values that can happen in the real execution). Another common reason is that the analysis has simply stopped computing a given branch.
At this point, one should have some analyses (one or several) which cover the intended parts of the code and end without any degeneration. The results most likely include alarms. The chapter Study the Alarms explains how to deal with the alarms and, before that, the chapter Get the Information explains how to extract more information from the analysis results.
If, due to splitting, there are several analyses, there is no preferred order for the verifications. In any case, however, modifying the existing specifications invalidates the results obtained so far.
Caution
The tis-info tool is only available in the commercial version of TrustInSoft Analyzer.
This chapter explains how to extract information from the analyzed project using the tis-info plug-in and other external scripts. The tis-info plug-in provides options to generate textual files containing information about functions, variables, properties and statements. Filters can be used to extract specific information from these files.
Some pieces of information are purely syntactic while some others are of semantic nature. The semantic information is only available if the project which was used to generate the files holds the value analysis results.
For an exact and up-to-date description of each generated piece of information, please refer to the tis-info Manual.
The tis-info plug-in can be used to generate CSV files. The main options allow extracting the information concerning:
- functions: -info-csv-functions functions.csv
- variables: -info-csv-variables variables.csv
- properties: -info-csv-properties properties.csv
- statements: -info-csv-statements statements.csv
For instance, in order to get the information about functions from a previously saved project project.state, the command line would be:
tis-analyzer -load project.state -info-csv-functions functions.csv
As mentioned before, the kind of obtained information (i.e. either purely syntactic or also semantic) will depend on whether the saved project includes the value analysis results or not.
In the generated CSV files, the information about each element is printed on a single line (with comma-separated fields). Hence, the files can be opened in a spreadsheet tool for easy selection of elements. Moreover, this format can easily be grepped (i.e. filtered using the grep utility). For instance, the following command returns all the information about the function funname:
grep funname functions.csv
In order to filter on a specified column, the awk tool is also very practical. For instance, the following command returns only the lines where the word valid appears in the fifth column:
awk -F, '! ($5~"valid") {print}' properties.csv
Also, awk can be used to easily extract only some of the columns:
awk -F, '{print $4 $5}' properties.csv
The generated file functions.csv provides information about the functions. It contains the list of both defined and declared functions appearing in the analyzed source code, including their locations, whether they are called or not, whether they are reachable in the analyzed context, etc. The most useful piece of information here concerns the coverage, and it is detailed just below.
The coverage of each function can be found in the appropriate column of the functions.csv file. Note that this information is semantic in nature and thus only available if the value analysis results have been computed.
At this point, the internal functions are usually not interesting and they can be filtered out with:
grep -v TIS_KERNEL_SHARE
The easiest approach then might be to check first the completely unreachable functions:
grep ", unreachable,"
And then to filter out the completely covered ones, in order to keep only the partially covered functions:
grep -v ", unreachable," | grep -v "100.0%"
Then the GUI can be used to explore the dead code of the functions that are not totally covered in order to verify if this is intended or not.
If the information about the code coverage comes from several separate analyses, the generated functions.csv file is not sufficient anymore to measure the real coverage of the functions, since it represents the results extracted from only one project out of many. For this reason, the tis-aggregate tool provides a coverage command to extract all the relevant information from the functions.csv files and compile it into overall coverage results that can be presented in CSV format:
tis-aggregate coverage project.aggreg > coverage.csv
Here, project.aggreg is a file that gives the base names of the analyses to consider. For instance:
path_1/proj_1
path_2/proj_2
...
path_n/proj_n
The tool then processes information from the path_i/proj_i_functions.csv files.
This tool also provides some more options, such as presenting the results in HTML format (see the Tis-aggregate coverage section of the Tis-aggregate Manual).
An interactive HTML report can also be generated with tis-report.
Besides the statement coverage, MC/DC may also be interesting to evaluate. To learn what it is, and how it compares to other criteria, refer to MC/DC (Modified Condition/Decision Coverage).
The evaluation of the MC/DC coverage is performed when a specific set of options is set for the analyses:
$ tis-analyzer --interpreter -whole-program-graph -mcdc \
-info-csv-all <name> <..source files and other options..>
Among other results, it generates a <name>_decisions.csv file that holds information about the decisions (see the About the Decisions section in the Tis-info Manual).
Then, tis-aggregate has to be used as explained in the Modified condition/decision coverage section of the Tis-aggregate Manual.
The location and status of each property can be found in the properties.csv file. If the names given to the user annotations follow some naming conventions (see Naming the Annotations), it is quite easy to use grep to extract more precise information from that file.
For instance, if the names of the annotations that should be proved by the WP plug-in all have a _wp suffix, it is easy to check whether they are all verified with the following command:
grep "_wp," properties.csv | grep -v ", valid$"
The generated file statements.csv provides information about certain kinds of statements in the analyzed program.
For instance, it contains information about the function calls, in particular whether a specific call is direct or not. Moreover, if an indirect call has been encountered during the value analysis, it provides the list of all the possibly called functions. Extracting this information can be done with:
grep ", call," statements.csv | grep -v DIRECT
Some useful information concerning the condition values can also be found here, especially whether a condition happens to be always true or always false. This kind of situation is often also observable through the dead code, although not in all cases, since an if condition might be always true but have no else branch (which, obviously, would be dead if it existed).
The information about all the variables is available in the generated variables.csv file. The exceptions are the global variables which are neither accessed nor modified, since they are removed from the analysis results. This information can also be used, for instance, to easily find the location of the definition of a variable, or to list all the static or volatile variables.
The list of all the existing alarms is given in Value Analysis Alarms.
Most of the time understanding the meaning of alarms is relatively easy since the generated assertions, messages, and warnings tend to be quite clear. The matter that requires more effort is understanding whether a given alarm is relevant: can the problem that it indicates actually happen or not. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.
False alarms are often the result of too high a level of approximation in the analysis. It is recommended to treat the alarms starting from the first one, in order to detect any imprecision as soon as possible.
The list of the generated assertions can easily be extracted from the properties.csv file (see Information about the Properties). Then, for instance, these assertions can be counted in order to track the evolution of their total number during the working process. (Note, however, that this particular measure is not necessarily very pertinent, because the relation between problems and emitted alarms is not really one-to-one. Losing precision at one point of the analysis can lead to several alarms which have the same origin. Moreover, solving one problem may cause many unrelated new alarms, as several problems might have been hiding behind the solved one.)
The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties panel of the GUI, and there is a button which allows selecting the alarms among all the properties. Start investigating with the first emitted alarm by sorting the alarms by their emission rank.
Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.
Value analysis uses abstract interpretation techniques that propagate the information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between the variables and sets of values (keep in mind though that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.
See Tune the Precision for explanations concerning tuning the precision level with the -slevel option. The precision level is related to the number of states that can be stored for each program point. The smaller this number is, the coarser the approximation, as more computed states are merged together.
Example:
//@ assert 0 < x < 10;
if (x < 5)
    y = 5;
else
    y = 10;
L:
Computing a unique state at label L only gives that x ∈ [1..9] and y ∈ [5..10]. But if the slevel is larger, then two states can be stored at L, giving exactly that either y == 5 when x ∈ [1..4], or y == 10 when x ∈ [5..9].
Notice that the assertion assert 0 < x < 10 above reduces the possible values of x. It is important to remember that this works in the same way for the assertions automatically generated from alarms. For instance, if a statement a = b / c; is reached with c ∈ [0..100], an alarm is emitted in the form of an appropriate assertion:
/*@ assert Value: division_by_zero: c ≢ 0; */
Then, the analysis continues in a context where c ∈ [1..100].
Besides conditions in the code that automatically split the states as above (when there is enough slevel), some builtins are available to generate more than one state. The builtin tis_variable_split(a, sz, n); splits the state on the data whose address is a and whose size is sz, if it holds less than n values.
For instance, if f returns x ∈ [1..5] in the following code, the call to tis_variable_split generates five states, one for each value of x:
int x = f();
tis_variable_split (&x, sizeof(x), 10);
Moreover, the builtin tis_interval_split(l, u) does the same thing as tis_interval(l, u), but it automatically causes the individual values between l and u inclusive to be propagated separately. The slevel option must then be set high enough to keep the precision of the analysis.
In the following example, since all the values of n are propagated separately, the analyzer is able to guarantee that the x_pos assertion holds.
#include <tis_builtin.h>

int G;

void foo(int n)
{
    int x = G;
    /*@ assert x_pos: 1 <= x <= 10; */
}

int main(void)
{
    int n = tis_interval_split(-10, 10);
    G = n;
    if (n > 0)
        foo(n + 5);
    return 0;
}
The analysis runs with a precision setting called the slevel limit, which indicates the maximum number of individual states the analyzer is allowed to keep separated at each analyzed statement. When this limit is reached at one particular statement, the analyzer merges states together. Capping precision in this way and merging states prevents a combinatorial explosion, while the merge itself is designed not to miss any undefined behaviors that follow.
However, in particular cases, the loss of precision caused by merging states may result in the appearance of false alarms. In such cases, it is instrumental to tune the analyzer to improve precision at critical points of the program. The basic technique for doing so is to allow the analyzer to keep states separate by increasing the slevel limit that applies to those statements. The slevel limit can be increased for the entire program, but for the purposes of tuning, it is most beneficial to tune the limit locally, typically with per-function granularity.
Apart from manipulating the slevel limit, there are advanced techniques that provide control over how the analyzer handles states. They can limit the number of produced separate states by injecting strategically-placed merges, or ensure that specific interesting states are kept separate in order to improve precision elsewhere. Some of these techniques are described further in this section.
Crucially, maintaining the precision of every variable at every statement should not be a goal in itself. Analyzing the behavior of the target code in one pass for the millions of variable values that can be grouped together is how the analyzer manages to provide guarantees "for all possible input vectors" while using reasonable time and space for the analysis. Therefore, the tuning techniques described in this section should only be applied when imprecision leads to false alarms.
The analyzer GUI refers to the number of states generated by the analysis at a given statement as the slevel counter, which is compared against the slevel limit that applies to that statement. Note that different statements may have different slevel limits (see Tuning the slevel limit).
The slevel limit for a given statement is displayed in the current function information widget. In the example below, the slevel limit of the currently selected statement is set to 200.
Additionally, the slevel counters and limits can be listed (and sorted) for all statements in the currently viewed function by opening the statements tab (see Statements).
More conveniently, in the GUI, statements whose slevel counter exceeded their slevel limit are indicated by a red bar in the margin of the interactive code view (see Interactive Code Panel). Similarly, statements whose slevel limit has not been exceeded, but whose slevel counter reached above 50% of their allotted limit, are marked with a yellow margin bar. The example below shows a snippet of normalized C code with margins marked in red and yellow. The yellow margin bar shows that the analyzer propagated enough states through the statement j = ...; to reach at least 50% of its slevel limit without exceeding it. The red margin bar shows that the analyzer exceeded the allowed slevel limit at the statement k = .... The statement i = ... is not marked in the margin, so its slevel counter stayed below 50% of its slevel limit.
Hovering over the margin shows the number of states at this statement (its slevel counter) and the statement's slevel limit. Here, the statement j = ... used 3 out of its slevel limit of 5.
The analysis can also become imprecise for other reasons than reaching the defined precision level. This is especially the case when the log trace includes messages about garbled mix values. It is very likely that if such messages appear, the analysis will not produce any interesting results.
Tip
The analysis can be stopped automatically with the command line option -val-stop-at-nth-garbled. This option can also be set from the GUI.
The analysis can also be stopped when it reaches the maximum memory consumption set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM. If TIS_KERNEL_MAX_MEM is set, the analyzer becomes more conservative in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2 GiB of memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM GiB of memory. On a single-user TIS Box with 64 GiB of memory, a good value for this variable is 62.
Tuning the slevel limit

There are two basic controls for tuning the slevel limits of the analyzer, both of which can be set via command line options:
- -slevel n: use n as the default global slevel limit;
- -slevel-function f:n: use n as the slevel limit for function f (this overrides the global limit).

In the following example, the analyzer will execute with a global slevel limit of 100, which will apply to all statements in all functions except main, whose slevel limit will be set to 10.
$ tis-analyzer -val -slevel 100 -slevel-function main:10 example.c
Equivalent settings are also configurable via JSON configuration files (see Loading an analysis configuration).
Function-specific slevel limits can also be set via the GUI in the current function information widget:
Note
Since the default global slevel limit is 0, some tuning will be necessary for almost all analyzed code bases.
There are several other options that can be used to fine tune the value analysis.
Some of them control whether states can be kept separated at function returns or loops:
- -val-split-return-function f:n: split the return states of function f according to \result == n and \result != n;
- -val-split-return auto: automatically split the states at the end of each function according to the function return code;
- -val-split-return-function f:full: keep all the computed states at the end of function f in the callers;
- -val-slevel-merge-after-loop <f | @all>: when set, the different execution paths that originate from the body of a loop are merged before entering the next iteration. The default behavior applies to all functions (-val-slevel-merge-after-loop=@all). It can be removed for some functions (-val-slevel-merge-after-loop=@all,-f), deactivated for all functions (-val-slevel-merge-after-loop=-@all), or activated only for some (-val-slevel-merge-after-loop=-@all,+f);
- -wlevel n: do n loop iterations before widening.
The analyzer can merge states on demand before analyzing specific statements based on the values of specific variables.
To induce the analyzer to merge states at some expression, add a comment to the source code that specifies how the merge should be performed:
//@ slevel merge merges all the states before analyzing the first statement following the comment (note that there is no underscore between slevel and merge);
//@ slevel_merge x merges together all the states in which the variable x has the same value, before analyzing the first statement following the comment (note that there is an underscore between slevel and merge);
//@ slevel_merge x, y, … merges together all the states that have the same value for x and the same value for y, and so on.
For example, consider the following program:
#include <tis_builtin.h>
int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    tis_show_each("i j k", i, j, k);
}
When the program is analyzed (with tis-analyzer -val -slevel 100), each of the assignments to the variables i, j, and k creates two separate states. In effect, the analyzer constructs eight separate states at the statement tis_show_each:
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {1})
On the other hand, it is possible to limit the number of states by merging them according to the value of the variable i, by adding a comment to the tis_show_each statement:
#include <tis_builtin.h>
int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i;
    tis_show_each("i j k", i, j, k);
}
In this case, the analyzer only produces two states at that statement: one where the value of i is 0, and another where it is 1. The values of each of the two remaining variables are merged into sets containing both 0 and 1.
[value] Called tis_show_each({{ "i j k" }}, {0}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0; 1}, {0; 1})
It is also possible to limit the number of states by merging them according to the values of multiple variables. To do this, add a comment to the tis_show_each statement that merges the states based on the values of i and j taken together:
#include <tis_builtin.h>
int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i, j;
    tis_show_each("i j k", i, j, k);
}
This then produces four states, one for each of the four combinations of i and j, with the values of k being merged for each of those states:
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0; 1})
Finally, all the states can be merged into one as follows (note that there is no underscore between slevel and merge):
#include <tis_builtin.h>
int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel merge;
    tis_show_each("i j k", i, j, k);
}
This reduces all the states at tis_show_each into a single state:
[value] Called tis_show_each({{ "i j k" }}, {0; 1}, {0; 1}, {0; 1})
This technique can be particularly useful when dealing with complex loops. Consider the following example:
#include <tis_builtin.h>
int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1, 1);
    int i = 0;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}
This code snippet calculates the integer division of the variables num and denom, which are computed over 10 iterations of a loop. When denom enters the loop, it has some integer value between 0 and 9, and it is increased in each iteration by the value of the variable step. The value of step is indeterminate to the analyzer, but it is either -1 or 1 and is constant throughout the execution of the entire loop (as if it were a parameter passed in from outside the function). As for num, it starts at 0 and is either increased or decreased by the value of the iterator i, depending on direction. This direction is represented by a tis_interval from 0 to 1, signifying that the direction is determined in such a way that the analyzer cannot definitely predict it. So, in each iteration the range of values that num can take grows in both directions. The direction is decided separately for each iteration, as if it were the result of a function call executed over and over inside the loop.
The user should be able to quickly determine that denom can never be 0, and so computing result should not trigger a division by zero. Specifically, denom is expected to be either a value between -10 and -1 if step is negative, or a value between 10 and 19 if step is positive.
However, the analyzer will nevertheless report a potential undefined behavior when run. (The command includes -val-slevel-merge-after-loop="-main" to prevent it from merging all the states at each iteration.)
$ tis-analyzer -val -slevel 200 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..19], {-1; 1})
tests/tis-user-guide/slevel_merge_loop_1.c:25:[kernel] warning: division by zero: assert denom ≢ 0;
Analyzing the output of the analyzer reveals that the reason behind the detected undefined behavior is that the abstracted value of denom spans from -10 to 19. This is the case because the indeterminacy inside the loop causes the analyzer to maintain a lot of distinct states, leading it to exhaust its slevel limit and start merging states. In particular, the analyzer merges states where step is -1 with states where step is 1. Since denom can take more values in the merged state than the analyzer can represent by enumeration, it approximates the value of denom as the span [-10..19].
This false alarm can be removed by increasing the precision of the analyzer at that point. One way to do that is to increase the slevel limit:
$ tis-analyzer -val -slevel 520 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
This works, but since the number of propagated states is very large, the slevel limit must be set to at least 520. Using slevel_merge can help keep the slevel limit significantly lower. The following modified snippet inserts an slevel_merge annotation just before the loop, directing the analyzer to merge states at the beginning of each loop iteration (because it is inserted before the condition) so that step is kept a singular value in each resulting state.
#include <tis_builtin.h>
int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1, 1);
    int i = 0;
    //@ slevel_merge step;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}
This additional guidance prevents the analyzer from merging the negative and positive possible value sets for denom, while allowing it to merge states to account for the indeterminacy inside the loop. So, the analysis can be performed with a much lower slevel limit:
$ tis-analyzer -val -slevel 40 slevel_merge_loop_2.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..-1], {-1})
[value] Called tis_show_each({{ "denom step" }}, [10..19], {1})
Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:
-val-ilevel n: sets the precision level for integer representation to n: each integer value is represented by a set of up to n enumerated values; above this number, intervals (with congruence information) are used.
-plevel n: sets the precision level for array accesses to n: array accesses are precise as long as the interval for the index contains fewer than n values. See Tuning the precision for array accesses for more information about this option.
There are also options which allow enabling or disabling certain alarms. Most of these options are enabled by default, and it is usually safer to leave it this way (unless you really know what you are doing).
Of course, not all existing options have been enumerated here. The full list of the available options is given by -value-help.
As explained in Tune the Precision, temporarily reducing the size of the arrays may be a first step during the interactive phase to make the analysis time shorter. But when analyzing large arrays, the -plevel option can be used to increase the precision level for array accesses. This option sets how hard the analyzer tries to be precise for memory accesses that, considering the imprecision on the indexes involved, can be at any of many offsets of a memory block. The default is 200, which may not be sufficient for an access at unknown indices inside a large or nested array to produce a precise result. This is illustrated by the example below:
#include <tis_builtin.h>
char c[20];
struct s { int m[50]; void *p; };
struct s t[60];
void init(void) {
    for (int i = 0; i < 60; i++) {
        for (int j = 0; j < 50; j++)
            t[i].m[j] = (i + j) % 10;
        t[i].p = c + i % 20;
    }
}
int main(void) {
    init();
    int x = tis_interval(5, 45);
    int y = tis_interval(2, 56);
    t[y].m[x] = -1;
    x = tis_interval(5, 45);
    y = tis_interval(2, 56);
    int result = t[y].m[x];
    int *p = &t[y].m[x];
    int result2 = *p;
}
With the default value of -plevel (200):
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
x ∈ [5..45]
y ∈ [2..56]
result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
With a higher plevel:
$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
x ∈ [5..45]
y ∈ [2..56]
result ∈ [-1..9]
p ∈ {{ &t + [428..11604],0%4 }}
result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
Note that result2 is not precise even with the higher plevel. Handling the lvalue t[y].m[x] directly allows the analyzer to be optimal as long as the value of the plevel option allows it, but forcing the analyzer to represent the address as the value of the variable p produces this set of offsets:
{{ &t + [428..11604],0%4 }}
This set of offsets contains pretty much the addresses of everything in t, including the p pointer members, so when dereferencing p it appears that the result can be an address.
As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.
The environment variable TIS_KERNEL_MEMORY_FOOTPRINT can be used to set the size of the caches used during the value analysis, speeding up some slow analyses in which the caches were getting thrashed. The default is 2, and each increment doubles the size of the caches. Only use this variable if you are not already low on memory.
Another useful option which helps in reducing the computation time is -memexec-all. If this option is set, when analyzing a function the tool tries to reuse results from the analysis of previous calls when possible.
It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.
First of all, the standard printf function can be used to output constant messages or messages involving only precise values. However, printing the computed values when they are imprecise is not possible using printf; instead, the tis_show_each function should be used. The name of this function can be extended with any string, so that it is easier to distinguish between different calls (as the full function name is printed each time it is called). For instance:
tis_show_each_iteration (i, t[i]);
The statement above will output messages like this (one for each analyzed call):
[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])
Another useful function, which properly handles imprecise values, is tis_print_subexps. When given any number of expressions as arguments, it will print the values of all the sub-expressions of each provided expression. The first argument of this function must always be a string literal, which will be printed out in order to help distinguish between different calls. For instance:
tis_print_subexps ("simple sums", x + y, y + z, z + x);
Such a statement will output messages like this:
[value] Values of all the sub-expressions of simple sums (expr 1):
int x + y ∈ {3}
int x ∈ {1}
int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
int y + z ∈ {5}
int y ∈ {2}
int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
int z + x ∈ {4}
int x ∈ {1}
int z ∈ {3}
Moreover, tis_print_abstract_each allows the contents of structured variables to be observed. For instance:
tis_print_abstract_each(&i, &t);
This statement will output messages like this (one for each analyzed call):
[value] Called tis_print_abstract_each:
i ∈ {0; 1; 2; 3; 4}
t[0..4] ∈ [0..10]
[5..99] ∈ UNINITIALIZED
Note that tis_print_abstract_each in general takes addresses of variables as parameters. This applies as well to an array, such as t in the example above. Contrary to popular belief, when t is an array, &t is not the same as t, and the user should call tis_print_abstract_each(&t); to see the whole array (pointer decay only shows the first element).
To get even more information, the tis_dump_each function can be used to print the whole state at the program point where it is called. But it may be easier to call the tis_dump_each_file function to print the state to a file. The name of the file is computed from the first argument of the call (which must be a string literal), an incremented number, and an optional directory given by the -val-dump-directory option. The -val-dump-destination option allows choosing the kind of output among txt and json (all for both, none for no output).
For instance, when calling tis_dump_each_file ("state", *(p+i)) in a test.c file, and analyzing it with the command:
$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp
these messages are shown in the trace:
test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'
The two generated files hold both the whole state computed the first time the program point is reached and the possible values of *(p+i).
For instance, the JSON file may look like:
{
"file": "test.c",
"line": 11,
"args": "([0..10])",
"state": [
{
"base": "t",
"values": [
{ "offset": "[0..4]", "value": "[0..10]" },
{ "offset": "[5..9]", "value": "UNINITIALIZED" }
]
},
{ "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
]
}
To better understand the results, see the Value Analysis Data Representation section.
In order to avoid wasting time analyzing the application in a wrong context, the analysis can be stopped as soon as some alarms are generated, thanks to the -val-stop-at-nth-alarm option. With the argument equal to 1, it aborts the analysis at the first alarm. To ignore a certain number of alarms, the argument can be increased. Although there is no strict relation between the given argument and the number of alarms generated before the analysis stops (i.e. these two values are not necessarily equal), one thing is guaranteed: providing a larger number will lead to skipping more alarms.
The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:
$ kill -USR1 12345
The analyzer can also be stopped through the GUI (see Disconnect/Kill server).
When the analyzer receives this signal, it stops the Value analysis, but still continues with the other tasks. For instance, it still saves the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there are no more tasks to do, it can still take some time to properly stop the analysis.
The --timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:
$ tis-analyzer --timeout 5m -val ... -save project.state
Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:
tis_watch_cardinal: stop the analysis when the number of different values that a given memory location may possibly contain (because of imprecision) is greater than a certain maximal amount.
tis_watch_value: stop the analysis when a given memory location may possibly contain a value from a provided set of forbidden values.
tis_watch_address: stop the analysis when a given memory location may possibly contain an address.
tis_watch_garbled: stop the analysis when a given memory location may possibly contain a garbled mix value.
tis_detect_imprecise_pointer: stop the analysis when any expression is evaluated to an imprecise pointer which contains a given base address.
These functions are available using #include <tis_builtin.h>.
The arguments of the four tis_watch_* functions follow the same logic:
the memory location to watch is given by its address p and its size s;
n is the number of statements during which the condition may remain true before the analysis is stopped:
if n == -1, the analysis never stops, but messages are printed each time the condition is reached;
if n == 0, the analysis stops as soon as the condition is reached;
if n > 0, the analysis continues through the first n occurrences where the condition is reached (printing a message for each of them) and stops at the next occurrence.
The function tis_detect_imprecise_pointer only takes a pointer as argument.
Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:
int x = 0; /* the memory location to watch */
void *p = (void *)&x; /* its address */
size_t s = sizeof(x); /* its size */
int y[10];
/* The analysis stops when x is not exact (i.e. not a singleton value). */
int maximal_cardinal_allowed = 1;
tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);
/* The analysis stops the fourth time when x may be negative. */
int forbidden_values = tis_interval(INT_MIN, -1);
int exceptions = 3;
tis_watch_value(p, s, forbidden_values, exceptions);
/* The analysis stops when x may be an address. */
tis_watch_address(p, s, 0);
/* The analysis stops when x may be a garbled mix value. */
tis_watch_garbled(p, s, 0);
p = y;
/* The analysis starts to detect if an expression is evaluated to an
imprecise pointer starting at base address &y. */
tis_detect_imprecise_pointer(p);
/* The analysis stops because the expression p+tis_interval_split(0,3)
is evaluated to an imprecise pointer &y + [0..3]. */
*(p+tis_interval(0,3)) = 3;
Tuning the precision using various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is by adding assertions to the program. Other kinds of properties can also be introduced, but assertions are by far the most frequently used for this purpose.
If you do not know how to add ACSL properties to your project, first read ACSL Properties.
Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When this is not possible, they should be verified manually.
Warning
Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.
Some examples are given below.
If source code modifications are needed, and the source code is in a git repository, the tis-modifications tool may be helpful to track them in order to check that they are correctly guarded. See tis-modifications Manual.
As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately some relevant information concerning variables just cannot be represented in this form and thus cannot be taken into account by the analyzer when it would make a difference. However, thanks to introducing well placed assertions it is possible to compensate for this disadvantage by explicitly providing the missing information.
Example:
int T[10];
...
L1: if (0 <= x && x < y) {
...
L2: if (y == 10) {
L3: ... T[x] ...
}
...
}
When the T[x] expression is encountered at the label L3, the analyzer tries to check whether the T array is accessed correctly (i.e. inside the array bounds), namely whether the (0 <= x < 10) condition holds. It already knows that the (0 <= x) part holds, due to the conditional statement at label L1 (assuming that x has not been modified since then). Whatever values x might have had before, at L1 they have been restrained to only non-negative ones, and this fact has been stored in the internal state (the interval of values for x was modified) and thus is visible at L3. For example, if before L1 the value of x was [--..--] (i.e. nothing is known about x except that it is initialized), then after L1 it would be [0..--] (i.e. the interval spanning from zero to positive infinity).
Now, the analyzer still needs to verify that (x < 10) also holds. This is obvious for anybody who reads the code: the condition at L1 assures that (x < y) and the condition at L2 assures that (y == 10), therefore (x < 10) must be true. Unfortunately, because of the limitations of the adopted value representation method, the analyzer cannot deduce this by itself. The fact that (x < y) holds just cannot be expressed in the internal state (nor actually can any abstract relation between variables). And, supposing that the value of y at L1 is [--..--], the (x < y) condition does not help to restrain the values of x, so its upper bound remains as before. Hence, this important piece of information is lost and the analyzer simply cannot connect it with (y == 10) at L2 in order to correctly restrain the value of x to [0..9]. So at L3 it will consider that the value of x is [0..--] and it will emit an alarm about a potential out-of-bounds access to the array T.
To help the analyzer, the appropriate assertion can be added explicitly:
at L3: assert ax: x < 10;
Then the alarm will disappear. Of course, the ax assertion still needs to be verified by other means. For example, this particular assertion can be easily proven using WP (see Prove Annotations with WP).
State splitting is another assertion-based technique for guiding the value analysis. It can be used to obtain the same results in the above example by splitting the internal state at L1 into two states, by introducing the following assertion:
at L1: assert ay: y <= 10 || 10 < y;
As explained before, the analyzer can store multiple memory states at each program point (as explained in About the Value Analysis), and the maximal number of states that can be stored per program point in the internal representation is related to the precision level (i.e. slevel). So, provided that the -slevel option has set the precision level high enough to permit splitting the state here, the assertion above will lead to a case analysis:
y <= 10 case: as y <= 10 is assumed, together with the x < y condition at L1, it leads to deducing x < 10, and this time this can be represented in the internal state.
10 < y case: if 10 < y is assumed, then the condition at L2 is false, and therefore the execution branch where the array T is accessed at L3 is not reached at all.
Thanks to introducing this assertion, the alarm will thus disappear. Moreover, the analyzer is able to check on its own that ay is always true, so there is nothing more to verify.
It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.
Tip
Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).
In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program in order to make it easier to analyze. This method should usually be avoided if possible, since it is intrusive to the application's code. Thus it should be used only when other solutions are not good enough or when you do not mind modifying the code.
For example, if the program contains a test (0 <= i+j < 10) and then several uses of T[i+j] follow, it may be convenient to add a temporary variable representing the i+j sum:
int tmp = i + j;
if (tmp < 10) {
..
//@ assert a_tmp: tmp == i + j;
... T[tmp] ...
}
If neither i nor j is modified in the meantime, the assertion that validates the code substitution should be trivial to verify, and the value analysis is now able to know that (tmp < 10).
Besides assert, requires and ensures, the loop invariant properties are also useful to enhance the analysis precision.
For instance, this function generates an alarm:
int T[100];
//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;
This is because the value of i is too imprecise when leaving the loop, so the analysis doesn't know if the access to T[i] in the last assignment is valid or not.
Adding this loop invariant removes the alarm:
/*@ loop invariant i <= 50; */
Moreover, the value analysis is able to check that this property is always valid.
A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.
Buffer problem: there is a relation between cur and len, but it cannot be represented.
A typical function to handle this buffer is:
void process (char * cur, size_t len) {
    char * p = cur;
    for (size_t i = 0 ; i < len ; i++, p++) {
        *p = ...
    }
}
The validity of the pointer p has to be checked to avoid an alarm on the p access, but also to get precise results later on. This is especially important when the pointer points to an array that is part of a larger structure. For instance:
struct data {
    char buffer[BUFFER_LEN];
    unsigned current_index;
    unsigned state;
};
//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
    char * current = data->buffer + data->current_index;
    size_t length = BUFFER_LEN - data->current_index;
    if (n > length) n = length;
    process (current, n);
    ...
}
If the analysis is not able to know that p does not go beyond the end of the buffer field, the values of the other fields current_index and state might be considered modified as well and might be too imprecise for the analysis to give interesting results later on. So the process function needs a precondition giving the constraint on cur and len that ensures the validity of the pointer. This precondition could simply be:
//@ requires \valid (cur + 0 .. len-1);
Unfortunately, the Value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable form when one of the two values is precise enough to reduce the other:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
//@ requires length <= (\block_length (data) - \offset (data)) / sizeof (*data);
Notice that the ACSL functions \base_addr, \block_length and \offset only provide the expected information when cur is a pointer to an array allocated on its own. If the array is a field in a structure, \base_addr(cur) returns the base address of the structure.
Structure with a .buf field: ACSL functions are related to the allocated block, not the internal array.
Anyway, in some cases, even if the analyzer computes the optimal information, cur and len both have unknown values from intervals, and the relation between the two variables has been lost. So the memory access to (*p) raises an alarm whenever it cannot be checked that adding the upper bounds of both intervals stays smaller than (buf + BUFFER_LEN). Moreover, if buf is in a structure as explained above, buf and BUFFER_LEN may be unknown in the function.
A trick can be to modify the original function by adding a parameter that gives a pointer in the object beyond which the function is not expected to access:
/*@ requires f_r_buf: val: cur <= bound;
    requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
    char * p = cur;
    //@ loop invariant f_l_p: val: p <= bound;
    for (size_t i = 0 ; i < len ; i++, p++) {
        if (p >= bound) break;
        *p = ...
    }
}
In the previous example, the call to process would have to be changed to:
process_bounded (current, n, data->buffer + BUFFER_LEN);
As long as the preconditions are true, this modified function is equivalent to the original one. The first precondition is often checked by the Value analysis; when it is not, the value analysis reduces the range of cur. The value analysis can use the second precondition to reduce the length.
Two annotated functions with such bounds, tis_memset_bounded and tis_memcpy_bounded, are provided to be used instead of memset and memcpy when this problem occurs with these libc functions.
This chapter explains how to introduce ACSL properties to a project.
ACSL is the specification language employed in TrustInSoft Analyzer. ACSL properties can be used to specify functions (as seen in Write a Specification) and to guide the analysis by adding local annotations, such as assertions or loop invariants, which may help in removing false alarms (as seen in Remove Alarms by Adding Annotations).
There are two ways to insert ACSL properties in a project:
One way to add ACSL annotations to a project is to write them directly in the source code, in special comments: /*@ ... */ or //@ ....
There are several kinds of properties, and they all need to be placed in an appropriate place in the source code.
For more information about the ACSL language, please refer to the ACSL Documentation.
Caution
The ACSLimporter plug-in is only available in the commercial version of TrustInSoft Analyzer.
For many reasons it is usually preferable to avoid modifying the source code which is analyzed, as introducing changes to the application’s code can lead to difficulties in comparing it with the original version. For example, adding new properties alters the line numbering in a file, which makes it impossible to report problems with the original source line number.
The ACSLimporter plug-in makes it possible to write the ACSL properties into separate files and then import them for the analysis. The syntax of such files looks like this:
function <function-name>:
  contract:
    requires <pre-name-1>: <pre-definition-1>;
    assigns <assigns-definition-1>;
    ensures <post-name-1>: <post-definition-1>;
  at L1: assert <assert-name-1a>: <assert-definition-1a>;
  at L1: assert <assert-name-1b>: <assert-definition-1b>;
  at L2: assert <assert-name-2>: <assert-definition-2>;
  at loop 1:
    loop invariant <inv-name-1a>: <inv-definition-1a>;
    loop invariant <inv-name-1b>: <inv-definition-1b>;
Of course, the <...> parts should be substituted by specific names and definitions.
Depending on the organization of the project, it might be better to put all the properties in a single ACSL file or to split them throughout several files. If the properties concerning the same function appear in different files, the specifications are merged.
To load the property files, so that they are taken into account during the analysis, the -acsl-import <file.acsl> option has to be specified for each concerned ACSL file.
Giving a unique name to each annotation permits referring to it easily later on. Moreover, it makes the result files a lot clearer and more readable: when mentioning a particular annotation they will use its name instead of the corresponding file name and line number.
Using standard naming conventions is highly recommended. Some tools require particular naming of assertions to properly check that everything has been verified at the end of the analysis.
The proposed naming conventions are:
- prefix the property name with the name of the function it belongs to (e.g. add_).
- identify the kind of property with a letter and a number (the first requires property for function add could then be named add_r1). (However, this is not really necessary if the name is always used together with the corresponding keyword, like for example: requires add_r1, ensures add_e2, etc.)
- suffix the name with the verification method:
  - add_e2_val: if the property is found always valid by Value;
  - add_e2_wp: if the property is proved by WP;
  - add_e2_sc: if the property could have been removed as redundant by Scope (note: it could be necessary to keep this property anyway because it still might be useful for Value or WP computations);
  - add_e2_rv: if the property has been manually reviewed.

These naming conventions might seem quite cumbersome to use (especially the verification method suffix). However, as mentioned before, they make automatic generation/verification possible, so they are highly recommended.
Caution
The WP plug-in is only available in the commercial version of TrustInSoft Analyzer.
WP refers both to a method to formally verify properties of the analyzed code and the name of the analyzer’s plug-in that implements this method. WP is a static analysis technique, like the value analysis, but involving theorem proving. For a short introduction describing how it works see Short Introduction to WP Computation.
The purpose of this chapter is mainly to explain in which cases WP can be used with a minimal amount of manual work required. This does not mean that it cannot be used in more complex situations, but then it requires more knowledge about the WP computation and/or competences in performing manual proofs using proof assistants such as Coq.
The easiest way to run WP is to do it in the GUI by selecting the annotation to prove, right-clicking to open the pop-up menu, and choosing the Prove Property by WP option.
However, when the analysis evolves, it is usually more practical to run WP from the command line, save the project, and extract the status of the properties from the information file to just check that the results are still the same.
The command line to run WP and save the project looks like:
tis-analyzer -load project.state -wp-fct f1 -wp-prop f1_p1_wp,f1_p2_wp \
-then -wp-fct g,h -wp-prop f_pre_wp \
...
-then -save project.wp.state
This command line:
- opens a previously saved project project.state: it doesn't need to include value analysis results, and doesn't even have to include an entry point. All it needs is the body of the functions containing the properties to verify, and probably some specifications for the functions called from these functions;
- tries to verify the properties named f1_p1_wp and f1_p2_wp in the f1 function;
- then tries to verify the property f_pre_wp in g and h. Notice that f_pre_wp is supposed to be a precondition of f, and that it is checked in g and h, which are supposed to be some of f's callers;
- saves the results in the project.wp.state project.

Notice that a _wp suffix is used in the names of the properties that are checked with WP. See Naming the Annotations to understand why naming conventions are useful.
This is an example of how to use WP, but the plug-in provides many other options if needed. Please use the -wp-help option to list them, and refer to the documentation for more details.
To know how to extract the status of the properties from the project.wp.state project, see Information about the Properties.
Let us give a very simple explanation of WP for readers who know nothing about it, because some understanding of how it works is necessary in order to use it when suitable.
To verify that a property is true at a program point, the WP principle is to propagate it backward and to compute a formula such that, if that formula can be proved to be true, then the initial property is true as well. The computed formula is then sent to one or more automatic provers. For instance, tis-analyzer comes with alt-ergo, but more provers can be added.
An example is easier to understand:
const int X = 10;
void f1(int x)
{
int y = x + 1;
int z = 2 * y;
L: //@ assert y_val: y > X;
...
}
To ensure that y_val is true at L, WP computes that one has to prove (x+1 > X) when entering the function. Notice that the z assignment has no effect since WP knows that it doesn't modify the value of y. This can be automatically proved if a precondition gives:
//@ requires r_x_ge_X: x >= X;
This is because the final computed formula is:
x >= X ==> x+1 > X;
which is easily proved by any automatic prover.
It doesn’t work with the precondition:
//@ requires r_x_ge_15: x >= 15;
This is because WP only works on the function source code, which means that it has no information about the value of X. To solve this kind of problem, one can add:
//@ requires r_X_val: X == 10;
This precondition is easily validated by the value analysis and can be used by WP to finish the proof with r_x_ge_15.
In this simple case, the initial property and the computed formula are equivalent, but this is not always the case. WP just ensures that if the computed formula is true, then the property is true each time its program point is reached.
To prove a loop invariant property, the WP computation is very similar, but decomposed into two goals: the establishment of the invariant before entering the loop, and its preservation by each loop iteration.
Example:
int T[100];

//@ requires n < 100;
int main(int n)
{
    int i; int * p;
    for (i = 0, p = T; i < n; i++, p++) {
        *p = 3;
    }
    ...
}
The following property removes the alarm about the validity of the (*p) assignment in the loop:
//@ loop invariant li_1: p == T+i;
Moreover, it can be proved by WP:
- the establishment has to be proved before entering the loop, but after the initialization part. So the proof obligation is:
T == T + 0
- the preservation formula is similar to:
p == T + i ==> p + 1 == T + (i + 1)

Both formulas are trivially true.
In the first example in Short Introduction to WP Computation, the z assignment has no effect on the WP computation since WP knows that it doesn't modify the value of y. But it is different when pointers are involved. For instance:
void f(int * px, int x, int * py, int y)
{
*px = x;
*py = y;
//@ assert a_px: *px == x;
//@ assert a_py: *py == y;
...
}
WP is able to prove a_py, but not a_px. The reason is that it doesn't know whether the assignment to (*py) modifies the value of (*px) or not. The a_px property can be proved only with the precondition:
//@ requires \separated (px, py);
It tells that there is no intersection between the (*px) and (*py) locations in memory.
In the context of adding annotations to remove alarms, except in very simple cases, it is not recommended to use WP when possibly overlapping pointers are involved since it may take some time to provide enough information.
The other problem arises when there are function calls between the property and the statements that make it true. Remember that WP only works on the source code of the function containing the property, and on the specifications of the called functions.
extern int X;
void g (void);
void f(int x, int y)
{
if (x > y && x > X) {
g ();
//@ assert ax1: x > y;
//@ assert ax2: x > X;
...
}
...
}
WP is able to prove ax1 since there is no way for g to modify either x or y, but ax2 cannot be proved since g may modify X.
There are two solutions to solve the problem:
add an assigns property for g to specify the modified data. For instance, ax2 is proved when adding:
//@ assigns \nothing;
This is not the preferred method since assigns properties are difficult to prove: it requires knowing the modified data for each statement of g. The computed dependencies may help to justify the assigns property, but beware that this information is context dependent.
add a postcondition about the involved data. For instance, specifying that X is not modified by g:
//@ ensures X == \old (X);
or specifying that X decreases:
//@ ensures X < \old (X);
Both solutions make it possible to prove ax2.
WP could seem useless if not used in complex cases, but this is not true: even when properties look trivial, it is useful to formally prove them, since it is so easy to make a mistake.
Let us look at an example:
//@ ensures e_res_ok: min <= \result <= max;
int bounds(int min, int x, int max)
{
int res = x;
if (x < min) res = min;
if (x > max) res = max;
return res;
}
The postcondition seems reasonably easy to justify, but WP is unable to prove it. WP computes a proof obligation equivalent to:
if (x > max) then min <= max /\ max <= max
else if (x < min) then min <= min /\ min <= max
else min <= x /\ x <= max
After simplifying the formula, it appears that the information (min <= max) is missing, so this postcondition cannot be proved without a precondition. The precondition then has to be added and checked in every context where the function is called to ensure that the postcondition is verified.
The advice here is to use WP only in simple cases, because complex cases need expertise and require a lot of time. But we have seen that even for properties that look trivial, it is better to formally prove them, since it is so easy to make a mistake. Moreover, manual justification of trivial properties may look a little silly.
One must be especially careful when it seems that WP should be able to prove something, and doesn’t, since it may hide a problem somewhere. It is always better to understand if it is really a WP weakness, or something else.
Now that you know how to analyze an application, it is important to put everything together and check all the hypotheses.
If there is only one analysis, it is quite easy to check. The results rely on:
- the assigns and ensures properties of the external functions, because they cannot be checked;
- the annotations, each of which must be either valid according to the value analysis or proved by WP.

If there are several analyses, the results of each one rely on the same hypotheses as above, but there are more things to check. To be fully verified in the given context, all the hypotheses above must have a clear justification in the absence of formal verification.
TrustInSoft Analyzer++ lets you analyze C++ programs. This document describes:
In addition, there is also a separate getting started tutorial on analyzing C++ code in the Analyzing C++ code section of the manual.
The identifiers in a C++ program are mangled to match C identifiers. The mangling scheme used in TrustInSoft Analyzer is a variation of Itanium mangling. The differences are:
Class, union and enum names are also mangled, even if this is not
required by Itanium. The grammar entry used for these types is
_Z<name>
. As such, the class:
struct Foo {
    int x;
};
is translated as:
struct _Z3Foo { int x; };
Local variables and formal parameter names are also mangled, to avoid shadowing extern "C" declarations. The grammar entry used for a local variable is _ZL<unqualified-name>. As such, the local variable bar in:
int main() {
    int bar = 2;
}
is mangled as _ZL3bar. The keyword this is not mangled.
The virtual method table and the typeinfo structure for a class Foo are mangled as extra static fields named __tis_class_vmt and __tis_typeinfo in this class. As such, the class:
struct Foo {
virtual void f() {}
};
leads to the generation of two variables with mangled names _ZN3Foo15__tis_class_vmtE and _ZN3Foo14__tis_typeinfoE.
To make reading the identifiers easier, TrustInSoft Analyzer displays by default a demangled version of the identifier. In the GUI, the mangled name can be obtained by right-clicking on an identifier and selecting Copy mangled name.
Signatures are ignored when demangling function names. As such, the assignment in:
void func(int) {}
void
test()
{
void (*ptr)(int) = &func;
}
is displayed as:
void (*ptr)(int);
ptr = & func;
even if the mangled name of func is _Z4funci. This can lead to ambiguity when there are multiple overloads for the named function. A solution is to look at the mangled name.
Constructors and destructors are demangled as Ctor and Dtor. If the constructor or destructor is a constructor for a base class and is different from the constructor for the most derived object, the suffix Base is added. If the constructor is a copy constructor, the suffix C is added. If the constructor is a move constructor, the suffix M is added. Therefore, the demangled name Foo::CtorC stands for the copy constructor of the class Foo.
If the destructor is virtual, it will be demangled as DeletingDtor.
The option -cxx-filt can be used to print the demangled version of an identifier, as demangled by the analyzer. If the identifier is a function name, its signature will also be printed. For example, the command tis-analyzer++ -cxx-filt _Z3fooii displays {foo(int, int)}.
When displayed, function return types are preceded by a -> symbol and are displayed after the formal parameter types. For example, the instance of the function show in the following code:
struct Foo {
void f(int) {}
};
template <typename T>
void show(const T&) {}
int
main()
{
show(&Foo::f);
}
is printed as show<{(int) -> void} Foo::*>.
Template parameter packs are printed enclosed by [ and ]. As such, the command tis-analyzer++ -cxx-filt _Z1fIJ3Foo3FooEEvDpRKT_ displays {f<[Foo, Foo]>(const [Foo, Foo]&) -> void}: f is a function templated by a parameter pack, which is instantiated with Foo, Foo. Note also that in this case the const and & are applied to the whole pack.
Names displayed in the GUI can be prefixed by .... These names are shortened versions of qualified names. Clicking on this prefix will display the full mangled or demangled name, depending on the command line options.
When calling a function, TrustInSoft Analyzer uses different transformations to initialize the function’s arguments depending on the type of the argument. These transformations match Itanium calling convention.
Scalar types are kept as is.
Reference types are translated as pointers to the referenced types. The initialization of an argument of reference type is translated as taking the address of the initializer. If this initialization requires the materialization of a temporary object, this step is done by the caller. For example, with the following original source code:
void f(int &&, int &);
void g() {
int x;
f(2, x);
}
the translated declaration for the function f is void f(int *a, int *b) and the call to f is translated as:
int x;
int __tis_temporary_0;
__tis_temporary_0 = 2;
f(& __tis_temporary_0,& x);
The passing of a class type depends on whether the class is non-trivial for the purposes of calls. Following the Itanium ABI, a class type is non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor or destructor, or if all of its copy and move constructors are deleted.
If the type is non-trivial for the purposes of calls, a variable of the class type is defined in the caller and the function receives a pointer to this variable. Such variables are named __tis_arg_##.
For example, in the following code:
struct Obj {
Obj();
Obj(const Obj &);
};
void f(Obj x, Obj y);
void g() {
f( {}, {} );
}
the translated function f has the signature:
void f(struct Obj *x, struct Obj *y);
and its call is translated as:
struct Obj __tis_arg;
struct Obj __tis_arg_0;
{
Obj::Ctor(& __tis_arg_0);
Obj::Ctor(& __tis_arg);
}
f(& __tis_arg,& __tis_arg_0);
If the function returns a class that is non-trivial for the purposes of calls, then it is translated as a function returning void but with an additional argument. This argument is a pointer to a variable in the caller that will receive the function return. If the caller does not use the function return to initialize a variable, a variable named __tis_cxx_returnarg_## is created for this purpose.
For example, with the following original source code:
struct Obj {
Obj();
Obj(const Obj &);
};
Obj f();
void g() {
Obj o = f();
f();
}
the translated function f has the signature:
void f(struct Obj *__tis_cxx_return)
and the body of the function g is translated as:
struct Obj o;
f(& o);
{
struct Obj __tis_cxx_returnarg;
f(& __tis_cxx_returnarg);
}
return;
If the type is trivial for the purposes of calls, no transformation is applied and the object is passed by copying its value. For example, with the following original source code:
struct Obj {
Obj();
};
Obj f(Obj o);
void g() {
f( {} );
}
the signature of the translated function f is:
struct Obj f(struct Obj o)
Sometimes, TrustInSoft Analyzer cannot decide if a class is trivial for the purposes of calls in a translation unit. In such cases, it will assume that the type is non-trivial for the purposes of calls and emit a warning like:
[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.
If the user knows that the type is trivial for the purposes of calls, the option -cxx-pass-by-value can be used to force this.
For example, with the following original source code:
struct Foo;
void f(Foo x);
with no particular option set, TrustInSoft Analyzer will produce the following warning and declaration for f:
[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.
void f(struct Foo *x);
with the option -cxx-pass-by-value _Z3Foo, TrustInSoft Analyzer will produce the following declaration for f without warning:
void f(struct Foo x);
Using an incorrect passing style can lead to errors like:
[kernel] user error: Incompatible declaration for f:
different type constructors: struct _Z3Foo * vs. struct Foo
First declaration was at file1.cpp:7
Current declaration is at file2.c:7
or
[kernel] user error: Incompatible declaration for f:
different type constructors: struct Foo vs. void
First declaration was at file.c:7
Current declaration is at file.cpp:7
Methods do not exist in C, and are translated as functions by TrustInSoft Analyzer++. The following additional transformations are applied to non-static methods:
- an additional this argument is added as the first parameter. Its type is a pointer to the class enclosing the method. There is an additional const qualifier if the method is const-qualified.
- at each call site, the this argument is initialized with the address of the calling object.

For example, with the following original source code:
struct Obj {
Obj();
static void bar(int x);
void foo(int x) const;
};
void
f(void)
{
Obj o;
o.foo(1);
Obj::bar(0);
}
two function declarations are produced:
void Obj::bar(int x);
void Obj::foo(const struct Obj *this, int x);
and the calls to foo and bar are translated as:
Obj::foo(& o,1);
Obj::bar(0);
By default, constructor elision is enabled and TrustInSoft Analyzer++ will omit some calls to copy or move constructors to temporary objects, as allowed by C++ standards from C++11 onwards.
Constructor elision can be disabled with the -no-cxx-elide-constructors option.
For example, with the following original source code:
struct Obj {
Obj();
};
Obj f();
void g() {
Obj y = f();
}
when constructor elision is enabled, the call to f is translated as:
f(& y);
However, when constructor elision is disabled with the option -no-cxx-elide-constructors, it is translated as:
struct Obj __tis_temporary_0;
f(& __tis_temporary_0);
Obj::CtorM(& y,& __tis_temporary_0);
In this case, the result of the call to f is written to the temporary object __tis_temporary_0 and this temporary object is then moved to the initialized variable y.
Virtual method call translation is separated into three steps:
- fetch the virtual method table entry of the method into a temporary variable named __virtual_tmp_XXX, where XXX is the unqualified name of the method.
- shift the this pointer calling the method, and call the resolved function pointer using the previous information.
- shift the returned pointer to fetch the eventual virtual base (see the paragraph at the end of this section), unless it is nullptr.

As such, the function call_get in the following code:
struct Bar;
struct Foo {
virtual Bar *get() { return nullptr; }
};
Bar *call_get(Foo *f) {
return f->get();
}
is translated as:
struct Bar *call_get(struct Foo *f)
{
struct Bar *__retres;
char *__virtual_return_get;
struct __tis_vmt_entry const *__virtual_tmp_get;
char *tmp_0;
__virtual_tmp_get = f->__tis_pvmt + 1U;
__virtual_return_get = (char *)(*((struct Bar *(*)(struct Foo *))__virtual_tmp_get->method_ptr))
((struct Foo *)((char *)f + __virtual_tmp_get->shift_this));
if (__virtual_return_get) tmp_0 = __virtual_return_get + __virtual_tmp_get->shift_return;
else tmp_0 = __virtual_return_get;
__retres = (struct Bar *)tmp_0;
return __retres;
}
The special case of covariance on virtual bases: if the called virtual function is covariant and if its return type has a virtual base of the return type of the overridden function, we need to fetch this virtual base at the call site.
To do so, we need to get the offset to apply to the returned object pointer. This offset is in an array, and there is a pointer to this array at offset 0 of the returned object. So we cast the returned object as a pointer to an array of offsets, and access this array at the vbase_index to get the offset.
In this case the code is translated as such:
if (__virtual_tmp_f->vbase_index != (long)(-1)) // do we have a virtual base?
__virtual_return_f += *(
*((long **)__virtual_return_f) // get the array of offsets
+ __virtual_tmp_f->vbase_index); // get the appropriate offset
The option -no-cxx-inline-virtual-calls can be used to replace this transformation by a call to a generated function named XXX::__tis_virtual_YYY, where:
- XXX is the static type of the class containing the method that was called.
- YYY is the unqualified name of the method.

With this option, the function call_get of the example above is translated as:
struct Bar *call_get(struct Foo *f)
{
struct Bar *tmp;
tmp = Foo::__tis_virtual_get(f);
return tmp;
}
The generated __tis_virtual_ functions keep the states obtained by the virtual call separated.
TrustInSoft Analyzer uses its own memory layout to represent C++ objects. In order to preserve as much useful information as possible, the analyzer defines multiple well-typed data structures, and uses more than one extra pointer field in polymorphic classes. As a result of this choice, the numeric value of sizeof(Class) will differ between the compiled code and the analyzed code.
Objects, whether class or struct, are translated as C structures. Unions are translated as C unions.
The inline declaration of a static field is translated as a declaration of a global variable with the same qualified name. The out-of-line definition of a static field is translated as a definition of a global variable with the same qualified name.
Non-static fields are translated as fields in the translated structure. The fields are emitted in the source code order.
Empty classes are translated as a structure with one field char __tis_empty;. This enforces that the size of an empty class is not zero.
Non-virtual non-empty base classes are translated as fields in the derived class. Such fields are named __parent__ followed by the name of the base class.
For example, with the following original source code:
class Foo {
int x;
};
struct Bar: Foo {
int y;
int z;
};
the structures produced for the classes Foo and Bar will be:
struct Foo {
int x ;
};
struct Bar {
struct Foo __parent__Foo ;
int y ;
int z ;
};
Non-virtual empty base classes do not appear in the translated C structure. For example, with the following original source code:
class Foo { };
struct Bar: Foo {
int y;
int z;
};
the structure produced for the class Bar is:
struct Bar {
int y ;
int z ;
};
In this case, a reference to the base Foo of an object of type Bar binds to the original object. In other words, the assertion in the following program is valid in the model used by TrustInSoft Analyzer:
class Foo {};
struct Bar: Foo {
int y;
int z;
};
int
main()
{
Bar b;
Foo &f = b;
void *addr_b = static_cast<void *>(&b);
void *addr_f = static_cast<void *>(&f);
//@ assert addr_b == addr_f;
}
If a C++ class is polymorphic, its corresponding C structure contains two additional fields:
- struct __tis_typeinfo const *__tis_typeinfo; holding a pointer to the type_info of the most derived object of the current object.
- struct __tis_vmt_entry const *__tis_pvmt; holding a pointer to the virtual method table of the current object.

As an example, the class:
struct Foo {
int x;
virtual void f() {}
};
is translated as:
struct Foo {
struct __tis_typeinfo const *__tis_typeinfo ;
struct __tis_vmt_entry const *__tis_pvmt ;
int x ;
};
These additional fields are set by the constructors of the polymorphic class.
If a class has a virtual base, its translation produces two different C structures: the regular C structure as well as a base version of the class.
The regular structure is used when the object is the most derived object. In this case:
- its first field is long const * __tis_vbases_ptr;. This is an array holding the offset of each virtual base of the object.
- its virtual bases appear as fields whose names are prefixed by __vparent__ to distinguish them from non-virtual bases.

The base version of the object has its name prefixed by __tis_base_ and is used when the object is used as a base for another object. In this case:
- it still starts with the field __tis_vbases_ptr of type long const *. This is an array to the offset of each virtual base of the object.
- it does not contain the fields for the virtual bases.

As an example, the following class:
struct Baz: Bar, virtual Foo {
int z;
};
produces the two classes:
struct Baz {
long const *__tis_vbases_ptr ;
struct __tis_base_Bar __parent__Bar ;
int z ;
struct Foo __vparent__Foo ;
};
struct __tis_base_Baz {
long const *__tis_vbases_ptr ;
struct __tis_base_Bar __parent__Bar ;
int z ;
};
Accessing a virtual base is always done by shifting the address of the current object with the offset of the virtual base in the __tis_vbases_ptr array.
As an example, with the following code:
struct Foo {
int x;
};
struct Bar: virtual Foo {
int y;
};
int
main()
{
Bar bar;
Foo &foo = bar;
}
the body of the main function is translated as:
int __retres;
struct Bar bar;
struct Foo *foo;
Bar::Ctor(& bar);
foo = (struct Foo *)((char *)(& bar) + *(bar.__tis_vbases_ptr + 0));
__retres = 0;
return __retres;
The virtual base Foo of the class Bar has index 0, so the offset to use to go from Bar to Foo is *(bar.__tis_vbases_ptr + 0).
The full layout for objects is the following, in increasing address order:
Pointers to a method X Foo::f(A1, A2, ..., An) are translated as a C structure with the following fields:
unsigned long vmt_index ;
X (* __attribute__((__tis_sound_cast__)) ptr)(struct Foo *, A1, A2, ..., An) ;
long shift ;
size_t vmt_shift ;
If Foo::f is a non-virtual method, then:
- ptr is a pointer to the method called when resolving the symbol f in the scope of Foo. This can be the method Foo::f if f is declared in Foo, or a method of one of the parent classes of Foo.
- shift is the offset of the base containing the method f. If f is in Foo, then this is 0, otherwise it is the offset of the parent class declaring f.
- vmt_index is 0.

If Foo::f is a virtual method, then:
- ptr is the same as if Foo::f was a non-virtual method.
- shift is the same as if Foo::f was a non-virtual method.
- vmt_index is 1 + the index of the method in the virtual method table of the class containing the final override of f in Foo. This can be different from the index of f in the virtual method table of Foo if the final override of f is declared in a parent class of Foo.
Each pointer to member function type produces a different structure type. The structure type is named __tis_XXXX, where XXXX is the mangled name of the method pointer type.
For example, with the classes:
struct Pack {
char c[1000];
};
struct Bar {
int y;
int f() { return 2; }
};
struct Foo: Pack, Bar {
virtual void g() {}
};
the following statements:
int (Foo::*x)(void) = &Foo::f;
void (Foo::*y)(void) = &Foo::g;
are translated as:
struct __tis_M3FooFivE x;
struct __tis_M3FooFvvE y;
x.vmt_index = 0UL;
x.ptr = (int (*)(struct Foo *))(& Bar::f);
x.shift = 0L - (long)((struct Foo *)((unsigned long)0 - (unsigned long)(& ((struct Foo *)0)->__parent__Bar)));
x.vmt_shift = 0UL;
y.vmt_index = 2UL;
y.ptr = (void (*)(struct Foo *))(& Foo::g);
y.shift = 0L;
y.vmt_shift = (unsigned long)(& ((struct Foo *)0)->__tis_pvmt);
If a variable v is initialized at dynamic initialization time, it is translated as:
- a declaration of the variable v.
- a function void __tis_init_v(). The content of this function is the translation of the initializer of v.

All __tis_init_XXX functions are called by a special function __tis_globinit. The __tis_globinit function is in turn called at the beginning of the main function.
As an example, the program:
int id(int x) { return x; }
int x = id(12);
int
main()
{
return x;
}
is translated as:
int x;
void __tis_init_x(void)
{
x = id(12);
return;
}
__attribute__((__tis_throw__)) int id(int x);
int id(int x)
{
return x;
}
__attribute__((__tis_throw__)) int main(void);
int main(void)
{
__tis_globinit();
return x;
}
void __tis_globinit(void)
{
__tis_init_x();
return;
}
A variable with a constant initializer is translated as a C variable with an initializer. The initializer is the value of the C++ constant initializer. As an example, with the following code:
constexpr
int
add_one(int x)
{
return x + 1;
}
const int x = add_one(2);
the definition of the variable x is translated as:
static int const x = 3;
In some special circumstances, one may need to disable the static initialization semantics described by the C++ standard. It can be done using the option -no-cxx-evaluate-constexpr. In this case, whenever a variable is initialized with a constant initializer that is not a constant initializer according to the C rules, the initialization of this variable is done at dynamic initialization time and uses the initializer as it was written by the user.
Using this option can lead to unsound results. As an example, with the following program:
constexpr int id(int x) { return x; }
extern const int x;
const int y = x;
const int x = id(1);
int
main()
{
int a = y;
int b = x;
//@ assert a == b;
return a == b;
}
- the assertion is valid when running tis-analyzer++ --interpreter test.cpp.
- the assertion is invalid when running tis-analyzer++ --interpreter test.cpp -no-cxx-evaluate-constexpr.
A static local variable x is translated as a triple of:
- a global variable holding the contents of x.
- a global guard variable __tis_guard_x.
- a check on __tis_guard_x ensuring the initialization of the static variable is not recursive, followed by a conditional block doing the initialization of the variable once.

The variable __tis_guard_x can have the following values:
- 0: x has not been initialized yet.
- 1: x is being initialized.
- 2: x has been initialized.

As an example, the following function:
int
main()
{
static Foo f;
return 0;
}
is translated as:
int main::__tis_guard_f;
struct Foo main::f = {.x = 0};
int main(void)
{
int __retres;
tis_ub("Recursive initialization of the static local variable f.",
main::__tis_guard_f != 1);
if (! main::__tis_guard_f) {
main::__tis_guard_f ++;
Foo::Ctor(& main::f);
main::__tis_guard_f ++;
}
__retres = 0;
return __retres;
}
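The three-state guard protocol can be mimicked in plain C++ to observe the semantics of the translated code above (a minimal sketch; guard, expensive_init and get_static are hypothetical names, and the real translation uses tis_ub rather than assert):

```cpp
#include <cassert>

int guard = 0;  // plays the role of main::__tis_guard_f: 0, 1 or 2

int expensive_init() { return 42; }

int get_static() {
    static int value = 0;  // storage, zero-initialized like main::f
    assert(guard != 1);    // recursive initialization would be undefined behavior
    if (!guard) {          // run the initializer only once
        guard++;           // 1: being initialized
        value = expensive_init();
        guard++;           // 2: initialized
    }
    return value;
}
```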
TrustInSoft Analyzer++ introduces several special variables while translating code. This section summarizes the different name families used for these variables.
__tis_ABI::exc_stack_depth: how many exceptions are currently raised.
__tis_ABI::exc_stack: all exceptions currently raised.
__tis_ABI::caught_stack_depth: how many exceptions are currently caught.
__tis_ABI::caught_stack: all exceptions currently caught.
__tis_unwinding: whether the program is currently unwinding its stack.
XXX::__tis_class_vmt: virtual method table for an object of type XXX used as most derived object.
XXX::__tis_class_typeinfo: typeinfo for an object with a most derived object of type XXX.
XXX::__tis_class_inheritance: inheritance information for an object with a most derived object of type XXX.
__Ctor_guard: guard used to check if the lifetime of an object has started.
__tis_alloc: materialization of a space reserved by an allocation function.
__tis_arg: materialization of a function argument.
__tis_assign: temporary variable used to hold the right-hand side of an assignment if it has potential side effects.
__tis_bind: temporary variable used to initialize non-reference structured bindings of arrays.
__tis_cast: result of a dynamic_cast.
__tis_const: materialization of a function argument that is non-trivial for the purpose of calls.
__tis_compound_literal: materialization of a C++ temporary used to initialize a compound literal.
__tis_constant_expression: materialization of a C++ constant expression.
__tis_cxx_return_arg: materialization of the discarded result of a call that is non-trivial for the purpose of calls.
virtual_dtor_tmp: virtual method table cell used when calling a virtual destructor.
__tis_deleted_value: address of a deleted value.
__tis_dereference: temporary variable used to dereference a member pointer.
__tis_dyncast: operand of a dynamic_cast.
__tis_exn: materialization of an object of type std::bad_XXX being thrown.
__tis_gnu_ternary: shared computation in a GNU ternary expression.
__tis_guard: guard controlling the initialization of a static local variable.
__tis_implicit_value: materialization of a C++ temporary used to perform an implicit value initialization.
__tis_index: index used when destroying an array when its lifetime finishes.
__tis_initializer_list: materialization of a C++ temporary used to build an initializer list.
__tis_init_size: loop variable used to initialize arrays.
__tis_lambda_temp: materialization of a C++ temporary variable in a lambda.
__tis_lvalue: materialization of a C++ lvalue that is not an lvalue in C.
__tis_mmp: method pointer being called.
__tis_mmp_init: temporary variable used to initialize a method pointer.
__tis_object_cast: temporary variable used to compute object inheritance casts.
__tis_offset: index used to destroy array elements as a consequence of calling delete[].
__tis_placement: address of an object initialized by a placement new.
__tis_relop: intermediate result when translating relational operators.
__tis_temp: materialization of a C++ temporary variable.
__tis_thrown_tmp: materialization of a C++ temporary variable in a throw statement.
__tis_typeid: address of an object with a polymorphic typeid being computed.
__virtual_return: result of the call to a virtual method returning a pointer.
__virtual_this: this pointer computed when calling a virtual method.
__virtual_tmp: virtual method table cell used when calling a virtual method. The name of the called function is appended to the name of the temporary.
Function contracts are automatically generated for non-static class methods, in order to require the validity of the this pointer as a precondition. Copy and move constructors also receive separation annotations, since they are expected to operate on separate objects.
These contracts will be added to user-provided contracts, if any.
Computation of these annotations can be disabled with the
-no-cxx-generate-contracts
option.
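As an illustration, the generated precondition is in the spirit of the following hand-written ACSL annotation (a sketch only; the exact clauses generated by the analyzer may differ, and the annotation is a plain comment outside the analyzer):

```cpp
#include <cassert>

struct Counter {
    int n;
    /*@ requires \valid(this); */  // the kind of precondition generated for a method
    void inc() { n++; }
};
```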
TrustInSoft Analyzer C++ introduces additional builtins to support the analysis of C++ code bases. These are listed in the relevant section of the builtin reference and explained below.
tis_make_unknown builtin
The analyzer provides the function tis_make_unknown for use in C code. The function takes a pointer and a size, and sets the contents of the memory area so described to be unknown. This can be used to abstract over the contents of variables and objects (see e.g., Prepare the Analysis).
TrustInSoft Analyzer++ overloads the function with its C++-friendly variant. This variant has the same semantics, but a different signature:
void tis_make_unknown(void *, unsigned long);
Here, the overloaded tis_make_unknown function takes void * as its first argument, in contrast to the C variant, which takes char *. Passing void pointers is more convenient in C++, because C++ implicitly converts any object pointer type to void * (whereas char * would require an explicit cast).
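The difference in convenience can be seen with plain stand-ins for the two signatures (a sketch; make_unknown_cxx and make_unknown_c are hypothetical stubs that merely scribble over the area, since the real builtin only has meaning under the analyzer):

```cpp
#include <cstring>

// Hypothetical stand-ins for the two variants of tis_make_unknown.
void make_unknown_cxx(void *p, unsigned long n) { std::memset(p, 0xAA, n); }
void make_unknown_c(char *p, unsigned long n) { std::memset(p, 0xAA, n); }

struct S { int x; };

void demo(S &s) {
    make_unknown_cxx(&s, sizeof s);                          // implicit S* -> void*
    make_unknown_c(reinterpret_cast<char *>(&s), sizeof s);  // explicit cast required
}
```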
tis_make_unknown builtin
When using tis_make_unknown on an object, the analyzer treats the entire indicated memory area as having unknown contents. This includes both the user-defined data and metadata, such as virtual method table pointers, which should typically be preserved. If this information is not preserved, the object's virtual methods and base classes become imprecise.
For this reason, TrustInSoft Analyzer++ provides another C++-specific variant of the tis_make_unknown builtin:
template <typename T> void tis_make_unknown(T *);
The builtin is defined as a function template that takes a pointer to an object of any type T. The builtin abstracts all the user-defined contents of the provided object, but does not interfere with the metadata required by the analyzer to model polymorphism and inheritance.
Example. Consider the following program. Here, we define a class named Obj whose two members are a field x and a virtual method f which returns the value of x. Then, in main, we instantiate an object of this class and use tis_make_unknown to set the value of the object's field to be unknown. We then use tis_show_each to print out the value of x and the result of the call to method f:
#include <tis_builtin.h>
struct Obj {
int x;
virtual int f() {
return x;
}
};
int main() {
Obj obj = {};
tis_make_unknown(&obj, sizeof obj);
tis_show_each("obj.x", obj.x);
tis_show_each("obj.f()", obj.f());
return 0;
}
When we analyze this program, the analyzer shows the value of x, as expected, but it subsequently raises an alarm indicating that the program tried to dereference an invalid pointer. The analyzer emits the alarm because we used the variant of tis_make_unknown that also sets the virtual method table pointer to unknown.
$ tis-analyzer++ -val poly.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
tests/tis-user-guide/man/tis-analyzer-plusplus/poly.cpp:15:[kernel] warning: pointer arithmetic: assert \inside_object_or_null((void *)obj.__tis_pvmt);
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])
Instead of using the variant of tis_make_unknown
(with two arguments) that
overwrites the object’s metadata, we should modify the program to use the
template function variant of tis_make_unknown
(with just one argument) to
preserve object metadata while setting the object’s field to unknown:
poly2.cpp (excerpt):
 tis_make_unknown(&obj);
When we analyze this program now, it shows the expected (unknown) values of x and of the result of calling f, but does not emit an alarm, meaning tis_make_unknown did not clear the object's virtual method table pointer.
$ tis-analyzer++ -val poly2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])
Example. The following program presents a situation where class Obj does not have members of its own, but inherits field x from class Base via virtual inheritance. The object obj of class Obj is instantiated in main and its members' values are set to be unknown via tis_make_unknown. Then, we show the value assigned to the field x that obj inherits from Base:
#include <tis_builtin.h>
struct Base {
int x;
};
struct Obj: virtual Base {};
int main() {
Obj obj = {};
tis_make_unknown(&obj, sizeof obj);
tis_show_each("obj.x", obj.x);
return 0;
}
As in the example above, when running the analyzer on the program, we find that an alarm is raised, again because this variant of tis_make_unknown sets the virtual method table pointers to unknown; the virtual method table is also used to implement virtual inheritance.
$ tis-analyzer++ -val virt.cpp
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: pointer arithmetic:
assert \inside_object_or_null((void *)obj.__tis_vbases_ptr);
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: out of bounds read. assert \valid_read(obj.__tis_vbases_ptr+0);
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
Hence, we should modify this program to use the variant of tis_make_unknown that preserves the pointer within obj to the virtual method table with which it is associated:
virt2.cpp (excerpt):
 tis_make_unknown(&obj);
This removes the alarm and produces the expected (unknown) value of obj.x:
$ tis-analyzer++ -val virt2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
Tip
In cases where a class does not depend on a virtual method table (i.e., it has no virtual inheritance and no virtual methods), both variants of tis_make_unknown are equivalent and can be used interchangeably.
You can run your tests that use the GoogleTest framework with TrustInSoft Analyzer++ in a few steps.
GoogleTest User's Guide: https://google.github.io/googletest/
TrustInSoft Analyzer++ provides two options to activate GoogleTest support:
- the -gtest option, if you wrote the entry point of your tests yourself;
- the -gtest-main option (as opposed to the -gtest option), if your tests use the default GoogleTest entry point (from the gtest_main library).
More details about gtest_main can be found at https://google.github.io/googletest/primer.html#writing-the-main-function
Specify your own sources, headers and preprocessing options as you would do for any other analysis (see Prepare the Sources).
By providing the -gtest
option (or the -gtest-main
option) to
TrustInSoft Analyzer++, the analyzer will pull in all GoogleTest source
files and headers for you. Thus you do not have to list them in your analysis
configuration files.
As an example, let us assume that you are testing a software module called
module1
, and that you have gathered your tests that use GoogleTest
framework in a tests
subdirectory.
|-- module1
| |-- include
| | ...
| |-- src
| | |-- component1.cc
| | ...
| |-- tests
| | |-- component1_unittest.cc
| | ...
|-- my_analysis
| |-- mod1_comp1_unittest.json
| ...
For instance, mod1_comp1_unittest.json would look like:
{
"name": "mod1_comp1_unittest",
"prefix_path":"../module1/",
"files": [
"tests/component1_unittest.cc",
"src/component1.cc"
]
}
Note that you do not need to add the gtest *.cc files to the "files" list.
Next run the analysis with both the --interpreter
option (see
Getting Started) and the -gtest
option (or the
-gtest-main
option).
tis-analyzer++ --interpreter -gtest -tis-config-load path/to/<my_unit_test>.json
Note: We provided the option -gtest directly on the command line in order to highlight it, but you can also move it to the configuration file:
"gtest": true
The following GoogleTest features are not supported:
- assertions that catch failures (EXPECT_FATAL_FAILURE and EXPECT_NONFATAL_FAILURE)
- death tests (EXPECT_EXIT)
You should make sure by yourself that your tests do not use these features, as they are untested at the moment. The analyzer will probably not do what you expect if you use them, and it will not specifically warn you that it does not support them.
For more details about the assertion macros provided by GoogleTest visit https://google.github.io/googletest/reference/assertions.html
This section gives some details about how to deal with special features needed to analyze some applications:
Caution
The tis-mkfs
tool is only available in the commercial version
of TrustInSoft Analyzer.
The tis-mkfs utility helps build C files that give information about the file system in which the application is supposed to run. For more information, please refer to the tis-mkfs Manual.
The default initial environment for the analysis of a program is empty. In order to perform an analysis in a specific environment, it has to be populated from the user code using one of the two methods below.
The user may set some variables by calling setenv (or putenv) in the analysis entry point.
Example:
#include <stdlib.h>
extern int main (int argc, char * argv[]);
int tis_main (int argc, char * argv[]) {
int r;
r = setenv ("USER", "me", 1);
if (r != 0) return 1;
r = setenv ("HOME", "/home/me", 1);
if (r != 0) return 1;
r = setenv ("SHELL", "/bin/sh", 1);
if (r != 0) return 1;
return main (argc, argv);
}
Alternatively, the user may initialize the environ standard variable. This variable:
- is an array of strings of the form "variable=value",
- is terminated by a NULL value (elements after the NULL value, if any, will not get accessed).
Example:
#include <stdlib.h>
extern int main (int argc, char * argv[]);
extern char **environ;
int tis_main (int argc, char * argv[]) {
char *custom_environ[] = {
"USER=me",
"HOME=/home/me",
"SHELL=/bin/sh",
NULL
};
environ = custom_environ;
return main (argc, argv);
}
Using one of the two methods above to initialize the environment, the following program can be analyzed:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char * argv[]) {
char * user = getenv ("USER");
if (user)
printf ("USER value is '%s'\n", user);
return 0;
}
The command line would be:
$ tis-analyzer tis_main.c main.c -val -main tis_main -slevel 10
And the following output can be observed:
...
USER value is 'me'
...
The initial size of the array pointed to by the environ variable is controlled by the TIS_NB_INIT_ENV_ELEMENTS macro. Its default value is 100, but it may be changed by the user (using the -D option as usual) to avoid reallocations if the application needs more than 100 variables.
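For example, to raise the limit to 200 environment variables (a sketch command line; the file names are illustrative and taken from the earlier example):

```shell
$ tis-analyzer -D TIS_NB_INIT_ENV_ELEMENTS=200 tis_main.c main.c -val -main tis_main
```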
Moreover, to avoid losing precision, a local slevel
is used inside
the implementation of the environment related functions.
It is controlled by the TIS_ENV_INTERNAL_SLEVEL
macro.
Its default value is already very large,
but it can be increased by the user if it is still too low for a specific usage.
Without any option, the analysis stops on recursive function calls
and lets the user decide how to handle them by choosing either
-val-clone-on-recursive-calls
or -val-ignore-recursive-calls
.
The -val-clone-on-recursive-calls option tells the analyzer to process calls to recursive functions exactly as if they were calls to normal functions. The function body is copied and the copy is renamed with the prefix __tis_rec_<n>, where <n> is the depth of the recursion.
This means that the recursive call is analyzed precisely.
This works up to the limit defined with the option
-val-clone-on-recursive-calls-max-depth
.
When the limit is reached
(or when the -val-clone-on-recursive-calls
is not set),
the contract of the recursive function is used.
Usually an assigns clause is enough to obtain the expected semantics. If no contract is provided, the analyzer generates a simple one, but the generated contract may be incorrect: it is strongly recommended to provide a contract when analyzing such cases.
The -val-ignore-recursive-calls
option tells the analyzer
to use the contract of the function to handle the recursive calls.
The contract may be generated as-if the max-depth option was reached
as explained above.
Note that when the --interpreter option is used, the -val-clone-on-recursive-calls option is automatically set.
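A user-provided contract for a recursive function can look like the following (a sketch; the function and the bounds are illustrative, and the ACSL annotation is a plain comment outside the analyzer, where it would be used once the max-depth limit is reached):

```cpp
/*@ requires 0 <= n <= 12;
    assigns \nothing;
    ensures \result >= 1; */
unsigned long fact(unsigned n) {
    // analyzed via its contract once -val-clone-on-recursive-calls-max-depth is reached
    return n < 2 ? 1UL : n * fact(n - 1);
}
```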
TrustInSoft Analyzer provides two ways to detect memory leaks.
The first way is to use the built-in function tis_check_leak
to print the list of the memory blocks that are allocated
but not referenced by any other memory block anymore
at the program point where the built-in is called.
Example (leak.c):
#include <stdlib.h>
#include <tis_builtin.h>
char * f(int v) {
char *p = malloc(v);
return p;
}
int main() {
f(42);
tis_check_leak();
return 0;
}
When the program above is analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations leak.c
we get the following result:
tests/val_examples/leak.c:5:[value] allocating variable __malloc_f_l5 of type char [42]
stack: malloc :: tests/val_examples/leak.c:5 (included from tests/val_examples/leak_test.c) <-
f :: tests/val_examples/leak.c:10 (included from tests/val_examples/leak_test.c) <-
main
tests/val_examples/leak.c:11:[value] warning: memory leak detected for {__malloc_f_l5}
Indeed, 42 bytes are allocated at line 5 in f
,
but since the pointer returned by f
is lost in main
,
a memory leak is detected at line 11.
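For comparison, a version of the program that keeps the pointer returned by f and frees it leaks nothing (a sketch; the tis_check_leak call is commented out since it is only meaningful under the analyzer):

```cpp
#include <cstdlib>

char *f(int v) {
    char *p = static_cast<char *>(std::malloc(v));
    return p;
}

void no_leak() {
    char *p = f(42);      // keep the pointer instead of discarding it
    // tis_check_leak();  // would report no leak: p still references the block
    std::free(p);         // release the block before it becomes unreachable
}
```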
In addition, the analyzer also prints the list of possibly leaked memory blocks (memory blocks that might not be referenced by any other memory block anymore), as shown in the following example.
Example (leak_weak.c):
#include <stdlib.h>
#include <tis_builtin.h>
char * f() {
char *p = malloc(1);
char *q = malloc(1);
return tis_interval(0, 1) ? p : q;
}
int main() {
char *r = f();
tis_check_leak();
return 0;
}
When the program is analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations leak_weak.c
we get the following result:
$ tis-analyzer -val -val-show-allocations leak_weak.c
[...]
leak_weak.c:5:[value] allocating variable __malloc_f_l5 of type char
stack: malloc :: leak_weak.c:5 <- f :: leak_weak.c:11 <- main
leak_weak.c:6:[value] allocating variable __malloc_f_l6 of type char
stack: malloc :: leak_weak.c:6 <- f :: leak_weak.c:11 <- main
[value] using specification for function tis_interval
leak_weak.c:12:[value] warning: possible memory leak detected for {__malloc_f_l5, __malloc_f_l6}
Indeed, when tis_check_leak
is called at line 12, the value of variable
r
is { NULL ; &__malloc_f_l5 ; &__malloc_f_l6 }
, which means the value
of r
can either be NULL
, or the address of the memory block allocated
at line 5, or the address of the memory block allocated at line 6. Thus, the
memory blocks allocated at line 5 and line 6 might be leaked.
In order to improve the precision, the program can be analyzed with the following command line:
$ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
in which case, the analyzer propagates the states where r
points to
NULL
, __malloc_f_l5
and __malloc_f_l6
separately, thus, we get
the following analysis result:
$ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
[...]
leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l5}
leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l6}
It shows that in one path, __malloc_f_l5
is leaked, and in another path
__malloc_f_l6
is leaked.
The second way of detecting memory leaks requires the user to be able to identify two points in the target program such that when the execution reaches the second point, all the memory blocks that have been allocated since the execution was at the first point have been freed. The procedure is then to insert a call to a builtin that lists all the dynamically allocated blocks in each of the two points. If the lists printed by the two builtin calls at the two points match, it means that every block that was allocated after the first point was freed before the second point was reached.
In “interpreter mode”, in which the analyzer follows a single execution path,
the tis_show_allocated
builtin can be used to print the lists of allocated
blocks.
Example (leak_interpreter.c):
#include <stdlib.h>
#include <tis_builtin.h>
void f (void) {
char * p1 = malloc (10);
char * p2 = malloc (10);
char * p3 = malloc (10);
char * p4 = malloc (10);
p1 = p4;
p4 = p2;
free (p2);
free (p1);
}
int main (void) {
char * p = malloc (10);
tis_show_allocated ();
/* all the memory blocks allocated in function f should be freed */
f ();
tis_show_allocated ();
free(p);
}
When the program above is analyzed with the following command line:
$ tis-analyzer --interpreter -val leak_interpreter.c
we get the following result:
$ tis-analyzer --interpreter -val leak_interpreter.c
[...]
leak_interpreter.c:16:[value] remaining allocated variables:
__malloc_main_l15
leak_interpreter.c:19:[value] remaining allocated variables:
__malloc_main_l15, __malloc_f_l5, __malloc_f_l7
The second call to tis_show_allocated at line 19 shows that, after the call f () at line 18, two more memory blocks, allocated at line 5 and line 7 respectively, exist in the memory state since the first call to tis_show_allocated. Thus, we know that the function f causes a memory leak.
In “analyzer mode”, the first point may be visited by the analyzer several
times, for different memory states, corresponding to different execution paths
in the program. The memory states that reach the second point should be
matched to the memory state at the first point that they correspond to.
For this purpose, the tis_allocated_and_id and tis_id builtins can be used. The tis_id builtin gives a unique "id" to each memory state, and the tis_allocated_and_id builtin can be used to print, in addition to the list of allocated blocks, the value of the "id", so that states can be identified.
Example (tis_show_allocated_and_id.c):
#include <stdlib.h>
#include <tis_builtin.h>
int main(){
char *t[7];
int n = tis_interval_split(0, 1);
// "before" point
unsigned long long my_id = tis_id();
tis_show_allocated_and_id("before", my_id);
t[n] = malloc(1);
if (!t[n]) goto leave1;
t[n+1] = malloc(1);
if (!t[n+1]) goto leave2;
t[n][0] = 'a';
t[n+1][0] = 'b';
leave2:
free(t[n]);
leave1:
// "after" point
tis_show_allocated_and_id("after", my_id);
}
When the program above is analyzed with the following command line:
$ tis-analyzer -val -slevel 100 -val-show-allocations tis_show_allocated_and_id.c
we get the following result:
$ tis-analyzer -val -slevel 100 -val-show-allocations tis_show_allocated_and_id.c
[...]
tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:13:[value] allocating variable __malloc_main_l13 of type char
stack: malloc :: tis_show_allocated_and_id.c:13 <- main
tis_show_allocated_and_id.c:15:[value] allocating variable __malloc_main_l15 of type char
stack: malloc :: tis_show_allocated_and_id.c:15 <- main
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
remaining allocated variables:__malloc_main_l15
tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
remaining allocated variables:__malloc_main_l15
The statement at line 7 causes two separate states in which variable n
respectively has value 0 and 1. Then, the statement at line 10 assigns
variable my_id
to 0 in one state and to 1 in the other state.
The call to tis_show_allocated_and_id("before", my_id) at line 11 shows that in both "before" states, where my_id is 0 and 1, there is no allocated memory block. The memory allocation statements at line 13 and line 15 cause
three different paths: in one path, the allocation at line 13 fails and
goto leave1
is taken at line 14, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, there is
no allocated memory block in both “after” states; in one path, the allocation
succeeds at line 13, but fails at line 15 and goto leave2
at line 16 is
taken, then, the memory allocated at line 13 is freed at line 20, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, there is
no allocated memory block left in both “after” states; in the other path, both
the allocation at line 13 and at line 15 succeed, but only the memory block
allocated at line 13 is freed at line 20, thus, when
tis_show_allocated_and_id("after", my_id)
at line 24 is reached, the
memory block allocated at line 15 remains in both "after" states, which is a memory leak. From this example, we can also see that there might be several "after" states for each "before" state; in this case, each "after" state must be checked against its corresponding "before" state for memory leaks.
TrustInSoft Analyzer faithfully emulates the hardware features of the targeted platform: endianness, size of integer types, and alignment constraints.
The command -machdep help
lists the supported architectures, and the command
-machdep verbose
shows a brief summary of the main characteristics of each supported
architecture.
$ tis-analyzer -machdep help
[kernel] supported machines are: aarch64 aarch64eb apple_ppc_32 arm_eabi armeb_eabi
gcc_aarch64 gcc_aarch64eb gcc_arm_eabi gcc_armeb_eabi gcc_mips_64
gcc_mips_n32 gcc_mips_o32 gcc_mipsel_64 gcc_mipsel_n32 gcc_mipsel_o32
gcc_ppc_32 gcc_ppc_64 gcc_rv32ifdq gcc_rv64ifdq gcc_sparc_32 gcc_sparc_64
gcc_x86_16 gcc_x86_16_huge gcc_x86_32 gcc_x86_64 mips_64 mips_n32 mips_o32
mipsel_64 mipsel_n32 mipsel_o32 ppc_32 ppc_64 rv32ifdq rv64ifdq sparc_32
sparc_64 x86_16 x86_16_huge x86_32 x86_64 x86_win32 x86_win64
(default is x86_32).
Among these supported platforms, apple_ppc_32 is similar to ppc_32, but forces the "char" type to be signed, as done by the Apple toolchain (as opposed to the default PowerPC use of an unsigned char type), and allows gcc language extensions.
Unless otherwise specified in the list above, the characteristics of the fundamental types are:
With the exception of apple_ppc_32
, x86_win32
and x86_win64
,
all these machdeps may be specified with
the gcc_
prefix, in which case gcc language extensions are allowed, and
the __int128 integer type is available on 64-bit machdeps.
The endianness of the supported architecture is specified as:
To switch to another architecture quickly, one of the following options may be used:
-16
for gcc_x86_16
-32
for gcc_x86_32
-64
for gcc_x86_64
$ tis-analyzer -64 ...
When the targeted architecture is not supported out-of-the-box, a new architecture corresponding to a specific target may be defined and dynamically loaded by TrustInSoft Analyzer.
A new plug-in is defined with the values of the types for the targeted
architecture. Create a new file custom_machdep.ml
with the
following content and edit the necessary values:
open Cil_types
let mach =
{
version = "foo";
compiler = "bar";
sizeof_short = 2;
(* __SIZEOF_SHORT *)
sizeof_int = 4;
(* __SIZEOF_INT *)
sizeof_long = 4;
(* __SIZEOF_LONG *)
sizeof_longlong = 8;
(* __SIZEOF_LONGLONG *)
sizeof_int128 = 0;
sizeof_ptr = 4;
(* related to __INTPTR_T *)
sizeof_float = 4;
sizeof_double = 8;
sizeof_longdouble = 12;
sizeof_void = 1;
sizeof_fun = 1;
size_t = "unsigned long";
(* __SIZE_T *)
char16_t = "unsigned short";
char32_t = "unsigned int";
wchar_t = "int";
(* __WCHAR_T *)
ptrdiff_t = "int";
(* __PTRDIFF_T *)
max_align_t = "long double";
(* __MAX_ALIGN_T *)
alignof_short = 2;
alignof_int = 4;
alignof_long = 4;
alignof_longlong = 4;
alignof_int128 = 0;
alignof_ptr = 4;
alignof_float = 4;
alignof_double = 4;
alignof_longdouble = 4;
alignof_str = 1;
alignof_fun = 1;
alignof_aligned = 16;
pack_max = 16;
char_is_unsigned = false;
char_bit = 8;
(* __CHAR_BIT *)
const_string_literals = true;
little_endian = false;
(* __TIS_BYTE_ORDER *)
has__builtin_va_list = true;
__thread_is_keyword = true;
has_int128 = false;
}
let () =
Stage.run_after_loading_stage (fun () ->
Core.result "Registering machdep 'mach' as 'custom'";
ignore
(Machdeps.register_machdep
~short_name:"custom"
~cpp_target_options:[]
mach ) )
Define a new header containing the values of the types.
Create a new file __fc_custom_machdep.h
with the following content:
/* skeleton of a real custom machdep header. */
#ifndef __TIS_MACHDEP
#define __TIS_MACHDEP
#ifdef __TIS_MACHDEP_CUSTOM
// __CHAR_UNSIGNED must match mach.char_is_unsigned
#undef __CHAR_UNSIGNED
#define __WORDSIZE 32
// __CHAR_BIT must match mach.char_bit
#define __CHAR_BIT 8
// __SIZEOF_SHORT must match mach.sizeof_short
#define __SIZEOF_SHORT 2
// __SIZEOF_INT must match mach.sizeof_int
#define __SIZEOF_INT 4
// __SIZEOF_LONG must match mach.sizeof_long
#define __SIZEOF_LONG 4
// __SIZEOF_LONGLONG must match mach.sizeof_longlong
#define __SIZEOF_LONGLONG 8
// __TIS_BYTE_ORDER must match mach.little_endian
#define __TIS_BYTE_ORDER __BIG_ENDIAN
#define __TIS_SCHAR_MIN (-128)
#define __TIS_SCHAR_MAX 127
#define __TIS_UCHAR_MAX 255
#define __TIS_SHRT_MIN (-32768)
#define __TIS_SHRT_MAX 32767
#define __TIS_USHRT_MAX 65535
#define __TIS_INT_MIN (-__TIS_INT_MAX - 1)
#define __TIS_INT_MAX 8388607
#define __TIS_UINT_MAX 16777216
#define __TIS_LONG_MIN (-__TIS_LONG_MAX -1L)
#define __TIS_LONG_MAX 2147483647L
#define __TIS_ULONG_MAX 4294967295UL
#define __TIS_LLONG_MIN (-__TIS_LLONG_MAX -1LL)
#define __TIS_LLONG_MAX 9223372036854775807LL
#define __TIS_ULLONG_MAX 18446744073709551615ULL
#define __INT8_T signed char
#define __TIS_INT8_MIN __TIS_SCHAR_MIN
#define __TIS_INT8_MAX __TIS_SCHAR_MAX
#define __UINT8_T unsigned char
#define __TIS_UINT8_MAX __TIS_UCHAR_MAX
#define __INT_LEAST8_T __INT8_T
#define __TIS_INTLEAST8_MIN __TIS_INT8_MIN
#define __TIS_INTLEAST8_MAX __TIS_INT8_MAX
#define __UINT_LEAST8_T __UINT8_T
#define __TIS_UINTLEAST8_MAX __TIS_UINT8_MAX
#define __INT_FAST8_T __INT8_T
#define __TIS_INTFAST8_MIN __TIS_INT8_MIN
#define __TIS_INTFAST8_MAX __TIS_INT8_MAX
#define __UINT_FAST8_T __UINT8_T
#define __TIS_UINTFAST8_MAX __TIS_UINT8_MAX
#define __INT16_T signed short
#define __TIS_INT16_MIN __TIS_SHRT_MIN
#define __TIS_INT16_MAX __TIS_SHRT_MAX
#define __UINT16_T unsigned short
#define __TIS_UINT16_MAX __TIS_USHRT_MAX
#define __INT_LEAST16_T __INT16_T
#define __TIS_INTLEAST16_MIN __TIS_INT16_MIN
#define __TIS_INTLEAST16_MAX __TIS_INT16_MAX
#define __UINT_LEAST16_T __UINT16_T
#define __TIS_UINTLEAST16_MAX __TIS_UINT16_MAX
#define __INT_FAST16_T __INT16_T
#define __TIS_INTFAST16_MIN __TIS_INT16_MIN
#define __TIS_INTFAST16_MAX __TIS_INT16_MAX
#define __UINT_FAST16_T __UINT16_T
#define __TIS_UINTFAST16_MAX __TIS_UINT16_MAX
#define __INT32_T signed int
#define __TIS_INT32_MIN __TIS_INT_MIN
#define __TIS_INT32_MAX __TIS_INT_MAX
#define __UINT32_T unsigned int
#define __TIS_UINT32_MAX __TIS_UINT_MAX
#define __INT_LEAST32_T __INT32_T
#define __TIS_INTLEAST32_MIN __TIS_INT32_MIN
#define __TIS_INTLEAST32_MAX __TIS_INT32_MAX
#define __UINT_LEAST32_T __UINT32_T
#define __TIS_UINTLEAST32_MAX __TIS_UINT32_MAX
#define __INT_FAST32_T __INT32_T
#define __TIS_INTFAST32_MIN __TIS_INT32_MIN
#define __TIS_INTFAST32_MAX __TIS_INT32_MAX
#define __UINT_FAST32_T __UINT32_T
#define __TIS_UINTFAST32_MAX __TIS_UINT32_MAX
#define __INT64_T signed long long
#define __TIS_INT64_MIN __TIS_LLONG_MIN
#define __TIS_INT64_MAX __TIS_LLONG_MAX
#define __UINT64_T unsigned long long
#define __TIS_UINT64_MAX __TIS_ULLONG_MAX
#define __INT_LEAST64_T __INT64_T
#define __TIS_INTLEAST64_MIN __TIS_INT64_MIN
#define __TIS_INTLEAST64_MAX __TIS_INT64_MAX
#define __UINT_LEAST64_T __UINT64_T
#define __TIS_UINTLEAST64_MAX __TIS_UINT64_MAX
#define __INT_FAST64_T __INT64_T
#define __TIS_INTFAST64_MIN __TIS_INT64_MIN
#define __TIS_INTFAST64_MAX __TIS_INT64_MAX
#define __UINT_FAST64_T __UINT64_T
#define __TIS_UINTFAST64_MAX __TIS_UINT64_MAX
#define __INT_MAX_T __INT64_T
#define __TIS_INTMAX_MIN __TIS_INT64_MIN
#define __TIS_INTMAX_MAX __TIS_INT64_MAX
#define __UINT_MAX_T __UINT64_T
#define __TIS_UINTMAX_MAX __TIS_UINT64_MAX
// __INTPTR_T must match mach.sizeof_ptr
#define __INTPTR_T __INT32_T
#define __TIS_INTPTR_MIN __TIS_INT32_MIN
#define __TIS_INTPTR_MAX __TIS_INT32_MAX
#define __UINTPTR_T __UINT32_T
#define __TIS_UINTPTR_MAX __TIS_UINT32_MAX
// __PTRDIFF_T must match mach.ptrdiff_t
#define __PTRDIFF_T int
// __MAX_ALIGN_T must match mach.max_align_t
#define __MAX_ALIGN_T long double
// __SIZE_T must match mach.size_t
#define __SIZE_T unsigned int
#define __SSIZE_T int
#define __TIS_PTRDIFF_MIN __TIS_INT_MIN
#define __TIS_PTRDIFF_MAX __TIS_INT_MAX
#define __TIS_SIZE_MAX __TIS_UINT_MAX
#define __TIS_SSIZE_MAX __TIS_INT_MAX
// __WCHAR_T must match mach.wchar_t
#define __WCHAR_T int
#define __TIS_WCHAR_MIN __TIS_INT_MIN
#define __TIS_WCHAR_MAX __TIS_INT_MAX
#define __WINT_T long long int
#define __TIS_WINT_MIN __TIS_LLONG_MIN
#define __TIS_WINT_MAX __TIS_LLONG_MAX
#define __WCTRANS_T long long int
#define __WCTYPE_T long long int
#define __SIG_ATOMIC_T volatile int
#define __TIS_SIG_ATOMIC_MIN __TIS_INT_MIN
#define __TIS_SIG_ATOMIC_MAX __TIS_INT_MAX
// Common machine specific values (PATH_MAX, errno values, etc) for Linux
// platforms, usually applicable anywhere else.
#include "__fc_machdep_linux_gcc_shared.h"
#else
#error "I'm supposed to be included with __TIS_MACHDEP_CUSTOM macro defined"
#endif
#endif
NB: the definitions above do not correspond to a real architecture; they are given only as an example of what is possible.
Warning
It is important for the data defined in the two files above to have compatible values.
The new architecture may now be tested:
#include "limits.h"
int main(void)
{
return INT_MAX;
}
To analyze it with TrustInSoft Analyzer, load the plug-in that defines the custom machdep and then add the option -D __TIS_MACHDEP_CUSTOM.
$ tis-analyzer -load-script custom_machdep.ml -I . -D __TIS_MACHDEP_CUSTOM -machdep custom -val test.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
__retres ∈ {8388607}
The command-line options -all-rounding-modes-constants
and -all-rounding-modes
change the interpretation
of floating-point constants and computations.
The default, when both options are unset, is to assume a strict IEEE 754
platform where float
is mapped to the IEEE 754 binary32 format,
double is mapped to binary64,
and the rounding mode is never changed from its nearest-even default.
The following deviations from strict IEEE 754 behavior can be taken into account
in TrustInSoft Analyzer:
- FLT_EVAL_METHOD is defined to a value other than 0 by the compiler.
- Compilers that define FLT_EVAL_METHOD to 2 or 0, but actually produce floating-point results inconsistent with these settings.
- Contraction of floating-point expressions, as allowed by #pragma STDC FP_CONTRACT ON. Some C compilers are taking the path of enabling this by default (ref: https://reviews.llvm.org/D24481 ).

The user who applies TrustInSoft Analyzer to C programs containing significant floating-point computations is invited to open a ticket in the support site with details of the compiler and architecture.
An unfortunate choice in the C89 standard led, when the C99 standard was published, to incompatibilities in the types assigned to integer constants. The problem is compounded by some compilers’ eagerness to provide extended integer types. The type of integer constants can, through the usual arithmetic conversions, influence the results of integer computations.
By default, TrustInSoft Analyzer types integer constants the same way a
C99-compliant compiler would, and invites the user to pick a choice
if a signed integer constant that cannot be represented as a long
long
is encountered in the target program. Since long long
is
at least 64-bit, this does not happen unless an integer constant in
the program is larger than 9,223,372,036,854,775,807.
The command-line option -integer-constants c89strict can be used to select the C89 behavior. On an ILP32 compiler following the C89 standard, the following program returns 2 with res = 2, whereas it returns 1 with res = 1 when compiled with a C99 compiler:
int main(void)
{
int res;
if (-3000000000 < 0)
res = 1;
else
res = 2;
return res;
}
$ tis-analyzer -val integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
res ∈ {1}
$ tis-analyzer -val -integer-constants c89strict integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
res ∈ {2}
Caution
The -val-use-spec
option is only available in the commercial version
of TrustInSoft Analyzer.
Whenever it is necessary to analyze assembly code without using the
-val-use-spec
option to provide its specification, model the assembly
code with equivalent C code instead.
Assembly code is accepted by the tool, but most of the analyses are not able to understand it.
Assembly statements appear in the statements.csv
information file (see
Information about the Statements) with the keyword asm
in the kind
column.
It can be the case that these statements are not reachable in the studied
context, so they can be safely ignored. If they are reachable, the value
analysis ignores them, but it warns when it has to do so with the message:
[value] warning: assuming assembly code has no effects in function xxx
It is fine if the assembly statements have no effect, or if their effect can be ignored, but most likely that is not the case!
The simplest way to handle assembly code in the value analysis is to
write a specification for the functions that enclose such statements and
use the option -val-use-spec
to use them instead of the function
bodies.
If a function is composed of other statements besides the assembly ones, WP can be used to check the specification. Because WP does not understand assembler either, the assembly statements have to be specified using a statement contract. These cannot be verified against assembler, but are used to verify the rest of the function.
Example:
//@ requires pre: n >= 0; ensures post: \result == n * 3 * x + 10;
int asm_fun(int x, int n)
{
int res = 0;
/*@ loop assigns i, res;
loop invariant l1: i <= n;
loop invariant l2: res == i * 3 * x;
*/
for (int i = 0; i < n; i++) {
int y;
//@ assigns y; ensures y == 3 * x;
asm ("leal (%1,%1,2), %0" // x + x * 2 --> == 3 * x
: "=r" (y)
: "r" (x)
);
res += y;
}
return res + 10;
}
The specification of asm_fun has to be used during the value analysis via the -val-use-spec option. This specification can then be verified with WP, assuming the assembly statement properties are correct, which is a smaller hypothesis.
Caution
The -absolute-valid-range
option is only available in the commercial
version of TrustInSoft Analyzer.
In general, there is an expectation that C programs exhibit behaviors that are independent from the values of addresses at which variables are located in memory. However, embedded code, and other code interfacing with hardware, routinely accesses registers, physical memory, and peripheral devices by interacting with specific memory addresses. These addresses are fixed and dictated by hardware architecture, so programs access them either directly by reading or writing memory at an absolute address, or by having a linker pin specific variables to specific memory addresses.
The requirement to inspect addresses is not limited to interfacing with peripherals either. By default, the address size of a variable depends on the architecture for which the program is compiled and on which it runs. Programmers sometimes take advantage of specific assumptions about the values of addresses to optimize operations on pointers. For example, code may assume that addresses of variables or functions are allocated within the first 4GiB of available memory in practice, and use this knowledge to store the addresses of such entities as 32-bit, not 64-bit integers. It is also often the case that code expects objects in memory to adhere to a specific alignment and relies on that fact to perform optimizations such as pointer tagging.
TrustInSoft Analyzer generally assumes that each variable is located at one out of all possible addresses in memory, but does not assume any specific address, and considers dereferencing absolute addresses to be invalid. In order to work with code that relies on absolute memory addresses, the user must provide additional configuration to the analyzer. This guide describes how to perform such configuration.
A programmer can make a mistake where they treat a numerical value as an address and dereference it, trying to access its contents. The address may be invalid and cause an access violation, but even if the access does not trap, it could still be a mistake. The analyzer should warn about it so the programmer can rectify it.
On the other hand, programs interfacing with hardware commonly access its resources through memory-mapped I/O (MMIO), where the program exchanges information with the hardware in question by reading from and writing to specific addresses in memory. In such a case, the programmer may use an absolute address directly. For example:
#include <stdio.h>

#define HW_REGISTER 0x4000
void main() {
printf("contents: %x", *((unsigned int *) HW_REGISTER));
}
The value at such an address is managed in some way by hardware and the programmer knows it is safe to access it on that basis. So, when the address is dereferenced, this is both deliberate and safe.
Whether the access through an absolute address is purposeful or accidental, the two appear the same from the point of view of TrustInSoft Analyzer. So, conservatively, it defaults to treating both as invalid operations and emitting an alarm (typically, a memory access alarm when the address is dereferenced, or a pointer arithmetic alarm when it is indexed).
This section shows how to tell the analyzer that accesses to specific addresses are purposeful and should be treated as valid operations.
One way of doing this is to define a range of valid addresses, allowing those addresses to be dereferenced using their raw numerical values. Alternatively, it is possible to provide variable declarations and use them instead of the absolute address directly during the analysis, or to place these variables at specific addresses, so that accesses to the absolute address correspond to accesses to that variable. These three techniques are summarized in the following table and discussed in detail below.
| | define absolute valid range | introduce variables (unconstrained address) | introduce variables (constrained address) |
|---|---|---|---|
| preserves specific address | yes | no | yes |
| requires code modification | no | yes | no |
| allows disjoint ranges of valid addresses | no | yes | yes |
| allows designating as read-only | no | yes | yes |
| allows dereferencing absolute address | yes | no | yes |
| volatile behavior granularity | entire valid range | per variable | per variable |
Example The following program shows the basic problem that these techniques
solve. It portrays an example use of absolute addresses to communicate with a
peripheral device via a set of registers. The GPIO_MODE
register is located
at the absolute address 0x40020000
and is used as an unsigned integer (via
the macro VALUE
). It is used to set the operating mode of the device. Then,
GPIO_DATA_A
, GPIO_DATA_B
, and GPIO_DATA_C
represent a set of ports,
each accessed as a 4-byte array (via the macro BYTE
with an index
) and
found in memory at addresses 0x40020014
, 0x40020018
, and 0x4002001c
,
respectively. When the program is executed, it sets the value of GPIO_MODE
to the value 0x19
, then reads the values of four bytes from GPIO_DATA_A
and, depending on each value read, writes either 1
or 0
to the
corresponding byte in GPIO_DATA_B
. Before finishing, the program calls
tis_show_each
to display the absolute addresses represented by each
constant, and the analyzer’s view of the values at each of these addresses.
#include <tis_builtin.h>
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i < 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_B);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
When analyzing this program, the analyzer emits an alarm reporting an invalid memory access. Without the context of the peripheral device, writing to unallocated memory is perceived by the analyzer as undefined behavior.
$ tis-analyzer physical_register_abs.c -quiet -print -print-filter main
void main(void)
{
/*@ assert Value: mem_access: \valid((unsigned int *)0x40020000); */
*((unsigned int *)0x40020000) = (unsigned int)0x19;
{
int i;
i = 0;
while (i < 4) {
if ((int)*((unsigned char *)0x40020014 + i) == 0) *((unsigned char *)0x40020018 + i) = (unsigned char)1;
else *((unsigned char *)0x40020018 + i) = (unsigned char)0;
i ++;
}
}
tis_show_each("MODE", 0x40020000, *((unsigned int *)0x40020000));
tis_show_each("DATA_A", 0x40020018);
tis_show_each("*DATA_A", (int)*((unsigned char *)0x40020014 + 0),
(int)*((unsigned char *)0x40020014 + 1),
(int)*((unsigned char *)0x40020014 + 2),
(int)*((unsigned char *)0x40020014 + 3));
tis_show_each("DATA_B", 0x40020018);
tis_show_each("*DATA_B", (int)*((unsigned char *)0x40020018 + 0),
(int)*((unsigned char *)0x40020018 + 1),
(int)*((unsigned char *)0x40020018 + 2),
(int)*((unsigned char *)0x40020018 + 3));
tis_show_each("DATA_C", 0x4002001c);
tis_show_each("*DATA_C", (int)*((unsigned char *)0x4002001c + 0),
(int)*((unsigned char *)0x4002001c + 1),
(int)*((unsigned char *)0x4002001c + 2),
(int)*((unsigned char *)0x4002001c + 3));
__tis_globfini();
return;
}
The user can configure the analyzer to treat a range of addresses as valid by
using its absolute-valid-range
option. Then, the analyzer considers all
accesses to those addresses as valid. The user can use this option to specify a
single range of addresses representing one or more contiguous logical objects.
The user can specify a valid range via the command-line option
-absolute-valid-range. The option takes a single
argument consisting of two addresses separated by a dash, here represented by
FIRST
and LAST
:
$ tis-analyzer -absolute-valid-range FIRST-LAST …
Alternatively, the user can specify the same option within a JSON analysis configuration file using absolute-valid-range. The option’s argument is a string that contains two addresses separated by a dash (FIRST and LAST):
{
"val": true,
"absolute-valid-range": "FIRST-LAST"
}
Both the FIRST
and LAST
addresses are integer values expressed as either
hexadecimal, octal, binary, or decimal numbers in C literal notation. Here, the
command line options specify a valid range from 0x4000
to 0x4007
using
all available notations:
$ tis-analyzer -absolute-valid-range 0x4000-0x4007 …
$ tis-analyzer -absolute-valid-range 0X4000-0X4007 …
$ tis-analyzer -absolute-valid-range 0o40000-0o40007 …
$ tis-analyzer -absolute-valid-range 0O40000-0O40007 …
$ tis-analyzer -absolute-valid-range 0b100000000000000-0b100000000000111 …
$ tis-analyzer -absolute-valid-range 0B100000000000000-0B100000000000111 …
$ tis-analyzer -absolute-valid-range 16384-16391 …
Warning
Octal number notation
The absolute-valid-range
option uses a different notation than C when
expressing octal numbers. The analyzer interprets addresses passed to
absolute-valid-range
prefixed with only a leading zero as decimal.
The range is inclusive, meaning that both FIRST
and LAST
addresses are
considered valid. This means that the notation above specifies the following
eight valid addresses: 0x4000
, 0x4001
, 0x4002
, 0x4003
,
0x4004
, 0x4005
, 0x4006
, and 0x4007
. If the value of LAST
is
less than the value of FIRST
, the analyzer sets the range of absolute
addresses to be empty.
There can only be a single contiguous valid range defined. If the user sets
multiple absolute-valid-range
options, the analyzer uses only the last
(rightmost) one.
Tip
Introspecting addresses
TrustInSoft Analyzer always attempts to display addresses of variables as symbols referring to the variable. E.g.:
int v;
tis_show_each("&v", &v);
[value] Called tis_show_each({{ "&v" }}, {{ &v }})
When dealing with constrained variables this representation might not always
be useful. So, the analyzer provides the builtin
tis_force_ival_representation
, which coerces the symbolic representation
of the address into its numerical value, here, any possible 4-byte aligned
32-bit pointer value:
int v;
tis_show_each("&v", tis_force_ival_representation(&v));
[value] Called tis_show_each({{ "&v" }}, [4..4294967288],0%4)
Instead of using tis_force_ival_representation
as an argument to
tis_show_each
, the user can also use the function
tis_show_ival_representation
which combines both: it displays values the
same as tis_show_each
but it applies tis_force_ival_representation
to
each of its arguments.
int v;
tis_show_ival_representation("&v", &v);
[value] Called tis_show_ival_representation({{ "&v" }}, [4..4294967288],0%4)
The tis_show_each
builtin displays integers in decimal representation by
default. However, it is convenient to have addresses displayed using
hexadecimal representation instead. The user can do this by setting the
big-ints-hex configuration option to specify that
the analyzer should display values larger than the given threshold using
hexadecimal notation. The option can be set via the command-line:
$ tis-analyzer -val -big-ints-hex 0xff …
Or through a JSON configuration file:
{
"val": true,
"big-ints-hex": "0xff"
}
Run with the configuration set like the above, the following example prints addresses as hexadecimal values:
int v;
tis_show_ival_representation("&v", &v);
[value] Called tis_show_ival_representation({{ "&v" }}, [4..0xFFFFFFF8],0%4)
Alternatively, it may sometimes be convenient to use printf
to display
addresses using a specific representation:
int v;
printf("&v = 0x%lx", tis_force_ival_representation(&v));
Using printf
is only possible if the address resolves to a single,
precise value. Otherwise, the analyzer issues a warning and prints an empty
string:
[value] warning: using address as integer in printing function. This may cause
the program to behave nondeterministically when executed
&v = 0x
However, printf
can be useful with absolute and constrained address values:
int *ptr = (int *) 0x8000;
printf("ptr = 0x%lx", tis_force_ival_representation(ptr));
ptr = 0x8000
Example The example at the start of the
section assumes that the
data at absolute addresses GPIO_MODE
(0x40020000-0x40020003
),
GPIO_DATA_A
(0x40020014-0x40020017
), GPIO_DATA_B
(0x40020018-0x4002001b
), and GPIO_DATA_C
(0x4002001c-0x4002001f
) can
be safely accessed. The user can convey this to the analyzer by specifying a
valid address range from 0x40020000
to 0x4002001f
via the
absolute-valid-range
option. Then, the analyzer manages to complete the
analysis. The analyzer runs with the slevel
of 10 to conveniently simplify
the values displayed by tis_show_each
and with big-ints-hex
set to
display all addresses in the program in hexadecimal representation (see
introspecting addresses). When run like that, the
analyzer deduces the following values at the absolute memory locations:
$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
The analyzer initially assumes that memory at absolute addresses contains some unknown value. The memory at those addresses will retain values written to it by the analyzed program and not change independently. However, if the addresses represent input ports that will be written to by a peripheral device, their contents might indeed change independently of what the source code of the analyzed program suggests.
The analyzer does not model this by default. Instead, the user must specify that the valid address range is volatile via the volatile-globals option. This topic is covered in detail in the guide to volatile variables.
Warning
If the value pointed to by an absolute address represents data that can be modified by a peripheral, it should be analyzed as volatile to preserve soundness. See Volatile variables for details.
Example Consider the example at the start of the section again. When the code is analyzed with a specified valid range of addresses, the analyzer shows that the memory at address GPIO_DATA_B retains the value that the program wrote to it: either 0 or 1.
$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
However, if GPIO_DATA_B
is a hardware port, the expectation is that the
values will be consumed or otherwise modified by the peripheral and
independently from the code of the program. This is indicated by specifying the
option -volatile-globals
with the argument NULL
(indicating the entire
range of valid absolute addresses, see Volatile variables). At this point,
the analyzer will assume that the values at absolute addresses cannot be
determined solely by observing the behavior of the program. Therefore, it
reports that the value at GPIO_DATA_B
can contain any possible value within
the range allowed by its type, even after it was written to.
$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f -volatile-globals NULL
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, [0..0xFFFFFFFF])
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
Note, however, that since the analyzer treats the entire valid range of absolute
addresses as volatile, it reports that GPIO_MODE
can contain any possible
value as well. Currently, there is no mechanism to specify only parts of the
valid absolute range to act as volatile using the absolute-valid-range
option. If more fine-grained specification is required, the user is advised to
replace absolute addresses with equivalent variable declarations, as described
in detail in a separate section
below.
The analyzer only allows for the definition of a single contiguous absolute valid range. If a code base operates on a peripheral with multiple separate registers, they can be logically represented within a single absolute valid range, as long as they constitute a contiguous memory region.
For example, if a program communicates with a peripheral via two 4-byte ports,
one available at address 0x4000
and the other at 0x4004
, the user can
specify an absolute valid range from 0x4000
to 0x4007
that encompasses
both ports.
However, if a program communicates with a peripheral through a 4-byte port at address 0x4000
and a 2-byte port at address 0x4006
, but the memory
addresses 0x4004
and 0x4005
are still invalid, the user cannot specify
an absolute valid range that encompasses both ports.
If such an absolute valid range were defined, the analyzer would not be capable of distinguishing between accesses within the logical objects that can be safely accessed and the gap memory between them that should not be. Since this area of memory is defined as valid, accesses into the gap will also be treated as valid.
If the code requires a discontinuous area of valid addresses, the user is advised to replace absolute addresses with equivalent variable declarations, as described in detail in a separate section below.
Even if the objects within an absolute valid range are contiguous, the user should be aware that the analyzer cannot detect the boundaries between them. Since these logical objects are not expressed in concrete terms (e.g., as variables), the assumptions about their type and size remain implicit.
That is, if the program attempts to access one such object at an offset that is out of bounds of that object, the analyzer will not emit an alarm, so long as the offset falls within the boundaries of the valid range.
Compare this with a situation where instead of an absolute address range, the memory is described as variables. In such a case, the analyzer is aware of the size of the variables at the base address, so it can determine whether an offset falls outside of the variable and emit an alarm if it does.
Indeed, the user can avoid this pitfall by replacing the absolute valid range with a series of variables using the technique presented in the next section.
Example Consider a program analogous to the previous
example, where absolute
addresses represent ports or registers of a peripheral device: GPIO_MODE
(0x40020000-0x40020003
), GPIO_DATA_A
(0x40020014-0x40020017
),
GPIO_DATA_B
(0x40020018-0x4002001b
), and GPIO_DATA_C
(0x4002001c-0x4002001f
) and can be safely accessed. However, here the code
is modified to index GPIO_DATA_A
and GPIO_DATA_B
out of their respective
presumed bounds.
#include <tis_builtin.h>
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i <= 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_A);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
When analyzed with a correctly configured absolute valid range, the example does not cause an alarm to be emitted. Instead, the analyzer has no choice but to assume that all accesses to valid absolute addresses are valid and purposeful. However, as a result, the writes to GPIO_DATA_B at index 4 actually write to GPIO_DATA_C at index 0, which is suspect.
$ tis-analyzer physical_register_oob.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, {0; 1}, [0..255], [0..255], [0..255])
A better way of informing the analyzer that accesses to absolute memory addresses are valid is to declare a set of variables equivalent to the data represented by such absolute addresses and to place those variables at those exact absolute addresses. This is an efficient way of concretizing how each of the memory addresses should be interpreted for the purposes of analysis.
The least invasive way to inject such equivalent variables into the analyzed program is to introduce them in an auxiliary file containing their definitions and to provide that file to the analyzer as one of the analyzed input files. If the variables are constrained to the same absolute memory addresses as used by the original program, the analyzer treats the accesses using these raw addresses as accesses to the corresponding variables. This allows the analyzer to distinguish whether a given access through an absolute address is purposeful or invalid. In addition, the analyzer can also then check if the address is interpreted in accordance with the type of the declared variables.
For example, the ADDR
macro in the following code might represent some
number of consecutive bytes of data, whose length is expressed by ADDR_SZ
.
#define ADDR_SZ 8
#define ADDR 0x4000
The expectation is that this address is therefore used as if it were an
8-element array of bytes. In that case, it can be expressed as a variable, as an
eight-element array containing elements of type unsigned char
. The variable
is qualified as extern
to indicate that it is not initialized within this
program:
extern unsigned char byte_array[ADDR_SZ];
The analyzer allows for variables to be pinned to specific addresses (e.g., to
model the behavior of a linker script). The user can do so by attaching the
tis_address
attribute to a variable and by specifying an address via the
attribute’s argument. For the example above, pinning byte_array
to the
address defined through ADDR
is done as follows:
extern unsigned char byte_array[ADDR_SZ] __attribute__((tis_address(ADDR)));
When the code of the program is analyzed and the contents of the address
0x4000
represented by ADDR
are dereferenced, the analyzer accesses
byte_array
. This means that the dereferences found in the original program do not have
to be modified to take advantage of the defined variable.
Since the analyzer now associates the addresses from 0x4000
to 0x4007
with byte_array
, it also ascribes byte_array
’s type to them. This means
it enforces byte_array
’s boundaries when the memory at 0x4000
-0x4007
is accessed and emits an alarm if it detects a violation (regardless of whether
the neighboring addresses are valid). The analyzer also respects the variable’s
type qualifiers, like volatile
or const
, when accessing the associated
addresses.
Tip
Volatile variables
Variables may be declared and subsequently analyzed as volatile
to
indicate that their values may change independently from the analyzed
program, for instance, by a peripheral device. Similarly, variables may be
declared as const volatile
if they may only be read, but their values may
change from read to read. See Volatile variables for more detail.
This variable definition can be placed in a separate file from the one
containing the definition of ADDR
. This means the analysis can be performed
without modifying the original code base at all.
Tip
The tis_address
attribute
This section uses tis_address
to attach each variable to a single,
concrete address. However, the attribute has other capabilities, including
the ability to assign an address from a range or from a named memory region, as
well as specifying alignments.
int a __attribute__((tis_address("[0x4000..0x4007]")));
int b __attribute__((tis_address("BCODE")));
int c __attribute__((tis_address("[0x4000..0x4010],0%4")));
The attribute is also not limited to the application described here. The user
can also use it to constrain any declared variable or function to a memory
location or range of memory locations. The complete description of the
tis_address
attribute can be found in a dedicated section of this guide,
below.
Example Consider again the example from the head of the section where absolute addresses
represent ports or registers of a peripheral device: GPIO_MODE
(0x40020000-0x40020003
), GPIO_DATA_A
(0x40020014-0x40020017
),
GPIO_DATA_B
(0x40020018-0x4002001b
), and GPIO_DATA_C
(0x4002001c-0x4002001f
), all of which can be safely accessed.
#include <tis_builtin.h>
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i < 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_A);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
In order to provide the analyzer with the information that accesses to these
addresses are safe, the user creates a new source file
physical_register_defs.c
and defines a variable representing each of the
absolute addresses used in the program: gpio_mode
for GPIO_MODE
,
gpio_data_a
for GPIO_DATA_A
, etc. Since GPIO_MODE
is used as a
numerical value and stretches from 0x40020000
to 0x40020003
, it is declared
as unsigned int
. The remaining variables are used as 4-element byte arrays,
so they are all declared as 4-element arrays with elements of type unsigned
char
. Finally, each of the variables is pinned to the start address it is
associated with by way of tis_address
(the end address is inferred from
their types).
#include <tis_builtin.h>
extern unsigned int gpio_mode __attribute__((tis_address(0x40020000)));
extern unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
extern unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
extern unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));
The analyzer can now conduct the analysis of the original program, using the
definitions in the auxiliary file to inform its decision about the validity of
accesses to the memory locations by absolute addresses. In effect, the analysis
finishes and produces the expected output, and does so without the need for any
modifications to the original code base. The example is executed with
big-ints-hex
to display addresses using hexadecimal representation (see
introspecting addresses).
$ tis-analyzer physical_register_abs.c physical_register_defs.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
We recommend this approach rather than using the absolute-valid-range option, because the
additional information about the expected sizes, value ranges, and type
qualifiers (like const
) for the values found in memory at those addresses
guards the user against the pitfalls of the absolute-valid-range
option:
Specifying discrete variables instead of a single area of memory allows the objects to be discontinuous (vs. Pitfall: multiple logical objects). It also instructs the analyzer how to distinguish between logical objects, allowing the analyzer to catch boundary violations (vs. Pitfall: boundaries of logical objects).
The type system can also be used to declare variables as volatile
, allowing
the analyzer to account for the capability of peripheral devices to update their
values (vs. Pitfall: external modifications to memory). Specifically,
declaring variables to represent specific objects in memory allows the analyzer
to model volatility with variable-granularity and even to model the behavior of
volatile variables in detail using the Volatile plugin of the analyzer.
Example Consider again the example from Pitfall: boundaries of logical objects where GPIO_DATA_A
and
GPIO_DATA_B
index out of their respective bounds:
#include <tis_builtin.h>
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i <= 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_A);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
However, here, instead of declaring valid memory via the
absolute-valid-range
option, there is a file containing the definitions of
variables representing the logical objects at GPIO_DATA_A
and
GPIO_DATA_B
, etc.:
#include <tis_builtin.h>
extern unsigned int gpio_mode __attribute__((tis_address(0x40020000)));
extern unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
extern unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
extern unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));
Then, when analyzing the example with both files, the analyzer emits an alarm
when the index is out of bounds of GPIO_DATA_A
. This is because the variable
declaration specifies for the analyzer what those bounds are in precise terms.
$ tis-analyzer physical_register_oob.c physical_register_defs.c -val -slevel 10 -big-ints-hex 0x40000000
tests/tis-user-guide/physical_register_oob.c:15:[kernel] warning: out of bounds read. assert \valid_read((unsigned char *)0x40020014+i);
While providing definitions of variables describing the logical objects pointed to by absolute addresses does not require modifying the analyzed source code, it is sometimes convenient to place the definition of the variables alongside an original definition of the absolute address, such as a macro. In such cases, the definitions of the variables describing the underlying logical objects may be inserted directly into the source code of the program.
For example, an absolute address representing an 8-element array of bytes
might be declared as the ADDR
macro:
#define ADDR_SZ 8
#define ADDR 0x4000
Using the suggested technique, the code is modified to provide a variant
that interprets this memory area as the variable byte_array
:
#define ADDR_SZ 8
#define ADDR 0x4000
#ifdef __TRUSTINSOFT_ANALYZER__
unsigned char byte_array[ADDR_SZ];
#endif
To keep these replacement declarations from interfering with the original code,
it is recommended to define them conditionally, guarded by the
__TRUSTINSOFT_ANALYZER__
macro. The analyzer defines the
macro while parsing code in preparation for analysis. If the macro is undefined,
the program includes only the original code that uses raw absolute addresses. If
the macro is defined, the program includes the declarations of equivalent
variables.
Tip
Use the tis-modifications tool to check that
all analysis-related code modifications are locked behind the
__TRUSTINSOFT_ANALYZER__
macro.
When the code of a program is subsequently analyzed, the address 0x4000
corresponds to the address of byte_array
, allowing the analyzer to determine
that the access is valid, as well as determining whether the data located there
is used in accordance with the prescribed type.
Example The following example modifies the example from the head of
the section so that the
registers of a peripheral device are backed by variables for the purposes of
analysis. GPIO_MODE
covers gpio_mode
, GPIO_DATA_A
covers
gpio_data_a
, etc. In order to align the variables with the absolute
addresses, they are fixed at specific positions via the tis_address
attribute: gpio_mode
is pinned to 0x40020000
, gpio_data_a
is pinned
to 0x40020014
, gpio_data_b
to 0x40020018
, and gpio_data_c
to
0x4002001c
(with their extents being derived from their types).
#include <tis_builtin.h>
#include <stdint.h>
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#ifdef __TRUSTINSOFT_ANALYZER__
unsigned int gpio_mode __attribute__((tis_address(0x40020000)));
unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));
#endif
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i < 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_A);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
Upon analysis, the analyzer produces the expected result, without emitting
alarms. The analysis is set to run with slevel
of 10 to conveniently
simplify the values displayed by tis_show_each
and with big-ints-hex
set
to display all addresses in the program using hexadecimal representation (see
introspecting addresses).
$ tis-analyzer physical_register_var_addr.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {1073872896}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {1073872916})
[value] Called tis_show_each({{ "*DATA_A" }}, {0}, {0}, {0}, {0})
[value] Called tis_show_each({{ "DATA_B" }}, {1073872920})
[value] Called tis_show_each({{ "*DATA_B" }}, {1}, {1}, {1}, {1})
[value] Called tis_show_each({{ "DATA_C" }}, {1073872924})
[value] Called tis_show_each({{ "*DATA_C" }}, {0}, {0}, {0}, {0})
While the previous technique associated variables with specific absolute addresses, it may sometimes be advantageous to remove the dependency on absolute addresses altogether. If an absolute address is defined via some macro, the user probably expects that the absolute address is always accessed through that macro. This technique causes the analyzer to detect stray attempts to access the absolute address in the code. Beyond that, this technique carries the same advantages as the technique in the previous section, but requires the user to modify the source code of the analyzed program.
Consider an absolute address used via a macro. The address represents an
8-element array of bytes declared as the ADDR
macro, with its
size defined by ADDR_SZ
:
#define ADDR_SZ 8
#define ADDR 0x4000
The values at ADDR
are always meant to be accessed via the macro, whereas an
access that uses the value of the absolute address directly is potentially a
mistake:
char byte0 = *((char *) ADDR + 0);
char byte1 = *((char *) ADDR + 1);
char byte2 = *((char *) 0x4002);
char byte3 = *((char *) ADDR + 3);
As with the technique above,
the user can replace the absolute address with operations that dereference
pointers to unconstrained variables representing those absolute addresses.
The code is modified to provide a variant that interprets this memory area as
the variable byte_array
. However, here it is not constrained to a particular
address. In addition, the modification also encompasses ADDR
itself, which
is defined as a pointer to the first element of byte_array
for the duration
of the analysis.
#define ADDR_SZ 8
#ifdef __TRUSTINSOFT_ANALYZER__
unsigned char byte_array[ADDR_SZ];
#define ADDR byte_array
#else
#define ADDR 0x4000
#endif
The modified code is compatible with the original version, in that nothing
changes in how the code is compiled and executed, or in how it is analyzed.
Both ADDR
and ADDR_SZ
remain accessible and used for
interacting with the memory in either case.
Warning
It may be the case that the macro used for dereferencing an absolute address is also used for other purposes, such as code generation through token concatenation.
Since the __TRUSTINSOFT_ANALYZER__
variant of the code replaces the
literal used in the macro with a variable name, the modification has the
potential to impact the execution of the program, so it should be used with
care.
This technique of replacing absolute addresses with variable references works best if the absolute address is behind a macro or similar. However, it is also possible to apply it without the macro by replacing individual uses of absolute addresses with variables directly. Doing so is typically time consuming, but not error prone, since the analyzer will raise an alarm informing of an invalid memory access whenever a stray absolute address is accessed in the code (excluding dead code).
Example Consider the following code. It reprises the example from the
head of the section where
absolute addresses represent ports or registers of a peripheral device:
GPIO_MODE
(0x40020000-0x40020003
), GPIO_DATA_A
(0x40020014-0x40020017
), GPIO_DATA_B
(0x40020018-0x4002001b
), and
GPIO_DATA_C
(0x4002001c-0x4002001f
) and can be safely accessed. However,
here, the information that these addresses are valid is conveyed to the analyzer
in situ. Each of these constants is defined as a pointer to an associated
external variable that represents the programmer’s interpretation of how the
data should be accessed. The variable definitions are guarded by a macro,
meaning that these variables are only present during analysis by TrustInSoft
Analyzer.
#include <tis_builtin.h>
#ifdef __TRUSTINSOFT_ANALYZER__
extern unsigned int gpio_mode;
extern unsigned char gpio_data_a[4];
extern unsigned char gpio_data_b[4];
extern unsigned char gpio_data_c[4];
#define GPIO_MODE &gpio_mode
#define GPIO_DATA_A ((unsigned char *) gpio_data_a)
#define GPIO_DATA_B ((unsigned char *) gpio_data_b)
#define GPIO_DATA_C ((unsigned char *) gpio_data_c)
#else
#define GPIO_MODE 0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c
#endif
#define VALUE(reg) *((unsigned int *) reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])
void main(void) {
VALUE(GPIO_MODE) = 0x19;
for (int i = 0; i < 4; i++) {
if (BYTE(GPIO_DATA_A, i) == 0) {
BYTE(GPIO_DATA_B, i) = 1;
} else {
BYTE(GPIO_DATA_B, i) = 0;
}
}
tis_show_each("MODE", GPIO_MODE, VALUE(GPIO_MODE));
tis_show_each("DATA_A", GPIO_DATA_A);
tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
tis_show_each("DATA_B", GPIO_DATA_B);
tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
tis_show_each("DATA_C", GPIO_DATA_C);
tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}
This program can be analyzed without producing errors (and without the need to
configure valid memory through absolute-valid-range). Running the analysis
with slevel
set to 10 produces the expected results shown below. Note that
the addresses of GPIO_MODE
, GPIO_DATA_A
, GPIO_DATA_B
, and
GPIO_DATA_C
are no longer displayed as absolute values, but they are
presented in reference to the variables that define them. Internally the
analyzer assumes that these variables can be located at any possible memory
address. Note also that since the variables are only declared (and not defined),
the analyzer does not specify their initial contents.
$ tis-analyzer physical_register_var.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {{ &gpio_mode }}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {{ &gpio_data_a }})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {{ &gpio_data_b }})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {{ &gpio_data_c }})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
Typically, the behavior of a program should be independent of the addresses of variables defined within it. Nevertheless, some programs do require that specific variables be located at specific addresses in memory, for instance, to represent specific hardware registers. Alternatively, the program may not require that variables are located at a specific address, but may make assumptions about the general area of memory a variable would be found in, or about the alignment of its address.
The analyzer does not make assumptions about the specific addresses of variables by default. Instead, it treats each variable as if it were placed at some valid, but unknown location in memory. However, since some programs require that a set of variables have their addresses concretized or limited to specific ranges, the user can instruct the analyzer to put such additional constraints on the variables’ addresses.
When variables are constrained, the analyzer is capable of performing operations on the values of their addresses in line with these constraints, allowing it to produce more precise results for bit-wise operations, integer arithmetic, and other operations. For instance, a variable with an alignment constraint will have an address that is divisible according to that alignment.
The user can place constraints on variable addresses using the tis_address
builtin attribute or the absolute-address
configuration option. The user can
use these to attach address constraints onto individual variables. These
constraints can take the form of singleton addresses or ranges of potential
addresses, with or without additional alignment requirements. In addition, the
analyzer provides a separate address-alignment
option that sets an alignment
for all objects in memory. All of these features are described in detail below.
Example The following code illustrates one facet of the problem. The program
uses the technique of pointer tagging to smuggle data in “unused” bits of
pointer values. Here, the code relies on an assumption that all addresses are
4-byte aligned to attach a 2-bit tag to pointers. The function tag_ptr
checks whether a pointer is 4-byte aligned by checking if its last two bits are
empty. If they are, the program writes a PTR_TAG
to those bits. Otherwise,
it returns 0x0
to indicate an error. The function untag_ptr
undoes
tag_ptr
: it checks whether a pointer has a tag, and if it does, it strips it
(by use of a mask). The program calls tag_ptr
and untag_ptr
in main
on the pointer into a byte array called memory
at an offset of 32
and
inspects the results using tis_show_each
.
#include <stdint.h>
#include <tis_builtin.h>
unsigned char memory[256];
#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)
uintptr_t tag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != 0) return 0x0;
return ptr | PTR_TAG;
}
uintptr_t untag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != PTR_TAG) return 0x0;
return ptr & VAL_MASK;
}
void main(void) {
uintptr_t tagged_ptr = tag_ptr(&memory[32]);
uintptr_t untagged_ptr = untag_ptr(tagged_ptr);
tis_show_each("tagged", tagged_ptr);
tis_show_each("untagged", untagged_ptr);
}
Since the analyzer assumes a variable’s address may be any valid address, it
cannot categorically determine whether the address passed into
tag_ptr
would pass the alignment check or not. Since it is expected that
C/C++ programs operate independently of the values of addresses assigned to
their variables by the linker, this causes the analyzer to raise a warning,
informing that a condition depends on memory layout.
$ tis-analyzer pointer_tag.c -quiet -val -slevel 10 -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
uintptr_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
__retres = (uintptr_t)0x0;
goto return_label;
}
__retres = ptr | (unsigned int)2;
return_label: return __retres;
}
However, if the addresses of variables within the program are constrained in such a way that the condition always succeeds or it always fails, the analyzer will accept it and proceed with the analysis.
The tis_address
attribute
The user can constrain a variable to a specific address or one out of a range of
possible addresses by annotating it with the tis_address
attribute. Given
a variable var
of type T
, the tis_address
attribute is specified
within the __attribute__
directive:
T var __attribute__((tis_address(…)));
The attribute can be placed on any variable (or function) definition, both local
and global. A variable can have at most one tis_address
specification.
The parameter of tis_address
specifies where the variable is pinned. It is
specified as a literal describing a single address, a range of addresses, or a
reference to a named memory region.
Warning
The tis_address
attribute does not work within C++ code. See
Absolute addresses and C++ for workarounds
and alternatives.
A singleton address is specified as a string literal containing a single address. It means that a variable will be considered pinned at that exact address. The address can be expressed as a positive (non-zero) integer provided in hexadecimal, octal, binary, and decimal representations:
tis_address("0x4000")
tis_address("0X4000")
tis_address("0o40000")
tis_address("0O40000")
tis_address("0b100000000000000")
tis_address("0B100000000000000")
tis_address("16384")
Warning
Octal number notation
The tis_address
attribute uses a different notation than C when
expressing octal numbers: they are prefixed with 0o or 0O. Addresses passed
to tis_address prefixed with only a leading zero are interpreted as decimal.
If the address 0x0
(in any representation) is specified as an address of a
variable, the analyzer stops the analysis with an error.
For convenience, a single address can also be specified via an integer literal:
tis_address(0x4000)
tis_address(0X4000)
tis_address(040000)
tis_address(0b100000000000000)
tis_address(0B100000000000000)
tis_address(16384)
This is especially convenient for specifying a group of addresses that are relative to each other.
tis_address(0x4000 + 0)
tis_address(0x4000 + 1)
tis_address(0x4000 + 2)
tis_address(0x4000 + 3)
Example Consider again the example at the top of the
section, where
the tag_ptr
and untag_ptr
functions check the alignment of a pointer and
write or strip a tag from it (respectively). Specifically, the functions are
called on the pointer into an array called memory
.
#include <stdint.h>
#include <tis_builtin.h>
unsigned char memory[256];
#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)
uintptr_t tag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != 0) return 0x0;
return ptr | PTR_TAG;
}
uintptr_t untag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != PTR_TAG) return 0x0;
return ptr & VAL_MASK;
}
void main(void) {
uintptr_t tagged_ptr = tag_ptr(&memory[32]);
uintptr_t untagged_ptr = untag_ptr(tagged_ptr);
tis_show_each("tagged", tagged_ptr);
tis_show_each("untagged", untagged_ptr);
}
Ordinarily, the analyzer cannot decide whether the condition in tag_ptr
succeeds or not because it depends on the value of the memory address of
memory
. The analyzer considers situations where a condition depends on the
value of an address suspicious, if that condition could go either way. This is
the case here, because the address is not constrained, so it could potentially
be any valid address.
$ tis-analyzer pointer_tag.c -val -slevel 10 -quiet -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
uintptr_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
__retres = (uintptr_t)0x0;
goto return_label;
}
__retres = ptr | (unsigned int)2;
return_label: return __retres;
}
However, with the use of the tis_address
attribute, the program can pin the start of
memory
to a specific address, such as 0x20
:
📎 pointer_tag_addr.c
[excerpt]
unsigned char memory[256] __attribute__((tis_address("0x20")));
Then, the analyzer uses that specific address value to determine what the
outcomes of both the condition in tag_ptr
and the condition in untag_ptr
can potentially be. Since 0x20 + 32 = 0x40
is 4-byte aligned (it ends with
binary 00
), both conditions pass, so the analyzer proceeds accordingly. The
analysis then prints out the value of the tagged pointer to memory[32]
as
0x42
(0x20 + 32 + 2
) and the subsequently untagged pointer as 0x40
(0x20 + 32
). The analysis is run with big-ints-hex
set to 0x1f
to
display the values of all addresses using hexadecimal representation (see
introspecting addresses).
$ tis-analyzer pointer_tag_addr.c -val -slevel 10 -big-ints-hex 0x1f
[value] Called tis_show_each({{ "tagged" }}, {0x42})
[value] Called tis_show_each({{ "untagged" }}, {0x40})
An address range specifies that a given variable is to be located at a single address from the range between the first and last address (inclusive) during the execution. The analyzer never disambiguates the range for the purposes of the analysis.
A range is specified by boundaries, and it is denoted by square brackets with
two singleton addresses
separated by a two-dot ellipsis. This specification is expressed by a string
literal and passed to tis_address
as a single argument:
tis_address("[FIRST..LAST]")
The following examples all describe the same range, containing the eight
addresses from 0x4000
to 0x4007
, using
all available representations:
tis_address("[0x4000..0x4007]")
tis_address("[0X4000..0X4007]")
tis_address("[0o40000..0o40007]")
tis_address("[0O40000..0O40007]")
tis_address("[0b100000000000000..0b100000000000111]")
tis_address("[0B100000000000000..0B100000000000111]")
tis_address("[16384..16391]")
Warning
Octal number notation
The tis_address
attribute uses a different notation than C literals. The
notations differ when expressing octal numbers. Notation specific to
tis_address
signifies octal numbers by prefixing them with 0o
or
0O
, as opposed to C notation where they are signified by just a leading
zero. Addresses passed to tis_address
prefixed with only a leading zero
are interpreted as decimal.
The first address in a range cannot be larger than the last. Ranges where the
first and last addresses are the same are interpreted as singleton addresses.
Address ranges cannot include the address 0x0
. If 0x0
is included in any
of the constraints, the analyzer stops execution with an error.
An address range can also optionally include a specification of alignment. The alignment is given after a comma as congruence information (remainder and modulus):
tis_address("[FIRST..LAST],REM%MOD")
See Value analysis data representation (Integer values) for more information on congruence.
When specifying a range with an alignment, the FIRST
and LAST
addresses
must match the alignment.
The following range contains all the addresses aligned to a 4-byte
boundary, namely the addresses: 0x4000
, 0x4004
, 0x4008
,
0x400c
:
tis_address("[0x4000..0x400c],0%4")
As another example, the following range contains the addresses
0x4001
, 0x4005
, 0x4009
, 0x400d
:
tis_address("[0x4001..0x400d],1%4")
Note that the boundaries fit the specified alignment too.
The tis_address attribute allows setting alignment for individual variables.
To specify alignment globally, use the address-alignment
option described in
a separate section below.
Example Consider the pointer tagging example from the previous
subsection again, but instead
of assigning a single specific address, the example allows memory
to start
at any address from a range of 0x20
to 0x40
:
📎 pointer_tag_range.c
[excerpt]
unsigned char memory[256] __attribute__((tis_address("[0x20..0x40]")));
(Since the example uses address ranges, it does not attempt to print the exact addresses of variables, see Introspecting addresses for more information.)
However, analyzing the range yields a warning again, because the address range contains addresses that are 4-byte aligned as well as ones that are not, and so the constraint does not guarantee the condition is always resolved the same way between executions.
$ tis-analyzer pointer_tag_range.c -val -slevel 10 -quiet -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
uintptr_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
__retres = (uintptr_t)0x0;
goto return_label;
}
__retres = ptr | (unsigned int)2;
return_label: return __retres;
}
Therefore, the following modification constrains the pool of available addresses
further to allow only those within the bounds between 0x20
and 0x40
that
are 4-byte aligned:
📎 pointer_tag_range_align.c
[excerpt]
unsigned char memory[256] __attribute__((tis_address("[0x20..0x40],0%4")));
This new constraint allows the analyzer to determine that the conditions in
tag_ptr
and untag_ptr
are always resolved the same way. In effect,
the pointer will be tagged to the value &memory + {34}
and subsequently
untagged to &memory + {32}
, as expected. Since &memory
is a range of
possibilities rather than a discrete address, the values of tagged_ptr
and
untagged_ptr
are represented symbolically.
$ tis-analyzer pointer_tag_range_align.c -val -slevel 10
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
Example Consider the following practical example too. By default, the address size of a variable depends on the architecture chosen by the user (x86_64 has 64-bit pointers, x86_32 has 32-bit pointers, etc.). However, programmers sometimes take advantage of specific assumptions about the values of addresses to optimize operations on pointers. The following code assumes variables are allocated within the first 4GiB of available memory, and uses a 32-bit variable to store the low part of a (potentially) 64-bit pointer.
The program makes a pointer to variable var, and uses the function as_uint32 to chop it in half and return a 32-bit unsigned integer containing only the low bits. The function also checks whether the pointer actually fits within 32 bits, and returns 0x0 if this is not the case. Once the pointer is reduced to a 32-bit integer, the program casts the integer back into a pointer and uses it to write 42 to var.
#include <stdint.h>
#include <assert.h>
#include <tis_builtin.h>
uint32_t as_uint32(uintptr_t ptr) {
if(ptr >> 32UL == 0) {
return ptr & 0xffffffff;
} else {
return 0x0;
}
}
int main (){
unsigned char var;
uint32_t small_ptr = as_uint32((uintptr_t)&var);
tis_show_each("small_ptr", small_ptr);
unsigned char *ptr = (unsigned char *) small_ptr;
tis_show_each("ptr", ptr);
*ptr = 42;
tis_show_each("var", var);
}
When the program is analyzed (with a 64-bit architecture), the analyzer emits an alarm, reporting that the condition in as_uint32 depends on the memory layout. Since pointers are 64-bit and addresses are unconstrained, the analyzer cannot decide which branch the program would take during execution.
$ tis-analyzer 32bit_pointer.c -val -slevel 10 -64
uint32_t as_uint32(uintptr_t ptr)
{
uint32_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr >> 32UL == (uintptr_t)0) {
__retres = (uint32_t)(ptr & (unsigned long)0xffffffff);
goto return_label;
}
else {
__retres = (uint32_t)0x0;
goto return_label;
}
return_label: return __retres;
}
In order to inform the analyzer about the assumption that the address falls within the first 4GiB of memory, the example is amended to specify that var’s address falls between 1 and 0xffffffff via the tis_address attribute.
📎 32bit_pointer_addr.c
[excerpt]
tis_show_each("small_ptr", small_ptr);
Then, execution proceeds as expected, showing that the pointer can be safely chopped and that casting the resulting integer back to a pointer produces a valid pointer to var:
$ tis-analyzer 32bit_pointer_addr.c -val -slevel 10 -64 -quiet -print -print-filter as_uint32
[value] Called tis_show_each({{ "small_ptr" }}, {{ &var }})
[value] Called tis_show_each({{ "ptr" }}, {{ &var }})
[value] Called tis_show_each({{ "var" }}, {42})
An address range can also be specified by a reference to an externally defined named memory region. This subsection describes how to define memory regions and then shows how to use them with tis_address below.
Named memory regions are defined via the memory-region option of the analyzer. The user can invoke this option via the command-line option -memory-region or via the equivalent JSON option.
Using either method, the user defines a list of memory regions. Each region definition consists of a label and a range of addresses the region contains. The label is a string akin to a variable name; it must start with a letter followed by any number of letters, numbers or underscores.
A range of addresses in a memory region is described either as:
- a first address and a length (FIRST[LENGTH]), or
- a first and a last address ([FIRST..LAST]).
The addresses and lengths describing a range are expressed using hexadecimal, octal, binary, or decimal integers, just like singleton addresses.
Warning
Memory regions do not allow alignment specification.
Memory regions cannot include the address 0x0
. If 0x0
is included in any
of the constraints, the analyzer stops execution with an error.
When defining memory regions via the -memory-region command-line option, memory regions form a comma-separated list, with each memory region’s label and address range delimited by a colon.
For example, the following command-line option defines two address ranges named R1 and R2, where:
- R1 contains the four addresses between 0x4000 and 0x4003 (inclusive),
- R2 contains twelve addresses starting at 0x4004 (up to and including 0x400f).
).$ tis-analyzer -memory-region 'R1:[0x4000..0x4003],R2:0x4004[12]' …
Tip
Avoiding shell expansion
When defining memory regions through a command-line argument, quote the
definition to prevent the shell from expanding any of the symbols within.
(Use single quotes '…'
in most shells.)
When defining memory regions via the memory-region JSON configuration option, the option accepts a map from the regions’ labels to their address ranges. Both the labels and the definitions of address ranges are themselves strings.
For example, the following configuration defines the same two address ranges as above, named R1 and R2:
{
"memory-region": {
"R1": "[0x4000..0x4003]",
"R2": "0x4004[12]"
}
}
tis_address
The user can refer to a named memory region by its label when defining a range of addresses with the tis_address attribute. The attribute’s address range will then be defined in terms of the memory region represented by the label, which can be further modified with additional alignment information.
For example, if the analyzer’s configuration declares the regions R1 and R2 from the previous section, they can be used in attribute declarations simply as follows:
tis_address("R1")
tis_address("R2")
While memory regions do not specify their own alignments, the user can specify an alignment within tis_address. It is appended in the same way as it is appended to a memory range.
For instance, the following tis_address declarations would use all addresses within R1, but only addresses aligned to a 4-byte boundary for R2.
tis_address("R1")
tis_address("R2,0%4")
When applying an alignment to a region, the first and last addresses within the region must match the specified alignment.
Tip
Setting global alignment
Named regions always have precise boundaries, so they are not well suited for expressing global alignment constraints. To specify alignment without also specifying the bounds of the range, use the address-alignment option described in a section below.
Example Consider again the pointer tagging example from above. Here, the example again allows memory to start at any address from a range, but the range is not expressed in the code directly. Instead, the code refers to a named region called MEM, which will be defined through the configuration of the analysis.
📎 pointer_tag_region.c
[excerpt]
unsigned char memory[256] __attribute__((tis_address("MEM")));
The code is then analyzed with memory-region defining a named region starting at 0x20 and spanning 65 addresses (so ending at 0x60). Since the region is not constrained to a particular alignment, the effective address range contains addresses that both fail and pass the conditions in tag_ptr and untag_ptr, so this results in the warning about conditions dependent on memory layout again:
$ tis-analyzer pointer_tag_region.c -val -slevel 10 -memory-region 'MEM:0x20[65]' -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
uintptr_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
__retres = (uintptr_t)0x0;
goto return_label;
}
__retres = ptr | (unsigned int)2;
return_label: return __retres;
}
When the code is modified to constrain the region to a 4-byte alignment, the code executes without warnings and produces the expected values of &memory + {34} for the tagged pointer and &memory + {32} for the untagged one:
📎 pointer_tag_region_align.c
[excerpt]
¶unsigned char memory[256] __attribute__((tis_address("MEM,0%4")));
$ tis-analyzer pointer_tag_region_align.c -val -slevel 10 -memory-region 'MEM:0x20[65]'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
The analyzer assumes that constraints placed on the addresses of variables represent some real constraint set expressed in a linker script or elsewhere in the toolchain used for compiling a given code base. The analyzer also requires that the analyzed code base compiles correctly under these constraints. It is up to the user to ensure that these prerequisites are met.
C and C++ variables are not allowed to overlap in memory. Since that is the case, a C or C++ compilation toolchain will refuse to compile a code base with a set of constraints that would require memory objects to overlap. Therefore, the analyzer trusts that the specified set of address constraints can be satisfied by some valid memory layout. Specifically, a valid layout must ensure that:
- no two variables overlap each other in memory, and
- no variable overlaps the absolute valid range (if one is specified).
Since an invalid constraint set would be encoded in a toolchain that would then fail to compile the code base, the user should become aware of any problems ahead of time. Therefore, the analyzer does not check the validity of the constraint set. This means that, in the general case, when provided an invalid constraint set, the analyzer produces results rather than stopping with an error. Since these results do not reflect any possible real-world execution, they are useless.
Warning
The analyzer does not check the validity of the constraint set placed on variable addresses in the general case. If the constraints make it impossible to produce a valid memory layout, the analyzer may produce results that do not reflect any real-world execution.
While the analyzer does not detect invalid constraint sets in general, it provides courtesy errors in specific circumstances. The possible behaviors of the analyzer in the presence of invalid constraint sets are enumerated in the following table:
| \(\exists\) valid layout | constraint set | \(\Rightarrow\) analysis result |
|---|---|---|
| yes | singleton address overlaps address range | correct result |
| yes | address range overlaps address range | correct result |
| no | singleton address overlaps singleton address | error |
| no | singleton address overlaps address range | result does not reflect any execution |
| no | singleton address overlaps absolute valid range | error |
| no | address range overlaps another address range | result does not reflect any execution |
| no | address range overlaps absolute valid range | result does not reflect any execution |
Example The following program declares two variables, x and y, each a 4-byte array of bytes. Here, x and y are both constrained to a single address in memory, 0x4004, via the tis_address attribute.
#include <tis_builtin.h>
unsigned char x[4] __attribute__((tis_address(0x4004)));
unsigned char y[4] __attribute__((tis_address(0x4004)));
void main() {
// ...
}
Analyzing this program yields an error, informing that the constraint placed on variable y causes the constraint set to be invalid. It is immaterial that neither variable is accessed.
$ tis-analyzer nonunique_address.c -val
[kernel] user error: invalid address specification for variable 'y'
(cannot register variable y in the memory range [0x4004 .. 0x4007] because memory range [0x4004 .. 0x4007] already holds variable x. Memory zone cannot overlap.).
[kernel] TrustInSoft Kernel aborted: invalid user input.
Example The following program extends the one above. Here, variable y is constrained to a whole range, rather than a single address. Nevertheless, the range encompasses address 0x4004, to which x is specifically constrained. The constraints are invalid, because there is no layout where x and y could be placed in memory without overlapping, and variables cannot overlap.
#include <tis_builtin.h>
unsigned char x[4] __attribute__((tis_address("0x4004")));
unsigned char y[4] __attribute__((tis_address("[0x4004..0x4005]")));
void main() {
x[0] = 255;
x[1] = 255;
x[2] = 255;
x[3] = 255;
tis_show_each("*x", x[0], x[1], x[2], x[3]);
tis_show_each("*y", y[0], y[1], y[2], y[3]);
}
If these constraints were reflected in a linker script for this program, it would not compile, so it should not be analyzed at all. But if it is analyzed anyway, the analyzer does not detect that the constraint set is invalid, and the analysis completes without errors. The results it produces do not reflect any possible execution of the program.
$ tis-analyzer overlapping_range_bad.c -val
[value] Called tis_show_each({{ "*x" }}, {255}, {255}, {255}, {255})
[value] Called tis_show_each({{ "*y" }}, {0}, {0}, {0}, {0})
The absolute-address option
The absolute-address option allows the user to constrain the addresses of global variables (external linkage symbols) in the same way as tis_address, but to do so without modifying the code of the analyzed program. Instead, the constraints can be specified via a command-line option or JSON configuration.
The absolute-address command-line option constrains a list of variables to addresses or address ranges. The list of variables and their constraints is provided as a comma-separated list, with each element containing the name of a global variable and an associated constraint in the form of a singleton address, a range of addresses, or a reference to a memory region, as described above.
For instance, the following command-line option constrains:
- x (in some program) to the address 0x4000,
- y to an address out of the address range containing 0x4001, 0x4002, and 0x4003, and
- z to an address out of the address range containing all 4-byte aligned addresses between 0x4004 and 0x4010.
.$ tis-analyzer -absolute-address 'x:0x4000,y:[0x4001..0x4003],z:[0x4004..0x4010]\,0%4' …
Warning
Note the backslash!
Constraints are specified on the command line as a comma-separated list, which means that any comma appearing in the alignment information appended to address ranges and references to named regions must be escaped with a backslash.
For convenience, the absolute-address command-line argument can be used multiple times in a single invocation of tis-analyzer. In that case, the analyzer uses the union of the constraints defined by all absolute-address options. E.g.:
$ tis-analyzer -absolute-address 'x:0x4000' \
-absolute-address 'y:[0x4001..0x4003]' \
-absolute-address 'z:[0x4004..0x4010]\,0%4' \
…
Just like in the case of the tis_address attribute, the absolute-address option can also use named memory regions to define addresses and ranges. The following command-line option pins variables r1 and r2 to addresses within the region R, which is specified as the twelve consecutive addresses starting from 0x4004. Variable r2 is further constrained to only those addresses that are aligned to a 4-byte boundary.
$ tis-analyzer -memory-region 'R:0x4004[12]' -absolute-address 'r1:R,r2:R\,0%4' …
Tip
Avoiding shell expansion
When defining address constraints through a command-line argument, it is recommended to wrap the definition in appropriate quotes to prevent the shell from expanding any of the symbols that may have special meaning.
The absolute-address JSON configuration option works analogously to the command-line option. The option accepts a map whose keys are variable names and whose values describe the constraints applied to the addresses of those variables. The constraints are strings describing singleton addresses, ranges of addresses, or references to memory regions, also as above.
As an example, the following snippet constrains variables x, y and z in the same way as the command-line example above, but using JSON configuration:
{
"absolute-address": {
"x": "0x4000",
"y": "[0x4001..0x4003]",
"z": "[0x4004..0x4010],0%4"
}
}
As a further example, the following constrains variables r1 and r2 again, using references to the named memory region R:
{
"absolute-address": {
"r1": "R",
"r2": "R,0%4"
},
"memory-region": {
"R": "0x4004[12]"
}
}
A variable should only be constrained once. If a variable has multiple constraints defined via absolute-address, the analyzer issues a warning, but continues the analysis using the rightmost constraint.
The absolute-address option can also be used in conjunction with the tis_address attribute to assign constraints to two disjoint sets of variables. If a variable is constrained both through tis_address and through absolute-address, the analyzer raises an invalid user input error and stops.
Warning
Invalid constraint set
The analyzer does not check the validity of the constraint set placed on variable addresses in the general case. If the constraints make it impossible to produce a valid memory layout, the analyzer may produce results that do not reflect any real-world execution.
Example Consider the example from the beginning of the section on constraining physical addresses. This code uses pointer tagging to embed two bits of information in pointers based on the assumption that all addresses are 4-byte aligned.
#include <stdint.h>
#include <tis_builtin.h>
unsigned char memory[256];
#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)
uintptr_t tag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != 0) return 0x0;
return ptr | PTR_TAG;
}
uintptr_t untag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != PTR_TAG) return 0x0;
return ptr & VAL_MASK;
}
void main(void) {
uintptr_t tagged_ptr = tag_ptr((uintptr_t)&memory[32]);
uintptr_t untagged_ptr = untag_ptr(tagged_ptr);
tis_show_each("tagged", tagged_ptr);
tis_show_each("untagged", untagged_ptr);
}
To configure the analyzer so that all the (relevant) addresses are 4-byte aligned, the absolute-address option can put constraints on the value of the address of memory. Specifically, the following command-line configuration declares that memory is going to be located at address 0x20.
The analysis is run with big-ints-hex set to format all addresses as hexadecimal, so the resulting execution prints out the value of the tagged pointer to memory[32] as 0x42 (0x20 + 32 + 0b10) and the subsequently untagged pointer as 0x40 (0x20 + 32).
$ tis-analyzer pointer_tag.c -val -slevel 10 -big-ints-hex 0x1f -absolute-address 'memory:0x20'
[value] Called tis_show_each({{ "tagged" }}, {0x42})
[value] Called tis_show_each({{ "untagged" }}, {0x40})
Similarly, memory can be constrained to an overapproximated set of possible 4-byte aligned addresses. Note the use of a backslash-escaped comma \, to delimit the congruence information from the specification of the boundaries. It is needed here because variable constraints are also comma-separated.
$ tis-analyzer pointer_tag.c -val -slevel 10 -absolute-address 'memory:[0x20..0x40]\,0%4'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
Finally, memory can be constrained to a specific memory region defined as MEM, with an additional congruence constraint that aligns the addresses to the appropriate boundary:
$ tis-analyzer pointer_tag.c -val -slevel 10 -absolute-address 'memory:MEM\,0%4' -memory-region 'MEM:0x20[65]'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
In order to specify an address or range of addresses for specific variables, the user can use the absolute-address configuration option described above. The user can also use the address-alignment option described below to provide a blanket alignment specification for all variables. Alternatively, the user can redefine the problem in terms of absolute addresses and the absolute-valid-range option.
However, the tis_address attribute is not supported in C++ source files. When the attribute is used in C++ code, the analyzer issues a warning and proceeds to analyze the code in question as if the attribute were absent. In order to use tis_address with the code of the analyzed program anyway, the user can modify the program in question so that the attribute is attached to a corresponding variable declared in a separate C file.
Warning
TrustInSoft Analyzer ignores the tis_address attribute in C++ source files.
Example Consider the following excerpt from a real-world C++ application. Here, MMIO is a class that acts as a wrapper providing the basic functionality of a peripheral hardware register around an absolute memory location. MMIO contains the static member pointer that defines the location in memory where the register begins, along with a type Contents which determines how many bytes the register spans. Both the offset in memory and the size of the register are provided via template arguments, and pointer is initialized statically by casting the provided address onto a pointer to Contents. The class also provides a method called byte which returns a reference to a byte within pointer. An access to this reference means that data is read from the peripheral, with each access causing a new read.
The example then declares two 4-byte ports, port_a and port_b, located at 0x8000 and 0x8008 respectively. The main function simply displays the addresses of the first byte of each port via the tis_show_each built-in, taking care to coerce the addresses into numerical values by performing a bit-wise operation on them (see Introspecting addresses).
#include <cstdint>
#include <functional>
#include <tis_builtin.h>
template <std::uintptr_t address, std::size_t size>
struct MMIO {
using Contents = volatile std::uint8_t[size];
static Contents * const pointer;
template <std::size_t offset>
static const volatile uint8_t& byte() {
return *(reinterpret_cast<const volatile uint8_t*>(&pointer[offset]));
}
};
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
using PortA = MMIO<0x8000, 4>;
using PortB = MMIO<0x8008, 4>;
PortA port_a;
PortB port_b;
int main () {
tis_show_each("port_a", &port_a.byte<0>(), tis_force_ival_representation((uintptr_t) &port_a.byte<0>()));
tis_show_each("port_b", &port_b.byte<0>(), tis_force_ival_representation((uintptr_t) &port_b.byte<0>()));
}
Analyzing this example (with tis-analyzer++) yields an undefined behavior, since the program attempts to access memory at an invalid address:
$ tis-analyzer++ mmio.cpp -val -big-ints-hex 0x7fff
tests/tis-user-guide/mmio.cpp:12:[kernel] warning: pointer arithmetic:
assert \inside_object_or_null((void *)MMIO<32768, 4>::pointer);
The analysis can be made to proceed using any of the tools described in this chapter, including the tis_address attribute, the absolute-address option, and the absolute-valid-range option.
This example assumes that the user decided to first use the tis_address attribute to specify the addresses for both port_a and port_b. The following two examples then show how to convert the solution to the other two approaches.
Following the procedure outlined in Section Interpreting absolute addresses as external variables, the code of the example is first modified to provide a variable equivalent to the user’s interpretation of the contents of memory at the address specified by pointer (for every template specialization of the class MMIO). This involves creating a templated volatile variable, here called port_contents, for each specialization of MMIO and having the associated pointer point to that variable instead of being defined by an arbitrary address.
📎 mmio_var.cpp
[excerpt]
#ifdef __TRUSTINSOFT_ANALYZER__
template<std::uintptr_t address, std::size_t size>
volatile uint8_t port_contents[size];
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
&port_contents<address, size>;
#else
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#endif
Then, port_contents is assigned the specific address defined through the template argument address and associated with the variable via tis_address. The modifications specific to TrustInSoft Analyzer are locked away behind preprocessor conditionals, to prevent them from impacting actual execution.
📎 mmio_addr.cpp
[excerpt]
volatile uint8_t port_contents[size] __attribute__((tis_address(address)));
The analysis informs that an unknown attribute was encountered (among other warnings). It then executes without emitting an alarm, because the addresses accessed are no longer invalid, but refer to specific variables. However, since tis_address was not recognized, the specific addresses remain unconstrained, causing tis_show_each to display the underlying values of &port_contents<32768, 4> and &port_contents<32776, 4> (equivalent to &port_contents<0x8000, 4> and &port_contents<0x8008, 4>) as any possible value that can be represented by the pointer, [1..0xFFFFFFFB], instead of the expected addresses 0x8000 and 0x8008.
$ tis-analyzer++ mmio_addr.cpp -val -big-ints-hex 0x7fff
tests/tis-user-guide/mmio_addr.cpp:18:[cxx] warning: variable templates are a C++14 extension
tests/tis-user-guide/mmio_addr.cpp:18:[cxx] warning: unknown attribute 'tis_address' ignored
tests/tis-user-guide/mmio_addr.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8000, 4> ignored
tests/tis-user-guide/mmio_addr.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8008, 4> ignored
[value] Called tis_show_each({{ "port_a" }},
{{ &port_contents<0x8000, 4> }},
[1..0xFFFFFFFB])
[value] Called tis_show_each({{ "port_b" }},
{{ &port_contents<0x8008, 4> }},
[1..0xFFFFFFFB])
In order to pin port_a and port_b to specific addresses, the tis_address attribute must be attached to variables declared in C and not C++.
In order to accomplish this, the code is further modified so that the port_contents variables for each specialization are declared as extern "C". Since templated variables cannot be declared within extern "C" contexts, the code declares separate variables for specific specializations, called port_a_contents and port_b_contents.
📎 mmio_c_addr.cpp
[excerpt]
¶#ifndef __TRUSTINSOFT_ANALYZER__
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#else
extern "C" uint8_t port_a_contents[4];
extern "C" uint8_t port_b_contents[4];
template<>
typename MMIO<0x8000, 4>::Contents * const MMIO<0x8000, 4>::pointer =
&port_a_contents;
template<>
typename MMIO<0x8008, 4>::Contents * const MMIO<0x8008, 4>::pointer =
&port_b_contents;
#endif
Then, the example is extended with another file. This file consists of C code that redeclares port_a_contents and port_b_contents with the tis_address attribute:
#include <stdint.h>
#include <tis_builtin.h>
extern uint8_t port_a_contents[4] __attribute__((tis_address(0x8000)));
extern uint8_t port_b_contents[4] __attribute__((tis_address(0x8008)));
At this point the analysis can be executed (with both files added to the analysis configuration). This time the variables are pinned to the appropriate addresses, as desired:
$ tis-analyzer++ mmio_c_addr.cpp mmio_ports.c -val -big-ints-hex 0x7fff
[value] Called tis_show_each({{ "port_a" }}, {{ &port_a_contents }}, {0x8000})
[value] Called tis_show_each({{ "port_b" }}, {{ &port_b_contents }}, {0x8008})
Example Instead of attaching the tis_address attribute to variables via a separate C file, variables can instead be pinned to specific addresses within tis-analyzer++ via the absolute-address configuration option. To do this, consider the example again, with a templated variable port_contents introduced as an equivalent to absolute memory addresses, but without a tis_address attribute attached to it in any way:
📎 mmio_var.cpp
[excerpt]
#ifdef __TRUSTINSOFT_ANALYZER__
template<std::uintptr_t address, std::size_t size>
volatile uint8_t port_contents[size];
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
&port_contents<address, size>;
#else
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#endif
This variable can be constrained to a specific address (or range) via the absolute-address option. The difficulty lies in figuring out the mangled name of the variable and accounting for the template. This can be done by finding the mangled variable names in the list of all variables:
$ tis-analyzer++ mmio_var.cpp -info-csv-variables vars.csv; head -1 vars.csv; grep port_contents <vars.csv
Name(s), File, Line, Type, Function, Kind, Storage, Initialized, Volatile, Const, Temporary, Is libc
_Z13port_contentsILj32768ELj4EE, tests/tis-user-guide/mmio_var.cpp, 22, <array>, NA, global variable, defined, yes, yes, no, no, libc:no
_Z13port_contentsILj32776ELj4EE, tests/tis-user-guide/mmio_var.cpp, 22, <array>, NA, global variable, defined, yes, yes, no, no, libc:no
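To double-check that a mangled name from the CSV refers to the intended template instance, the name can be demangled. This sketch assumes a GNU toolchain where the binutils c++filt tool is available:

```shell
# Demangle a name found in vars.csv (assumes binutils' c++filt).
# The demangled form names the port_contents template instance
# for address 32768 (0x8000) and size 4.
echo _Z13port_contentsILj32768ELj4EE | c++filt
```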
Alternatively, the mangled name can also be discovered via the tis-analyzer++ GUI by right-clicking the variable definition and selecting “Copy mangled name” from the drop-down menu.
The mangled variable names can then be plugged directly into absolute-address with address constraints. The analysis should be run with the C++14 (or more recent) standard specified, since the code contains variable templates, which were introduced in that version of C++. Executing the analysis with this option causes the addresses of the first bytes of port_a and port_b to be rendered according to expectations, as 0x8000 and 0x8008, respectively.
$ tis-analyzer++ -val mmio_var.cpp -big-ints-hex 0x7fff -cxx-std=c++14 -absolute-address _Z13port_contentsILj32768ELj4EE:0x8000,_Z13port_contentsILj32776ELj4EE:0x8008
tests/tis-user-guide/mmio_var.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8000, 4> ignored
tests/tis-user-guide/mmio_var.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8008, 4> ignored
[value] Called tis_show_each({{ "port_a" }},
{{ &port_contents<0x8000, 4> }},
{0x8000})
[value] Called tis_show_each({{ "port_b" }},
{{ &port_contents<0x8008, 4> }},
{0x8008})
(The analyzer also emits a warning that a volatile variable is initialized. This is due to it being declared as global, since global variables are always initialized. The warning can be safely ignored.)
Example The original example from the start of the section can also be made to work without any code modification, by instead declaring the area of memory accessed via concrete absolute addresses as valid, using the absolute-valid-range configuration option. Thus, consider again the original program:
#include <cstdint>
#include <functional>
#include <tis_builtin.h>
template <std::uintptr_t address, std::size_t size>
struct MMIO {
using Contents = volatile std::uint8_t[size];
static Contents * const pointer;
template <std::size_t offset>
static const volatile uint8_t& byte() {
return *(reinterpret_cast<const volatile uint8_t*>(&pointer[offset]));
}
};
template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
using PortA = MMIO<0x8000, 4>;
using PortB = MMIO<0x8008, 4>;
PortA port_a;
PortB port_b;
int main () {
tis_show_each("port_a", &port_a.byte<0>(), tis_force_ival_representation((uintptr_t) &port_a.byte<0>()));
tis_show_each("port_b", &port_b.byte<0>(), tis_force_ival_representation((uintptr_t) &port_b.byte<0>()));
}
Given that the program accesses addresses spanning from 0x8000 up to and including 0x800b (the last byte of port_b, at 0x8008 + 4 - 1), the analysis can be conducted by setting that range as valid. This causes the analysis to produce the expected results.
$ tis-analyzer++ -val mmio.cpp -big-ints-hex 0x7fff -absolute-valid-range 0x8000-0x800b
[value] Called tis_show_each({{ "port_a" }}, {0x8000}, {0x8000})
[value] Called tis_show_each({{ "port_b" }}, {0x8008}, {0x8008})
While the tis_address attribute and the absolute-address option both allow setting alignment per variable, using either approach would be tedious if the analyzed code requires that all variables conform to a specific alignment. If that is the case, the user can configure the analyzer to assume a given alignment for all variables.
The user can specify the alignment for all variables either by setting the -address-alignment command-line option or its equivalent JSON configuration option. The command-line option accepts an integer specifying the alignment in bytes. For example, this sets the alignment of all variables to 4 bytes:
$ tis-analyzer -address-alignment 4 …
The JSON configuration file works analogously:
{
"address-alignment": 4
}
Example Reprise the pointer tagging example program from the introduction. It relies on the assumption that all addresses are 4-byte aligned to smuggle 2 bits of information within pointers. Specifically, the function tag_ptr checks whether a pointer is 4-byte aligned by checking whether its last two bits are empty. If it is, the program writes a tag to those bits. Otherwise, it returns 0x0 to indicate an error. The program calls tag_ptr in main on the pointer to a variable called memory and inspects the result using tis_show_each.
#include <stdint.h>
#include <tis_builtin.h>
unsigned char memory[256];
#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)
uintptr_t tag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != 0) return 0x0;
return ptr | PTR_TAG;
}
uintptr_t untag_ptr(uintptr_t ptr) {
if (ptr & TAG_MASK != PTR_TAG) return 0x0;
return ptr & VAL_MASK;
}
void main(void) {
uintptr_t tagged_ptr = tag_ptr((uintptr_t) &memory[32]);
uintptr_t untagged_ptr = untag_ptr(tagged_ptr);
tis_show_each("tagged", tagged_ptr);
tis_show_each("untagged", untagged_ptr);
}
By default, the analyzer does not assume an alignment beyond the one implied by the type of each variable, so it cannot categorically determine whether the address passed into tag_ptr would pass the alignment check or not. Thus, it raises an alarm:
$ tis-analyzer pointer_tag.c -val -slevel 10 -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
uintptr_t __retres;
/*@ assert
Value: unclassified:
\warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
*/
if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
__retres = (uintptr_t)0x0;
goto return_label;
}
__retres = ptr | (unsigned int)2;
return_label: return __retres;
}
In order to proceed with the analysis, the user sets the address-alignment option to 4, in which case the analyzer can calculate whether the last two bits of the pointer will be empty or not, and proceed. The analysis finishes and shows the contents of the tagged pointer.
$ tis-analyzer pointer_tag.c -val -slevel 10 -address-alignment 4
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
Since the tag added to the pointer was 2, the resulting tagged pointer is shown with the original address &memory + {32} becoming &memory + {34}.
To analyze an application that uses dynamic loading features from <dlfcn.h>, such as dlopen, dlsym, etc., some specific information has to be provided by the user. For instance, let's consider the following program to be analyzed:
use-dlopen.c to analyze:
#include <stdio.h>
#include <dlfcn.h>
int main (void) {
void * handle = dlopen("some_lib.so", RTLD_LAZY);
if ( ! handle) {
fprintf(stderr, "%s\n", dlerror());
return 1;
}
void (*f_process) (void);
f_process = (void (*)(void)) dlsym(handle, "process");
char * error = dlerror();
if (error != NULL) {
fprintf(stderr, "%s\n", error);
return 1;
}
f_process();
dlclose(handle);
return 0;
}
Since the program is deterministic, the interpreter mode is used, but the process would be similar for a larger analysis that uses the analyzer mode. The program can be analyzed with the following command:
$ tis-analyzer --interpreter use-dlopen.c
The trace shows that, in order to be able to load the function with dlsym, a stub_dlsym function has to be provided:
[TIS LIBC STUBS]: stub_dlsym error: For a more accurate analysis, override this function "stub_dlsym" with your own function
dlsym error: unable to load process symbol
Such a stub may for instance look like:
use-dlopen-stubs.c to provide stub_dlsym:
#include <string.h>
#include <dlfcn.h>
void process (void);
void *stub_dlsym(const char *filename, const char * fname) {
void * pf = NULL;
if (0 == strcmp (filename, "some_lib.so")) {
if (0 == strcmp (fname, "process")) {
pf = &process;
}
}
return pf;
}
Now, the command becomes:
$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c
There is a warning about the process function (stub_dlsym is provided, but not the library):
tests/val_examples/use-dlopen-stubs.c:8:[kernel] warning: Neither code nor specification for function process, generating default assigns from the prototype
Of course, this is because the source code of the loaded library, which is supposed to hold the process function, has not been provided to the analyzer. Let's add a use-dlopen-plugin.c dummy file holding a process function:
use-dlopen-plugin.c to provide the process function:
#include <stdio.h>
void process (void) {
printf ("Hello from the 'process' function.");
}
Warning
Limitation: since the source files of the loaded library are analyzed together with the main source files, the constructor functions of dynamic libraries are called during the main program startup, whereas they should be called when dlopen is called.
Now, the command becomes:
$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c use-dlopen-plugin.c
The application is now analyzed as expected:
Hello from the 'process' function.
Warning
If some function names are used in both the application and the loaded library, some renaming may be needed.
Some compilers have extensions with non-standard keywords. One easy way to remove these keywords from the source code is to define them as empty macros. For instance:
-cpp-extra-args="-Dinterrupt=''"
Beware that it is up to you to check if removing these keywords might change the verification relevance.
Caution
The Mthread
plug-in is only available in the commercial version
of TrustInSoft Analyzer.
The Mthread
plug-in makes it possible to verify multi-thread programs.
Because it also uses the same value analysis, it provides the same alarm detection, but it takes into account all the possible concurrent behaviors of the program by analyzing all the possible interleavings between all threads. As before, this represents an over-approximation of the possible behaviors of the program.
Moreover, Mthread can provide an over-approximation of the memory zones that are accessed concurrently by more than one thread. For each zone and thread, Mthread also returns the program points at which the zone is accessed, whether the zone is read or written, and the callstack that leads to the statement.
Using the plug-in requires adding a stubbed version of the concurrency library in use to the analyzed source files. For some concurrency libraries, this file is provided with the tool (currently pthread, VxWorks, and Win32).
Please ask for more information if needed.
Caution
The Strict Aliasing plug-in is only available in the commercial version of TrustInSoft Analyzer.
The Strict Aliasing plug-in detects the violation of the strict aliasing rule as defined in the C99 and the C11 standards.
The references taken from the C11 standard for the strict aliasing rule are:
The strict aliasing analysis is currently in beta.
The strict aliasing analysis is available using the parameter -sa
when
starting an analysis with TrustInSoft Analyzer. Using this option automatically
launches the value analysis.
If a violation of the strict aliasing rule is detected during an analysis, a warning is displayed. However, this warning does not stop the analysis.
Example:
 1  int foo(int *p, float *q)
 2  {
 3      *p = 42;
 4      *q = 1.337;
 5      return *p;
 6  }
 7
 8  int main(void)
 9  {
10      int x;
11      return foo(&x, (float *)&x);
12  }
Given the previous C file foo.c, the strict aliasing analysis can be launched using the following command:
$ tis-analyzer -sa foo.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
foo.c:4:[sa] warning: The pointer q has type float *. It violates strict aliasing rules by
accessing a cell with effective type int.
Callstack: foo :: t.c:11 <- main
[value] done for function main
A violation of the strict aliasing rule is detected by the analyzer, which provides details about the violation: the pointer has the float type, while the accessed cell has the effective type int, and these types are incompatible.
Several options exist to parametrize the strict aliasing analysis.
-sa-strict-enum

Default Value: not set by default
Opposite: -sa-no-strict-enum
By default, the strict aliasing analysis uses the integer representation of the enum type. This allows using a pointer to the integer representation to access the enum cell. The -sa-strict-enum option restricts the default behavior: it only allows access to an enum cell through a pointer to the same enum type. For example:
1  enum E { a, b };
2
3  int main(void)
4  {
5      enum E e = a, *p = &e;
6      *(int *)p = 42;
7      *p = b;
8      return e;
9  }
The access at line 6 is accepted by default by the strict aliasing analysis because the example uses a pointer to a correct integer representation, as shown by the following output:
$ tis-analyzer -sa enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
When using the -sa-strict-enum option, the strict aliasing analysis detects a violation at line 6, because it does not accept the integer representation.
$ tis-analyzer -sa -sa-strict-enum enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
enum.c:6:[sa] warning: The pointer (int *)p has type int *. It violates strict aliasing rules by
accessing a cell with effective type enum E.
Callstack: main
[value] done for function main
-sa-strict-struct

Default Value: not set by default
Opposite: -sa-no-strict-struct
When taking the address of a structure member, the strict aliasing analysis keeps track of the structure and the member in order to check future pointer uses. By default, the analyzer allows accessing a memory location whose effective type is that of a structure member through a pointer having the same type as the member. For example:
1  struct s { int a; };
2
3  int main(void)
4  {
5      struct s s = { 0 };
6      int *p = &s.a;
7      *p = 42;
8      return s.a;
9  }
The access at line 7 is allowed by the analyzer because the pointer p has the same type as the member a of the structure s.
$ tis-analyzer -sa struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
When using the -sa-strict-struct option, this access is signaled as non-conformant because the member must be accessed with the same effective type (i.e. accessed through a pointer to the whole structure only).
$ tis-analyzer -sa -sa-strict-struct struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
a cell with effective type (struct s).a[int].
Callstack: main
[value] done for function main
-sa-strict-union

Default Value: set by default
Opposite: -sa-no-strict-union
When taking the address of a union member, the strict aliasing analysis keeps information about the whole union, not only the referenced member. The analyzer restricts access to a memory location that has a union type to pointers to the same union type.
1  union u { int a; };
2
3  int main(void)
4  {
5      union u u = { 0 };
6      int *p = &u.a;
7      *p = 42;
8      return u.a;
9  }
The access at line 7 is not valid according to the analyzer because the pointer p does not have the expected union u type, even though the union includes a member having the same type as the one pointed to by p.
$ tis-analyzer -sa union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
union.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
a cell with effective type (union u)[int].
Callstack: main
[value] done for function main
When using the opposite option -sa-no-strict-union, the access is allowed, because the union u includes a member of int type.
$ tis-analyzer -sa -sa-no-strict-union union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
In C, variables can be declared with a volatile qualifier, which tells the compiler that the value of the variable may change without any action being taken by the nearby code. In particular, a volatile variable can be modified outside the program. This means that its value may change unpredictably, even if the program did not directly write to that variable, or even if it did.
The volatile keyword is commonly used for preserving the values of variables across a longjmp and for memory-mapped I/O devices. The volatile keyword also has uses in concurrency on single-core systems, such as for variables that are updated by out-of-scope interrupt routines or for concurrently-modified variables (a de facto practice before the ISO/IEC 9899:1999 standard, when more robust mechanisms had not yet been introduced).
The unpredictable nature of volatile variables requires special consideration from the analyzer, which must over-approximate their values. This guide shows how to identify volatile variables in the analyzed program, how to perform sound analyses with volatile variables present, and how to tailor the semantics of accesses to specific volatile variables to fit specific use cases.
Given that volatile variables exhibit behavior that is distinct from how other variables behave, the analyzer keeps track of them and informs about their presence.
If a variable is volatile, the user will also be informed about this via the GUI. Whenever a volatile variable is inspected, a dedicated symbol appears in the Flags column of the Values tab in the bottom panel. Hovering over the icon shows that the contents of such variables are over-approximated.
Information about all volatile variables appearing in the code can be retrieved via the info-variables or info-csv-variables options, which print information about all variables in the program, either to the screen or to a file, and distinguish volatile variables.
Variable information can be extracted via the command-line options -info-variables or -info-csv-variables.
The same result can be obtained through a JSON configuration file, by setting the option info-variables to true, or by providing a path via the info-csv-variables option (see Configuration files for more).
{
"info-variables": true,
"info-csv-variables": "…"
}
Example For instance, the following function operates on both volatile and non-volatile variables:
#include <tis_builtin.h>
volatile unsigned char port_out;
const volatile unsigned char port_in;
int main(void) {
unsigned char data[] = { 0, 1, 2, 3 };
int cursor = 0;
while (!port_in) {
port_out = data[cursor++];
}
tis_show_each("cursor", cursor);
tis_show_each("port_in port_out", port_in, port_out);
}
Information about these variables can be extracted by the command below. The Volatile column specifies whether the variable was declared as volatile or not. Here, port_in and port_out are listed as volatile.
$ tis-analyzer -info-variables volatile_example.c
Name(s), File, Line, Type, Function, Kind, Storage, Initialized, Volatile, Const, Temporary, Is libc
port_out, tests/tis-user-guide/volatile_example.c, 3, unsigned char, NA, global variable, defined, no, yes, no, no, libc:no
port_in, tests/tis-user-guide/volatile_example.c, 4, unsigned char, NA, global variable, defined, no, yes, yes, no, libc:no
cursor, tests/tis-user-guide/volatile_example.c, 7, int, main, local variable, defined, no, no, no, no, libc:no
(Ordinarily, the output will contain other variables, but was filtered for brevity.)
The Const column also notes whether a variable can be modified by the analyzed program, which is relevant for distinguishing read-only volatile variables. Here, port_in is marked as constant.
The analyzer has to handle volatile variables conservatively to retain the soundness of the analysis, that is, its ability to prove that properties of the program hold at run time. Since volatile variables can be externally changed to any value at any point in the execution, the analyzer must over-approximate them.
In addition, while over-approximating the behavior of volatile variables makes the analysis sound, the loss of precision might make it less useful, especially if the user has specific knowledge about how these volatile variables behave in practice. In response, the analyzer provides tools to specify the behavior of volatile variables, or even to ignore the volatility of variables completely. The user can also add volatility to non-volatile variables.
When volatile variables are treated as non-volatile, or if they are assumed to follow specific semantics, the analysis becomes unsound in the general case. That is, the analyzer will follow the assumptions provided by the user, and if these assumptions are in any way incorrect, there may exist an execution of the analyzed program that causes undefined behavior which the analyzer does not find. Therefore, the user must exercise utmost caution when specifying volatile behaviors.
The behavior of volatile variables during analysis with various parameters, and their impact on soundness are summarized in the table below. The remainder of this guide goes into the detail of these analyses and these options.
Analysis | Behavior | Soundness
---|---|---
Value (analyzer profile) | approximate value to full range | sound
Value (interpreter profile) | halt analysis on volatile read | sound
Value with remove-volatile-locals | ignore volatile modifier on specific variables | unsound
Value with remove-volatile | ignore volatile modifier on all variables | unsound
Value with volatile-globals | add volatile modifier to specific variables | sound
WP | approximate value to full range | sound
WP with wp-volatile | approximate value to full range | sound
WP without wp-volatile | ignore volatile modifier on all variables | unsound
Any analysis with volatile plugin | replace volatile accesses with function calls | unsound
Since volatile variables can be modified externally to the analyzed program, value analysis handles volatile variables by making the conservative assumption that they always contain an unknown value, irrespective of what the analyzed program does. This over-approximation preserves the soundness of the analysis.
Example The following program exhibits a common use case for the volatile keyword, where a variable represents a hardware register or a sensor, so its value cannot be modified by the program but changes due to external factors. This particular program declares a global variable called sensor which is qualified with the const and volatile keywords, and is initially undefined. The program then loops until the value of sensor is set.
#include <tis_builtin.h>
#include <unistd.h>
const volatile unsigned char sensor;
int main(void) {
while (!sensor) {
sleep(1);
}
tis_show_each("sensor", sensor);
return 0;
}
Running the example with value analysis shows sensor is assumed to be initialized and its value is approximated as the entire range of the type unsigned char.
$ tis-analyzer -val volatile.c
[value] Called tis_show_each({{ "sensor" }}, [0..255])
Note that the value of the volatile variable remains approximated to the full range of its type, even if it is assigned during the execution of the program.
Example For instance, the following program declares a volatile local variable inside the function main whose value is set initially to 0, and subsequently set to 1.
#include <tis_builtin.h>
int main(void) {
volatile int x = 0;
tis_show_each("before", x);
x = 1;
tis_show_each("after", x);
return x;
}
However, after each assignment, the analyzer continues to approximate the value of the variable to any integer value, assuming that the value of x could be changed externally.
$ tis-analyzer -val volatile_local.c
[value] Called tis_show_each({{ "before" }}, [-2147483648..2147483647])
[value] Called tis_show_each({{ "after" }}, [-2147483648..2147483647])
When the analyzer is run with the interpreter profile, it follows a single execution path and avoids over-approximation. The abstract interpreter specifically requires that variables be associated with a single value at each point during the execution of the program. Therefore, the interpreter cannot just approximate volatile variables as having any value within their range.
On the other hand, since values of volatile variables may be modified externally, the interpreter cannot assume the values of such variables to be known precisely without the loss of soundness. Therefore, the interpreter halts when encountering an access to a volatile variable in its execution path.
Example The following example is similar to the one from the previous section. It also shows a popular use of the volatile keyword for a sensor or hardware register, where the value of a variable sensor is set externally but cannot be modified within the program. This program assigns sensor the initial value of 255 for convenience.
#include <tis_builtin.h>
#include <unistd.h>
const volatile unsigned char sensor = 255;
int main(void) {
while (!sensor) {
sleep(1);
}
tis_show_each("sensor", sensor);
return 0;
}
When this program is interpreted, it produces two warnings and an error. The first warning informs that the initialization is ignored, since the value of volatile variables may change at any point. The second warning informs that the value of a volatile variable cannot be used to evaluate any computation in interpreter mode. The error informs that interpretation cannot proceed past the attempt to evaluate the volatile variable. The interpreter does not evaluate tis_show_each to show the value of sensor.
$ tis-analyzer -val --interpreter volatile_interpreter.c
tests/tis-user-guide/volatile_interpreter.c:4:[value] warning: global initialization of volatile variable sensor ignored
tests/tis-user-guide/volatile_interpreter.c:6:[value] warning: The following sub-expression cannot be evaluated
(due to volatile type, try option -remove-volatile):
sensor
All sub-expressions with their values:
unsigned char sensor ∈ [0..255]
Stopping.
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
It is possible to proceed with the interpretation of a program that reads from volatile variables by treating them as non-volatile. This is done by specifying the -remove-volatile or -remove-volatile-locals command-line flags. This can be especially useful when using value analysis with the interpreter profile to prevent it from halting on a read from a volatile variable (see above).
remove-volatile option

Setting the remove-volatile option causes value analysis to be conducted as if all volatile variables were not volatile. Specifically, the analyzer assumes that variables cannot be modified outside the scope of the analyzed program, regardless of whether they are marked volatile. Since this assumption might not be borne out in practice, this mode of analysis is unsound and may not find all undefined behaviors.
Warning
Using remove-volatile on programs reading volatile variables is unsound in the general case.
This option can be set by using the -remove-volatile command-line flag:
$ tis-analyzer -val -remove-volatile …
The feature can also be turned on within a JSON analysis configuration file using the remove-volatile Boolean option (see Configuration files):
{
"val": true,
"val-profile": "interpreter",
"remove-volatile": true
}
Example The following program is the same as in the previous section. It contains an example use of a volatile variable that is only set externally and cannot be modified within the program. The program defines a variable called sensor with the const and volatile qualifiers and assigns the initial value of 255 to it.
#include <tis_builtin.h>
#include <unistd.h>
const volatile unsigned char sensor = 255;
int main(void) {
while (!sensor) {
sleep(1);
}
tis_show_each("sensor", sensor);
return 0;
}
The previous section shows that interpreting this example using value analysis yields an error when sensor is read. However, when run with the -remove-volatile flag, the interpreter ignores the volatile modifier on sensor and proceeds to analyze the program as if all modifications to sensor could be tracked by interpreting the program's execution.
$ tis-analyzer -val --interpreter -remove-volatile volatile_interpreter.c
[value] Called tis_show_each({{ "sensor" }}, {255})
remove-volatile-locals option

The remove-volatile-locals option is a variant of the remove-volatile option that removes volatility only from volatile variables defined within specific functions. That is, this option causes value analysis to be conducted as if volatile variables within a specific set of functions were not volatile. Specifically, the analyzer assumes that the volatile variables declared inside the given functions cannot be modified outside the scope of the analyzed program. As with remove-volatile, since this assumption might not be borne out in practice, the analysis is unsound in general.
Warning
Using remove-volatile-locals on programs reading volatile variables is unsound in the general case.
This feature can be used via the -remove-volatile-locals command-line option. Here, the option specifies that local variables declared within functions main and test should be treated as non-volatile:
$ tis-analyzer -val -remove-volatile-locals main,test …
The feature can also be turned on within a JSON analysis configuration file using the remove-volatile-locals option (see Configuration files). Here, the option specifies a list of functions by analogy to the command-line option above:
{
"val": true,
"val-profile": "interpreter",
"remove-volatile-locals": ["main", "test"]
}
Example The following example mirrors that of the preceding section, except the variable sensor is declared locally within the main function rather than globally. The program again defines a variable called sensor with the const and volatile qualifiers and the initial value of 255.
#include <tis_builtin.h>
#include <unistd.h>
int main(void) {
const volatile unsigned char sensor = 255;
while (!sensor) {
sleep(1);
}
tis_show_each("sensor", sensor);
return 0;
}
Again, interpreting this example using value analysis yields a warning informing that the volatile variable cannot be read and halts.
$ tis-analyzer -val --interpreter -remove-volatile-locals volatile_locals.c
tests/tis-user-guide/volatile_locals.c:6:[value] warning: The following sub-expression cannot be evaluated
(due to volatile type, try option -remove-volatile):
sensor
All sub-expressions with their values:
unsigned char sensor ∈ [0..255]
Stopping.
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
However, when executed with the option remove-volatile-locals set to main, the interpreter treats all local variables within main as non-volatile, even if they have the volatile qualifier. Thus, the analysis proceeds and returns the value of sensor as 255.
$ tis-analyzer -val --interpreter -remove-volatile-locals main volatile_locals.c
[value] Called tis_show_each({{ "sensor" }}, {255})
The remove-volatile-locals option only applies to variables declared within a given function, so if a global variable is used within such a function, the interpreter cannot handle it. For example:
$ tis-analyzer -val --interpreter -remove-volatile-locals main volatile_interpreter.c
tests/tis-user-guide/volatile_interpreter.c:4:[value] warning: global initialization of volatile variable sensor ignored
tests/tis-user-guide/volatile_interpreter.c:6:[value] warning: The following sub-expression cannot be evaluated
(due to volatile type, try option -remove-volatile):
sensor
All sub-expressions with their values:
unsigned char sensor ∈ [0..255]
Stopping.
[value] user error: Degeneration occurred:
results are not correct for lines of code that can be reached from the degeneration point.
If a function uses a combination of globally-defined volatile variables and local ones, the user should use a combination of remove-volatile and remove-volatile-locals to achieve the desired effect.
By default, volatile variables behave the same in WP as they do in value analysis: the volatile value is assumed to be capable of being modified at any point in the execution of the analyzed program regardless of the code of the program. This preserves the soundness of the analysis.
Example Consider the following function and its ACSL contract. Here, decrement_counter checks whether the global variable counter has a value greater than 0, and decreases it by 1 if it does. If the function managed to successfully decrease the value of counter, it returns 0, otherwise it returns 1. The contract reflects this by specifying that counter and the result of the function both change depending on the value of counter, and by defining two separate behaviors: one for when counter is greater than 0 and one for when it is 0.
unsigned char counter;
/*@ assigns counter \from counter;
assigns \result \from counter;
behavior can_decrement:
assumes counter > 0;
ensures counter == \old(counter) - 1;
ensures \result == 0;
behavior cannot_decrement:
assumes counter <= 0;
ensures counter == \old(counter);
ensures \result == 1;
*/
int decrement_counter() {
if (counter > 0) {
counter--;
return 0;
}
return 1;
}
Running WP on this example shows that all the properties can be successfully checked:
$ tis-analyzer -wp -wp-rte -no-tis-libc wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_can_decrement_post_2 : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post_2 : Valid
[wp] Proved goals: 8 / 8
Qed: 6
Alt-Ergo: 2 (24)
(Timing information was removed.)
Then, imagine that the presented program is meant to communicate with a peripheral device via the counter variable, potentially both reading and writing it. This is represented by the following modification to the example:
volatile_wp.c [excerpt]:
volatile unsigned char counter;
When the example is modified so that counter is volatile, the WP analysis fails. This is because the analyzer cannot assume that the value of counter does not change spontaneously, and therefore cannot prove that the value of counter after executing the function is either the same as, or less by exactly one than, its value before the function call.
$ tis-analyzer -wp -wp-rte -no-tis-libc volatile_wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Timeout
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post_2 : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_cannot_decrement_post : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_cannot_decrement_post_2 : Timeout
[wp] Proved goals: 3 / 8
Qed: 3
Alt-Ergo: 0 (interrupted: 5)
(Timing information was removed.)
An interested user can investigate the details of the failing conditions.
The behavior of the WP analysis can be modified to ignore the volatile modifier on all variables by turning off the wp-volatile option (it is turned on by default). Since there is no guarantee that the user’s assumption that volatile variables remain unchanged holds in the general case, this makes the analysis unsound.
Warning
Using wp-no-volatile on programs reading volatile variables makes the analysis unsound in the general case.
The option can be turned off via the command-line by setting the -wp-no-volatile flag.
$ tis-analyzer -wp -wp-rte -wp-no-volatile …
Alternatively, the option can be unset via a JSON configuration file with the Boolean option wp-volatile:
{
    "wp": true,
    "wp-rte": true,
    "wp-volatile": false
}
Example Running the example above with the wp-volatile option turned off means the volatile keyword is ignored and all the properties are successfully confirmed. However, the analyzer also emits warnings whenever a property involves an access to a volatile variable, in effect informing that the analysis is unsound.
$ tis-analyzer -wp -wp-rte -no-tis-libc -wp-no-volatile volatile_wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe write-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe read-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:24:[wp] warning: unsafe read-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:15:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:15:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:14:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:20:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:20:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:19:[wp] warning: unsafe volatile access to (term) l-value
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_can_decrement_post_2 : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post_2 : Valid
[wp] Proved goals: 8 / 8
Qed: 6
Alt-Ergo: 2 (24)
By default, the analyzer treats variables as volatile only if they are explicitly declared with the volatile qualifier in the source code. However, in some cases it may be beneficial to promote other variables to volatile, even if they are not declared as such. This can be used to simulate concurrency or to conform to a de facto usage without modifying the source code.
The volatile-globals option indicates a set of global variables that the value analysis must consider volatile despite them not being declared as such in the program.
The feature can be turned on via the -volatile-globals command-line option, passing in a list of global variables. Here, the analyzer will treat the variables sensor and port as volatile, regardless of whether they are declared volatile in the source code:
$ tis-analyzer -val -volatile-globals sensor,port
Alternatively, the feature can be turned on within a JSON configuration using the volatile-globals option (see Configuration files). Here, the option specifies a list of variables by analogy to the command-line option above:
{
    "val": true,
    "volatile-globals": ["sensor", "port"]
}
Example Consider the following program containing two global variables, port_in and port_out, neither of which is declared with the volatile qualifier. The program writes 1 or 0 to port_out, depending on whether port_in is 0 or not.
#include <tis_builtin.h>

unsigned char port_in;
unsigned char port_out;

void main(void) {
    if (port_in == 0) {
        port_out = 1;
    } else {
        port_out = 0;
    }
    tis_show_each("port_in port_out", port_in, port_out);
}
Upon analysis, since neither port_in nor port_out is volatile, the analyzer assumes they are both initialized to 0 and that their values remain unchanged until the program modifies port_out. Therefore, the analysis shows the values of port_in and port_out as 0 and 1 at the end of function main.
$ tis-analyzer -val volatile_globals.c
[value] Called tis_show_each({{ "port_in port_out" }}, {0}, {1})
On the other hand, since the program can be deduced to be using those variables for communication, the analyzer can be instructed to treat them as volatile. In that case, the analyzer cannot predict the values of port_in and port_out even after port_out is modified by the program, because either variable can be modified externally at any point in the execution.
$ tis-analyzer -val -volatile-globals port_in,port_out volatile_globals.c
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
The volatile-globals option can also indicate a range of absolute memory addresses that the value analysis must consider volatile. This is done by providing the option with NULL as an argument, and defining a range of valid memory addresses via the absolute-valid-range option. This allows modeling MMIO with such memory ranges. See Physical addresses for details.
The absolute volatile address range can be specified via the command-line using the -volatile-globals and -absolute-valid-range options. Here, the analyzer is given a valid address range starting at 0x1000 and ending at 0x2000 (inclusive), specified to be treated as volatile.
$ tis-analyzer -val -volatile-globals NULL -absolute-valid-range 0x1000-0x2000 …
The same analysis parameters can be specified via a JSON configuration using the volatile-globals and absolute-valid-range options (see Configuration files):
{
    "val": true,
    "volatile-globals": [ "NULL" ],
    "absolute-valid-range": "0x1000-0x2000"
}
The absolute valid range is declared volatile in its entirety or not at all. The analyzer does not currently support treating some parts of the absolute valid range as volatile and others as non-volatile.
Tip
Representing absolute addresses as equivalent variables gives more flexibility in this regard. It also provides other advantages and is the recommended approach overall. See the guide on physical addresses for details on transforming a program from using absolute addresses to using variables pinned to specific address ranges.
Example Consider a program that communicates with hardware via an area of memory defined in terms of absolute addresses. The address space is a 1-byte area that starts at 0x1000, as defined by the constant PORT. Values are read and written to this memory area via the macro VALUE, which interprets a byte at a specific address as an unsigned char. The program reads the value at PORT and replies with 0 or 1 depending on the received value.
#include <tis_builtin.h>

#define PORT 0x1000
#define VALUE(port) *((unsigned char *) port)

void main(void) {
    tis_show_each("PORT", VALUE(PORT));
    if (VALUE(PORT) == 0) {
        VALUE(PORT) = 1;
    } else {
        VALUE(PORT) = 0;
    }
    tis_show_each("PORT", VALUE(PORT));
}
Since the example refers to an absolute address, running the analyzer on this program yields an alarm informing that the read is out of bounds.
$ tis-analyzer -val volatile_range.c
tests/tis-user-guide/volatile_range.c:6:[kernel] warning: out of bounds read. assert \valid_read((unsigned char *)0x1000);
Hence, analyzing the example requires specifying that the absolute memory addresses used by the program are valid via the absolute-valid-range option. Here, the range of addresses starting at 0x1000 and ending at 0x1001 (inclusive) is specified as valid. The analysis then treats the range as valid and containing any values. It reports the value read from PORT to be anything in the range [0..255], and the value of PORT after the program writes to it to be either 0 or 1.
$ tis-analyzer -val -absolute-valid-range 0x1000-0x1001 volatile_range.c
[value] Called tis_show_each({{ "PORT" }}, [0..255])
[value] Called tis_show_each({{ "PORT" }}, {0; 1})
However, if the program uses PORT to communicate with an external entity, as is the intention, the values within PORT could be externally modified at any point in the execution of the program. Soundness requires that this be reflected in the analysis. Thus, the memory range used for communication should be treated as volatile. This is accomplished by using the volatile-globals option. When NULL is passed as its argument, the option treats the entire valid address range as volatile. Then, the value of PORT is also correctly reported as any value in the range [0..255], even after it was written to within the program.
$ tis-analyzer -val -absolute-valid-range 0x1000-0x1001 -volatile-globals NULL volatile_range.c
[value] Called tis_show_each({{ "PORT" }}, [0..255])
[value] Called tis_show_each({{ "PORT" }}, [0..255])
While adding volatile behavior to global variables or address ranges makes it possible to indicate accesses to peripherals via MMIO, it does not model the behavior of the peripheral itself. Instead, the analyzer assumes that a volatile variable can change to any value at any time. The volatile plugin allows the user to simulate the exact behavior of hardware by replacing accesses to volatile variables with function calls. The semantics of those function calls can then be defined via ACSL properties. This allows the analyzed software to be studied in conditions that resemble final deployment conditions.
The analyzer assumes that the functions replacing accesses to volatile variables correctly and completely describe the operation of peripherals. The analysis is sound only under those conditions, and unsound if they are not borne out.
Warning
Using the volatile plugin on programs reading volatile variables is unsound in the general case.
When using the volatile plugin, the user prepares a specification of the behavior of a volatile variable. The specification takes the form of function signatures that replace reading and writing the variable, together with contracts specifying their behavior. Given a volatile variable v of some type T, the read and write functions must have the following signatures:
T rd_v(volatile T *ptr);
T wr_v(volatile T *ptr, T value);
(The names used in function signatures are arbitrary and different names can be provided by the user.)
These functions can be defined in C in full, in which case their bodies describe their behavior to the analyzer. Defining these functions in this way can be especially useful when the source code programming the peripheral is already available as C source code.
T rd_v(volatile T *ptr) {
// …
}
T wr_v(volatile T *ptr, T value) {
// …
}
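For instance, if the peripheral behaved like a simple latch that stores the last written byte, the pair of functions for an unsigned char variable could be written as follows. The names and the latch behavior are illustrative assumptions, not part of the example above:

```c
/* Illustrative C model of a latch-like peripheral: writes are stored in
   a shadow variable, reads return the last stored byte. The behavior is
   an assumption made for this sketch. */
static unsigned char latch_state;

unsigned char rd_v(volatile unsigned char *ptr) {
    (void)ptr;               /* this simple model ignores the address */
    return latch_state;
}

unsigned char wr_v(volatile unsigned char *ptr, unsigned char value) {
    (void)ptr;
    latch_state = value;     /* remember the byte for subsequent reads */
    return latch_state;
}
```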
Alternatively, if the C code defining the behavior of the peripheral is not available or is too complex, the behavior of read and write accesses can be described declaratively using ACSL properties. This is the recommended way of defining the behavior of volatile variables and it is described in detail in following sections.
/*@ requires ptr == &v;
@ ensures \result==…;
@ assigns \result \from …;
@ …
@*/
T rd_v(volatile T *ptr);
/*@ requires ptr == &v;
@ ensures \result==…;
@ assigns \result \from …;
@ …
@*/
T wr_v(volatile T *ptr, T value);
Once the behavior specification is in place, the volatile plugin needs to be informed that a specific variable should be replaced by function calls. The user does this by adding an additional ACSL annotation to the source code. The annotation specifies the name of the volatile variable that will be replaced, the name of a read function (after the reads keyword), and the name of a write function (after the writes keyword).
//@ volatile v writes wr_v reads rd_v ;
While it is recommended to supply both a read and a write specification, the configuration can omit either. In those cases, the volatile variable is read from or written to directly, depending on which function is missing.
If this happens, the analyzer also generates a warning informing that either the read or write access function was not defined for a volatile value. This warning can be turned off via the -no-warning-on-lvalues-partially-volatile command-line flag.
The analyzer also warns if there are volatile variables for which no volatile annotation is given at all. If this is not an oversight, the warning can be turned off via the -no-warning-on-volatile-lvalues flag.
Tip
ACSL annotations can only use symbols that were already defined. If you are using a symbol in your annotation that is not defined or is defined in the source code after the annotation, the analyzer cannot proceed and you receive the following error:
[kernel] user error:: cannot find function '…' for volatile clause
Once replacement functions are specified and their behavior is defined, the plugin is ready for use. It is applied via the -volatile command-line flag.
$ tis-analyzer … -volatile
When the flag is set, the analyzer transforms the analyzed source code into a new project called Volatile, where volatile accesses are replaced with function calls according to the provided specification. The analyzer can then be configured to perform further analyses on the Volatile project via the -then-on sequencing option.
$ tis-analyzer … -volatile -then-on Volatile …
Tip
The modified source code of the Volatile project is not output by the analyzer. To view the modified (normalized) source code, run an analysis on the Volatile project with -print:
$ tis-analyzer … -volatile -then-on Volatile -print
Or, to show only functions and variables of interest, use the -print and -print-filter options:
$ tis-analyzer … -volatile -then-on Volatile -print -print-filter main,v,wr_v,rd_v
An equivalent JSON configuration is not currently supported (see Command line options).
Example Consider this program communicating with a peripheral device via a pair of volatile variables called port_in and port_out. The peripheral device acts as a buffer. The program writes bytes into port_out until the peripheral signals it to stop by writing 1 back to port_in, which it will do after it receives 4 bytes. Given that the actions of the peripheral device are external and obscured by the volatile variables, the example initially looks like this:
#include <tis_builtin.h>

volatile unsigned char port_out;
const volatile unsigned char port_in;

int main(void) {
    unsigned char data[] = { 0, 1, 2, 3 };
    int cursor = 0;
    while (!port_in) {
        port_out = data[cursor++];
    }
    tis_show_each("cursor", cursor);
    tis_show_each("port_in port_out", port_in, port_out);
}
Even though the programmer might have knowledge about how the peripheral device behaves, the analyzer treats port_in as potentially holding any value allowed by its type, and emits an alarm that cursor will index the data array out of its bounds when it reaches 4:
$ tis-analyzer -val -slevel 100 volatile_example.c
tests/tis-user-guide/volatile_example.c:9:[kernel] warning: accessing out of bounds index {4}. assert tmp < 4;
(tmp from cursor++)
[value] Called tis_show_each({{ "cursor" }}, {0})
[value] Called tis_show_each({{ "cursor" }}, {1})
[value] Called tis_show_each({{ "cursor" }}, {2})
[value] Called tis_show_each({{ "cursor" }}, {3})
[value] Called tis_show_each({{ "cursor" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
The following snippet extends the example above with a description of the behavior of the volatile variables in the form of C code.
The state of the port_out variable is simulated by two new variables: the byte last_value representing the last written value, and the integer cursor recording how many elements were written into the buffer. The behavior of port_out is described by the functions wr_port_out and rd_port_out. When a value is written to port_out, it becomes last_value and cursor is incremented. If there is no more room in the buffer, the write is ignored. Reading from port_out returns the most recently written value from last_value.
The behavior of port_in is simulated by the function rd_port_in. Reading from port_in returns 0 if there is room in the buffer and 1 if it is full. Writing to port_in is not supported, so no function is provided. In order not to produce a warning about this, the analyzer is run with the -no-warning-on-lvalues-partially-volatile flag set.
Both volatile variables are connected to their read and write functions by their respective volatile ACSL annotations.
#include <tis_builtin.h>

volatile unsigned char port_out;
const volatile unsigned char port_in;

#define BUFFER_LEN 4
unsigned char last_value;
int cursor = 0;

unsigned char rd_port_out(unsigned char volatile *ptr) {
    return last_value;
}

unsigned char wr_port_out(unsigned char volatile *ptr, unsigned char value) {
    int buffer_full = cursor >= BUFFER_LEN;
    if (!buffer_full) {
        last_value = value;
        cursor++;
    }
    return last_value;
}

const unsigned char rd_port_in(const unsigned char volatile *ptr) {
    return cursor >= BUFFER_LEN;
}

//@ volatile port_out reads rd_port_out writes wr_port_out;
//@ volatile port_in reads rd_port_in;

int main(void) {
    unsigned char data[] = { 0, 1, 2, 3 };
    int i = 0;
    while (!port_in) {
        port_out = data[i++];
    }
    tis_show_each("i", i);
    tis_show_each("port_in port_out", port_in, port_out);
}
Applying the volatile plugin causes accesses to port_in and port_out to be replaced with calls to the relevant functions:
$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_c.c -then-on Volatile -print -print-filter main
int main(void)
{
  int __retres;
  unsigned char data[4];
  int i;
  data[0] = (unsigned char)0;
  data[1] = (unsigned char)1;
  data[2] = (unsigned char)2;
  data[3] = (unsigned char)3;
  i = 0;
  while (1) {
    {
      unsigned char __volatile_tmp;
      __volatile_tmp = rd_port_in(& port_in);
      if (! (! __volatile_tmp)) break;
    }
    {
      int tmp;
      {
        tmp = i;
        i ++;
        wr_port_out(& port_out, data[tmp]);
      }
    }
  }
  tis_show_each("i", i);
  {
    unsigned char __volatile_tmp_9;
    unsigned char __volatile_tmp_7;
    __volatile_tmp_7 = rd_port_in(& port_in);
    __volatile_tmp_9 = rd_port_out(& port_out);
    tis_show_each("port_in port_out", (int)__volatile_tmp_7,
                  (int)__volatile_tmp_9);
  }
  __retres = 0;
  __tis_globfini();
  return __retres;
}
Then, using the resulting Volatile project allows the analyzer to be more precise about the state of the volatile variables at each point in the program execution. Here, it correctly predicts that at the end of the execution, port_in will contain the specific value 1, and port_out the last value written to it.
$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_c.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})
While the example above defines the behavior by writing the read and write functions in C, the preferred alternative is to define that behavior declaratively by attaching an ACSL contract to the function declarations. This removes the need to model the behavior of peripherals in detail and is easier for the analyzer to handle.
When using ACSL, at a minimum each function contract should establish a connection between the pointer passed as argument and the volatile variable in the program:
/*@ requires ptr == &v; */
The contracts can also provide the semantics of each function by defining its return value:
/*@ ensures \result==…; */
Functions returning values and functions with side effects should also specify the dependencies of the modified data:
/*@ assigns \result \from … */
/*@ assigns … \from … */
For details on ACSL, see the ACSL properties section of the documentation.
Example This example rewrites the previous example to use ACSL annotations to define function semantics for wr_port_out, rd_port_out, and rd_port_in. The principle of operation is the same, including using the same additional variables to simulate state. These variables are used within the ACSL annotations for each function to declare how calling the function impacts the state of the peripheral.
The ACSL contract for function rd_port_out specifies that it must be called on the volatile variable port_out and that it always returns the value of the variable last_value.
📎 volatile_example_acsl.c
[excerpt]
/*@ requires ptr == &port_out;
assigns \result \from last_value;
ensures \result == last_value;
*/
unsigned char rd_port_out(unsigned char volatile *ptr);
The contract for wr_port_out also specifies that it must be called on port_out and that, apart from returning a value, it has side effects on cursor and last_value. The function has two separate sets of behaviors depending on whether the buffer is already full or not. If there is still room, the function increments cursor and sets last_value from the argument value. If the buffer is already full, the function does not change cursor. In either case the function returns the contents of last_value.
📎 volatile_example_acsl.c
[excerpt]
/*@ requires ptr == &port_out;
assigns \result \from last_value;
assigns cursor \from cursor;
assigns last_value \from value, last_value, cursor;
ensures \result == last_value;
behavior nonfull:
assumes cursor < BUFFER_LEN;
ensures cursor == \old(cursor) + 1;
ensures last_value == value;
behavior full:
assumes cursor == BUFFER_LEN;
ensures cursor == \old(cursor);
complete behaviors;
disjoint behaviors;
*/
unsigned char wr_port_out(unsigned char volatile *ptr, unsigned char value);
Finally, the contract for rd_port_in specifies that this function must be called on port_in. This function returns 1 if the buffer is full, or 0 otherwise.
📎 volatile_example_acsl.c
[excerpt]
/*@ requires ptr == &port_in;
assigns \result \from \nothing;
ensures (\result == 1 && cursor >= BUFFER_LEN)
|| (\result == 0 && cursor < BUFFER_LEN);
*/
const unsigned char rd_port_in(const unsigned char volatile *ptr);
The remainder of the example remains unchanged. When it is run via the volatile plugin, it returns the expected results:
$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_acsl.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})
Instead of defining the behavior of volatile variables by declaring or defining read and write functions for each of them individually, it may be more convenient to define such functions once for an entire type of volatile variables. This is done via the binding-auto feature of the Volatile plugin. This feature causes the Volatile module of the analyzer to attempt to bind the accesses to all volatile variables to appropriately named and typed functions, based on the type of the variables in question.
Warning
The volatile plugin is an experimental feature of TrustInSoft Analyzer.
Specifically, given a volatile variable of some type T, the binding-auto feature replaces reads and writes to that variable with calls to the following functions:
T c2fc2_Wr_T(T *ptr, T value);
T c2fc2_Rd_T(T *ptr);
(Note the capitalization of Wr and Rd.)
These functions have to be declared manually. The behavior of these functions can be defined by providing them with bodies written in plain C or it can be specified using ACSL.
Then, the feature is turned on within the Volatile plugin by setting the -binding-auto command-line flag alongside the -volatile flag:
$ tis-analyzer … -volatile -binding-auto
When both of these flags are set, the Volatile module replaces volatile accesses with function calls, if it can find an appropriately named function for a specific volatile variable’s type. The replacement is put into effect only if the function’s signature matches the signature expected for volatile accesses. The replacement is also not performed if a different function is already specified for a given variable via a volatile ACSL annotation, in which case the functions specified in the annotation are prioritized.
The same result can be obtained through a JSON configuration file, by setting the option binding-auto to true (see Configuration files):
{
    "binding-auto": true
}
The functions used for volatile accesses start with a c2fc2_ prefix. This prefix can be changed via the -binding-prefix command-line option:
$ tis-analyzer -volatile -binding-auto -binding-prefix "auto_"
Alternatively, the prefix can also be set by providing a string to the binding-prefix option through a JSON configuration file (see Configuration files):
{
    "binding-auto": true,
    "binding-prefix": "auto_"
}
Given that the automatic binding feature associates variables and functions implicitly, relying on the right names and types of functions being defined, it is easy to make a mistake while using it. However, the plugin provides debug output that makes it clear which variables were bound to which functions. The output is turned on by asking for binding messages to be printed via the -volatile-msg-key option. The output of this option is only shown at debug levels 2 or greater, so it should always be used in conjunction with the -debug command-line option. The output is available regardless of whether binding-auto is turned on or not.
$ tis-analyzer -volatile -binding-auto -debug=2 -volatile-msg-key=binding
Debug messages from the binding process of the Volatile plugin are displayed with the tag [volatile:binding], e.g.:
[volatile:binding] Looking for a function relative to write access to volatile left-value: var
[volatile:binding] Looking for a default binding from the type name: T volatile
[volatile:binding] Looking for function c2fc2_Wr_T
Example Consider again the example from the previous section, but modified to use the binding-auto feature to specify functions to use for accessing the variable port_out. Since the type of that variable is unsigned char, the access functions defined previously as wr_port_out and rd_port_out are now renamed to auto_Wr_unsigned_char and auto_Rd_unsigned_char (the example uses the custom prefix auto_ rather than the default c2fc2_):
📎 volatile_example_auto.c
[excerpt]
/*@ requires ptr == &port_out;
assigns \result \from last_value;
ensures \result == last_value;
*/
unsigned char auto_Rd_unsigned_char(unsigned char volatile *ptr);
/*@ requires ptr == &port_out;
assigns \result \from last_value;
assigns cursor \from cursor;
assigns last_value \from value, last_value, cursor;
ensures \result == last_value;
behavior nonfull:
assumes cursor < BUFFER_LEN;
ensures cursor == \old(cursor) + 1;
ensures last_value == value;
behavior full:
assumes cursor == BUFFER_LEN;
ensures cursor == \old(cursor);
complete behaviors;
disjoint behaviors;
*/
unsigned char auto_Wr_unsigned_char(unsigned char volatile *ptr, unsigned char value);
The example also removes the ACSL annotation explicitly binding these functions with port_out. Meanwhile, the replacement function for port_in remains unchanged, as does its volatile ACSL annotation:
📎 volatile_example_auto.c
[excerpt]
/*@ requires ptr == &port_in;
assigns \result \from \nothing;
ensures (\result == 1 && cursor >= BUFFER_LEN)
|| (\result == 0 && cursor < BUFFER_LEN);
*/
const unsigned char rd_port_in(const unsigned char volatile *ptr);
When analyzing the code with the Volatile plugin, the binding-auto flag, and the binding-prefix option set to auto_, the accesses to port_out are replaced with auto_Wr_unsigned_char and auto_Rd_unsigned_char, so the analyzer produces the expected results:
$ tis-analyzer -volatile -binding-auto -binding-prefix=auto_ -no-warning-on-lvalues-partially-volatile volatile_example_auto.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})
Running the code with debug information shows the process of binding functions to volatile variables. Variable port_in simply uses the function defined for it, while port_out is bound to functions whose names were inferred from the type name of unsigned char:
$ tis-analyzer -volatile -debug 2 -volatile-msg-key=binding -binding-auto -binding-prefix=auto_ -no-warning-on-lvalues-partially-volatile volatile_example_auto.c -then-on Volatile -val -slevel 100
[volatile] Running volatile plugin...
[volatile] Processing volatile clauses...
[volatile] Building volatile table...
[volatile] Building new project with volatile access transformed...
[volatile:binding] Normalizing port_in into port_in
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_in
[volatile:binding] Function found: rd_port_in
[volatile] tests/tis-user-guide/volatile_example_auto.c:50 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_in
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to write access to volatile left-value: port_out
[volatile] Building default binding table...
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Wr_unsigned_char
[volatile:binding] Verifying prototype of function auto_Wr_unsigned_char: unsigned char (
unsigned char volatile *, unsigned char)
[volatile:binding] Function found: auto_Wr_unsigned_char
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to write access to volatile left-value: port_out
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Wr_unsigned_char
[volatile:binding] Verifying prototype of function auto_Wr_unsigned_char: unsigned char (
unsigned char volatile *, unsigned char)
[volatile:binding] Function found: auto_Wr_unsigned_char
[volatile] tests/tis-user-guide/volatile_example_auto.c:51 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms write access to volatile left-value: port_out
[volatile:binding] Normalizing port_in into port_in
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_in
[volatile:binding] Function found: rd_port_in
[volatile] tests/tis-user-guide/volatile_example_auto.c:54 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_in
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_out
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Rd_unsigned_char
[volatile:binding] Verifying prototype of function auto_Rd_unsigned_char: unsigned char (
unsigned char volatile *)
[volatile:binding] Function found: auto_Rd_unsigned_char
[volatile] tests/tis-user-guide/volatile_example_auto.c:54 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_out
Our references here are to the C11 standard, cited in the relevant subsections below.
The va_list type
The type va_list, declared in the <stdarg.h> header, should be, according to the C11 standard, a complete object type. TrustInSoft Analyzer has some basic limitations in how it handles this type and its correct use.
Objects of the va_list type are handled correctly only when they appear as local variables, formal arguments, etc. Global va_list variables, arrays of va_list objects, and va_list as the type of a field of a complex type (i.e. a union or structure) are not supported. A fatal error occurs if va_list objects are used in such a way (although merely declaring them, without actually using them, causes no error).
The C11 standard does not make entirely clear whether assignments between variables of va_list type (e.g. ap1 = ap2;) are permitted. However, since both the gcc and clang compilers refuse to compile such assignments, we assume that they are not permitted and thus do not handle them. Note that TrustInSoft Analyzer does not check whether va_list assignments appear in the code: if they do, the program is not rejected, even though the behavior is undefined.
Casting to the va_list type
Casting from other data types to the va_list type is implicitly forbidden, and TrustInSoft Analyzer enforces this rule: if such a cast is encountered, an appropriate error occurs and the program is rejected.
Casting from the va_list type
Casting from the va_list type to other data types is also forbidden, but we do not enforce this rule: programs containing such casts are incorrect, but TrustInSoft Analyzer does not reject them.
Passing va_list objects to other functions
A va_list object can be passed as an argument to another function. However, if the called function invokes the va_arg macro on this object, its value in the calling function becomes indeterminate (i.e. it can no longer be used with va_arg in the calling function). This rule does not apply if the va_list is passed by pointer. See subsection 7.16.1 of the C11 standard.
We do not enforce this rule in TrustInSoft Analyzer. All va_list objects passed to other functions are treated as if they were passed by pointer: each invocation of the va_arg macro on such a va_list object simply returns the next argument, regardless of where the previous va_arg invocations happened. This approach is similar to what gcc and clang implement.
Returning va_list objects from functions
The C11 standard does not make it completely clear, but it seems that functions returning va_list objects are allowed. However, since both the gcc and clang compilers refuse to compile such functions, we assume that using them is not permitted: TrustInSoft Analyzer rejects programs that declare functions returning va_list objects.
Some undefined behavior cases enumerated in the C11 standard (section J.2, Undefined behavior, of appendix J, Portability issues) are not verified by TrustInSoft Analyzer.
Using va_arg on va_list objects passed to other functions
The macro va_arg is invoked using the parameter ap that was passed to a function that invoked the macro va_arg with the same parameter (7.16).
Status: Not verified at all (as stated above).
A macro definition of va_start, va_arg, va_copy, or va_end is suppressed in order to access an actual function, or the program defines an external identifier with the name va_copy or va_end (7.16.1).
Status: Not verified at all.