Tip

Use your browser's search function (such as Ctrl+F) to search the entire documentation.

TrustInSoft Analyzer Documentation

TrustInSoft Analyzer is an award-winning static analyzer for C and C++ source code that ensures the safety and security of the source code by providing mathematical/formal guarantees over program properties.

TrustInSoft Analyzer relies on state-of-the-art technology to provide several sets of tools that facilitate the analysis of C and C++ programs.

Structure of the Documentation

  • The Tutorials section helps you get started with TrustInSoft Analyzer
  • The User Manual section provides methodologies for using TrustInSoft Analyzer
  • The Reference Manual section exhaustively details the features of TrustInSoft Analyzer
  • The Coding Standards section details the usage of TrustInSoft Analyzer with standard requirements
  • The Glossary section defines the terms used in this documentation
  • The Useful Links section references external links related to TrustInSoft Analyzer

Get Started

Start with the Tutorials to learn how to use TrustInSoft Analyzer: browse the source code and the alarms in the GUI, and ultimately obtain a guarantee of the absence of undefined behavior.

TrustInSoft Analyzer: non-technical overview

Tutorials

This section provides step-by-step tutorials to help you get started with TrustInSoft Analyzer.

The goal of TrustInSoft Analyzer is to prevent runtime errors by analyzing all of the possible values that the variables can take at any given point in the program, in order to prove that none of the execution paths leads to a problem (such as an undefined behavior or a forbidden operation).

This verification is called the value analysis.

Note

Unlike testing or binary analysis, the value analysis provided by TrustInSoft Analyzer is exhaustive: the guarantees provided apply to all the concrete executions of the program. Even the tests with the best coverage will only test a few execution paths in a program, whereas binary analysis is strongly dependent on the compilation and execution environments. This is why static value analysis gives stronger guarantees than both testing and binary analysis.

The value analysis will try to prove all the properties needed for the code to be correct. If a property cannot be automatically proved, the analyzer will emit an alarm, such as:

/*@ assert Value: division_by_zero: b ≢ 0; */

This means that, in order for the program to be correct, the value of b needs to be non-zero at the point in the execution indicated by the analyzer.

At this point there are two possibilities:

  1. There is an execution path for which b = 0 that will lead to an error or an undefined behavior.

    This means there is a bug in the program that needs to be corrected in order to ensure that the property will hold.

  2. There is no execution path for which b = 0, but the analyzer was not able to prove the validity of the property.

    This means that the analyzer over-approximates the possible values of b. In order to make the alarm disappear, it will be necessary to guide the analyzer to be more precise about the possible values of b, and then run the analysis again (a minimal illustration follows).
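
As a sketch of the second case, consider a hypothetical function in which b is always odd, hence never zero. An analysis that only tracks an interval for b may still consider 0 possible and emit the alarm:

int div_by_odd(int n)
{
    int b = 2 * n + 1;  /* b is always odd, hence never 0 */
    return 100 / b;     /* may still raise: assert division_by_zero: b ≢ 0; */
}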

Tip

An alarm is a property, expressed in a logic language, that needs to hold at a given point in the program in order for the program to be correct.

Getting Started

The following examples show how to use TrustInSoft Analyzer on test snippets to eventually guarantee the absence of undefined behavior for the input values of the test.

Tip

When used to verify tests, TrustInSoft Analyzer can be run with the --interpreter option, so that no other special tuning is required.

This keeps the analyzer precise, and thus each alarm is a true bug that needs to be fixed.

Example 1: Out of Bounds Read

The example array.c is located in the directory /home/tis/1.42/C_Examples/value_tutorials/getting_started.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i <= 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Open the GUI after starting the analysis with the following command:

$ tis-analyzer-gui --interpreter /home/tis/1.42/C_Examples/value_tutorials/getting_started/array.c
_images/tuto-c-example-array-1.png

TrustInSoft Analyzer emits the following alarm:

/*@ assert Value: index_bound: i < 5; */

Tip

Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].

To see the values of both i and array, right-click on i and select Track this term in the statement printf("array[%d] = %d\n", i, array[i]); highlighted in orange, then click on array in that same statement. To see the values at each iteration of the loop, click on Per Path and scroll down to the last row.

_images/tuto-c-example-array-2.png

The value of i is successively equal to:

  • 0 at the first iteration of the loop,
  • 1 at the second iteration of the loop,
  • 2 at the third iteration of the loop,
  • 3 at the fourth iteration of the loop,
  • 4 at the fifth iteration of the loop,
  • 5 at the sixth iteration of the loop,
  • and array is a local array variable (int[5])

Accessing array[i] when i is equal to 5 is an out-of-bounds access.

In order to continue the analysis, the program must be fixed and the analysis run again.

Note

Not all statements are analyzed: see the statements highlighted in red.

The statements after the loop are highlighted in red because they are unreachable with respect to the input values of the program. This means that no computed value allows a well-defined execution of the program to continue past the loop.

Understanding the Root Cause to Fix the Program

Tip

Now that we know that, when i is equal to 5, there is an out-of-bounds access to array, we will ask the analyzer where this value comes from.

Click on the Inspect i button to see the last write to the l-value i relative to the current statement, and the value of i at this statement.

Tip

The button Inspect i is equivalent to the following actions:

  • Right click on i and then Show Defs
  • Click on i to see the values
_images/tuto-c-example-array-3.png _images/tuto-c-example-array-4.png

In the Interactive Code, the two statements highlighted in green show the last writes to i; moreover, the Show Defs panel shows that these two write locations are the only ones. The location i = 0 corresponds to the declaration and initialization, while the location i++ is where the value of i eventually becomes 5.

We fix the program by changing the loop's exit condition.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i < 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Kill the current analysis, fix the source code, and then run the analysis again.

Tip

When analyzing a real-life project in which the source code must be fixed, it is recommended to follow the process below:

  1. Kill the current analysis
  2. Edit the source code to fix the issue
  3. Run again the analysis

However, for a simple use case such as this one, everything can be done from the GUI, following this process:

  1. Edit the source code in the Source Code panel located in the right area
  2. Click on the Save File button at the top of the same panel
  3. Click on the Parsing button in the Overview panel located in the top-left area
  4. Click on the Value Analysis button in the same panel
_images/tuto-c-example-array-5.png

TrustInSoft Analyzer emits the same alarm, but at a different location:

/*@ assert Value: index_bound: i < 5; */

Tip

Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].

To see both values of i and array, right click on i and Track this term, then click on array.

_images/tuto-c-example-array-6.png

The value of i is equal to 5, and array is a local array variable (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.

Once again, we fix the program to print the last element of the array, kill the current analysis, and then run the analysis again.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i < 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    i--;
    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Tip

TrustInSoft Analyzer guarantees the absence of undefined behavior for the input values of this test.

_images/tuto-c-example-array-7.png

Example 2: division by zero

In this example we will study the program example1.c, located in the /home/tis/1.42/C_Examples/TutorialExamples directory. This program performs some operations on unsigned integers.

//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
    int sum, prod, quot, diff;

    sum = a + b;
    diff = a - b;
    prod = a * b;
    quot = a / b;
}

Start the value analysis (option -val) with the following command:

$ tis-analyzer-gui -val /home/tis/1.42/C_Examples/TutorialExamples/example1.c

and launch the GUI (as explained in the Getting Started section).

Your browser window should display a division by zero alarm:


_images/tis-qs_examples_example1_1.png

Selecting the alarm in the bottom panel will highlight the program point at which the alarm was raised, both in the Source Code Window (right panel) and in the Interactive Code Window, where you can also see the ACSL assertion generated by the analyzer:

/*@ assert Value: division_by_zero: b ≢ 0; */

The program takes two unsigned integer arguments a and b that are unknown at the time of the analysis. The value analysis must ensure that the program is correct for all possible values of a and b. The alarm was raised because if b is equal to zero, there will be a division by zero, which leads to undefined behavior. This means that anything can happen at this point, making the program incorrect (and dangerous to use, as this will lead to a runtime error).

Let’s modify the program by adding an if (b != 0) statement in order to perform the division only if b is non-zero.

//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
    int sum, prod, quot, diff;

    sum = a + b;
    diff = a - b;
    prod = a * b;
    if (b != 0)
        quot = a / b;
}

and launch the analysis on the modified version example1_ok.c:

$ tis-analyzer-gui -val /home/tis/1.42/C_Examples/TutorialExamples/example1_ok.c
_images/tis-qs_examples_example1_2.png

Congratulations! All the alarms have disappeared, so the program is guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer. This means that it will run safely, whatever the arguments provided.

You can now move to the next series of examples.

Example 3: dangling pointers

Compiling well-defined, standard-compliant C or C++ source code should ensure that your program behaves as intended when executed, whatever the platform or compilation chain used.

TrustInSoft Analyzer’s static code analysis helps you produce high quality, standard-compliant code that is guaranteed to be free from a wide range of weaknesses and vulnerabilities.

The example we will examine shows how TrustInSoft Analyzer can detect dangling pointers (also known as CWE-562: Return of Stack Variable Address or CWE-416: Use After Free in the Common Weakness Enumeration).

This example originally appeared in a blog post by Julien Cretin and Pascal Cuoq on the TrustInSoft website. We encourage you to read the whole post for more technical details and for examples of production code carrying this kind of bug.

Analyzing the source code

Let’s have a look at the program example3.c, located in the /home/tis/1.42/C_Examples/TutorialExamples directory:

 1  /* TrustInSoft Analyzer Tutorial - Example 3 */
 2  #include <stdio.h>
 3
 4  #include <stdlib.h>
 5  #include <stdio.h>
 6  #include <stdint.h>
 7  #include <inttypes.h>
 8
 9  int main(void)
10  {
11      char *p, *q;
12      uintptr_t pv, qv;
13      {
14          char a = 3;
15          p = &a;
16          pv = (uintptr_t)p;
17      }
18      {
19          char b = 4;
20          q = &b;
21          qv = (uintptr_t)q;
22      }
23      printf("Roses are red,\nViolets are blue,\n");
24      if (p == q)
25          printf("This poem is lame,\nIt doesn't even rhyme.\n");
26      else {
27          printf("%p is different from %p\n", (void *)p, (void *)q);
28          printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
29      }
30
31      return 0;
32  }

This program prints out some text. The result should differ depending on the value of the expression p == q in line 24.

Start the value analysis with the following command:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.42/C_Examples/TutorialExamples/example3.c

and launch the GUI (as explained in the Getting Started section).

After selecting the Properties widget and the Only alarms button, you will notice that there is one alarm and some lines of dead code right after the alarm:

_images/tis-qs_examples_example3_1.png

The assertion generated shows that the analyzer has found a dangling pointer (a pointer not pointing to a valid object):

/*@ assert Value: dangling_pointer: ¬\dangling(&p); */

Let’s use the analyzer to follow the variable p through the execution, and try to find the source of the problem:

_images/tis-qs_examples_example3_2.png

The values of the variable p before and after the selected statement will be listed in the Values widget in the bottom panel. We can see that, after the initialization p = &a, the variable p holds the address of a as expected.

Selecting the expression p == q inside the if will show the value of p at this point in the program. Before the evaluation of the p == q expression, p is shown as ESCAPINGADDR, meaning that it no longer holds the address of a valid object.

_images/tis-qs_examples_example3_3.png

The reason is quite simple: p holds the address of the local variable a, whose lifetime is limited to the block in which the variable is defined:

{
    char a = 3;
    p = &a;
    pv = (uintptr_t) p;
}

Note

As stated by clause 6.2.4:2 of the C11 standard,

If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

The pointer p refers to the variable a outside of its lifetime, resulting in undefined behavior. As every execution path results in an undefined behavior, the analysis stops.

The problem: undefined behavior

When the behavior of a program is undefined, it means that anything can happen at execution time. Some programmers seem to think that, at least in some cases, this is not a big issue: they are certainly wrong. A program invoking undefined behavior is a time bomb.

A dangling pointer is an example of undefined behavior. Let’s illustrate its consequences on our example.

First we compile the program with gcc:

$ gcc -Wall -Wextra -Wpedantic /home/tis/1.42/C_Examples/TutorialExamples/example3.c -o example3

and notice that, despite all of our efforts, gcc does not issue any warnings.

Running the code will display the following text:

$ ./example3
Roses are red,
Violets are blue,
This poem is lame,
It doesn't even rhyme.

meaning that the condition p == q evaluated to true. This can happen if the variables a and b were allocated at the same address by the compiler (which is possible since they are never in scope at the same time).

Using different compilation options should not affect the behavior of the program, but compiling with a -O2 switch and running the program results in a different output:

$ gcc -Wall -Wextra -Wpedantic -O2 /home/tis/1.42/C_Examples/TutorialExamples/example3.c -o example3_opt
$ ./example3_opt
Roses are red,
Violets are blue,
0x7ffc9224f27e is different from 0x7ffc9224f27f
7ffc9224f27e is not the same as 7ffc9224f27f

This time the expression p == q evaluated to false because the variables a and b were allocated to different addresses. So changing the optimization level changed the behavior of our program.

In the aforementioned post you can see evidence of a third, very weird behavior, in which p == q evaluates to false, even though a and b are allocated to the same address.

The conclusion is clear, and we will state it as a general warning:

Warning

If the behavior of your program is undefined, executing the compiled code will have unpredictable results and will very likely cause runtime errors. You should always ensure that your code is well-defined, using source code analysis or other techniques.

The solution: TrustInSoft Analyzer

We have already seen how TrustInSoft Analyzer is able to find dangling pointers and other weaknesses. We will continue the analysis and correct the code in order to guarantee that there are no problems left.

To avoid the dangling pointer problem, we will define a outside the block, so that its storage and address are guaranteed by the standard throughout the whole main function:

//TrustInSoft Analyzer Tutorial - Example 3_1
#include <stdio.h>

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    char *p, *q;
    uintptr_t pv, qv;
    char a = 3;
    {
        p = &a;
        pv = (uintptr_t)p;
    }
    {
        char b = 4;
        q = &b;
        qv = (uintptr_t)q;
    }
    printf("Roses are red,\nViolets are blue,\n");
    if (p == q)
        printf("This poem is lame,\nIt doesn't even rhyme.\n");
    else {
        printf("%p is different from %p\n", (void *)p, (void *)q);
        printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
    }
}

and launch the value analysis again:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.42/C_Examples/TutorialExamples/example3_1.c

We notice that the dangling pointer alarm regarding p has been replaced by the same alarm about q. When evaluating the expression p == q, the analyzer noticed a problem with the term p and stopped the analysis, so it did not get a chance to issue the alarm about q. Now that we have corrected the first problem, the analyzer gets to the term q and raises the same kind of alarm.

Selecting the variables p and q in the Values widget tab and clicking on the p == q expression will show that, before evaluation, p holds the address of a whereas q has the value ESCAPINGADDR. This means that q is a dangling pointer, as it references the address of the out-of-scope variable b:

_images/tis-qs_examples_example3_4.png

To correct the problem we will simply define b outside the block, as we did before with a:

//TrustInSoft Analyzer Tutorial - Example 3_ok
#include <stdio.h>

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    char *p, *q;
    uintptr_t pv, qv;
    char a = 3;
    char b = 4;
    {
        p = &a;
        pv = (uintptr_t)p;
    }
    {
        q = &b;
        qv = (uintptr_t)q;
    }
    printf("Roses are red,\nViolets are blue,\n");
    if (p == q)
        printf("This poem is lame,\nIt doesn't even rhyme.\n");
    else {
        printf("%p is different from %p\n", (void *)p, (void *)q);
        printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
    }
}

and launch the analyzer one more time to verify that there are no other alarms:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.42/C_Examples/TutorialExamples/example3_ok.c
_images/tis-qs_examples_example3_5.png

Notice that the

printf("This poem is lame,\nIt doesn\'t even rhyme.\n");

code is marked as dead. Indeed, the variables a and b must be allocated at different addresses, as they are in scope at the same time. As a consequence, the condition p == q always evaluates to false, so the first branch of the if statement will never be executed.

Congratulations! You have successfully corrected all the bugs in example3.c. The program is now guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer.

Skein Tutorial

In this example we will analyze an implementation of Skein, a cryptographic hash algorithm that was a finalist in the NIST SHA-3 competition.

We will show how to use TrustInSoft Analyzer to:

  • Get familiar with the code.
  • Search for bugs in the implementation.
  • Prove that the analyzed implementation does not contain any bugs that are in the scope of TrustInSoft Analyzer.

Tutorial Overview

Getting Familiar with Skein

In the first part of this tutorial we will use TrustInSoft Analyzer to explore the code and then launch a value analysis on the Skein implementation.

The estimated time for this lesson is less than 20 minutes.

Exploring the source code

Unlike our previous examples, the Skein implementation is composed of multiple files. All the files needed for this tutorial are located in the /home/tis/1.42/C_Examples/skein_verification directory.

Let’s start by listing all the files in the directory:

$ ls -l /home/tis/1.42/C_Examples/skein_verification
total 124
-rw-rw-r-- 1 tis tis   204 Oct  5 18:54 README
-rwxrwxr-x 1 tis tis  4984 Oct  5 18:54 SHA3api_ref.c
-rwxrwxr-x 1 tis tis  2001 Oct  5 18:54 SHA3api_ref.h
-rwxrwxr-x 1 tis tis  6141 Oct  5 18:54 brg_endian.h
-rwxrwxr-x 1 tis tis  6921 Oct  5 18:54 brg_types.h
-rw-rw-r-- 1 tis tis   524 Oct  5 18:54 main.c
-rwxrwxr-x 1 tis tis 34990 Oct  5 18:54 skein.c
-rwxrwxr-x 1 tis tis 16290 Oct  5 18:54 skein.h
-rwxrwxr-x 1 tis tis 18548 Oct  5 18:54 skein_block.c
-rwxrwxr-x 1 tis tis  7807 Oct  5 18:54 skein_debug.c
-rwxrwxr-x 1 tis tis  2646 Oct  5 18:54 skein_debug.h
-rwxrwxr-x 1 tis tis  1688 Oct  5 18:54 skein_port.h

Note

Please note that the README and the main.c files are not part of the Skein implementation. They were added for use with this tutorial.

The skein.h file is a good starting point to explore the API. We will focus our attention on the following lines of code:

typedef struct
    {
    size_t  hashBitLen;                      /* size of hash result, in bits */
    size_t  bCnt;                            /* current byte count in buffer b[] */
    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
    } Skein_Ctxt_Hdr_t;

typedef struct                               /*  256-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein_256_Ctxt_t;

typedef struct                               /*  512-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein_512_Ctxt_t;

typedef struct                               /* 1024-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein1024_Ctxt_t;

/*   Skein APIs for (incremental) "straight hashing" */
int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);

int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);

int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);

It seems that, in order to hash a message, we’ll need to:

  • declare a variable of type Skein_256_Ctxt_t;
  • initialize it using the function Skein_256_Init;
  • pass Skein_256_Update a representation of the string;
  • call Skein_256_Final with the address of a buffer in order to write the hash value.
Writing a test driver

When confronted with the analysis of an application, the usual entry point for the analysis is its main function. Nonetheless, there are many contexts in which there is no obvious entry point, for example when dealing with a library. In those cases, you will need to write a driver to test the code or, better, leverage existing tests in order to exercise the code.

As the Skein implementation includes no tests, we provide the file main.c as a test driver. It implements the steps outlined above in order to hash the message "People of Earth, your attention, please":

 1  /* Test driver for Skein hash function */
 2  #include "skein.h"
 3  #include "stdio.h"
 4
 5  #define HASHLEN (8)
 6  u08b_t msg[80] = "People of Earth, your attention, please";
 7
 8  int main(void)
 9  {
10      u08b_t hash[HASHLEN];
11      int i;
12
13      Skein_256_Ctxt_t skein_context;
14
15      Skein_256_Init(&skein_context, HASHLEN);
16      Skein_256_Update(&skein_context, msg, 80);
17      Skein_256_Final(&skein_context, hash);
18
19      for (i = 0; i < HASHLEN; i++)
20          printf("%d\n", hash[i]);
21
22      return 0;
23  }

The driver also prints the contents of the computed hash in the for loop at the end of the file. Each printed value is the numerical value of the corresponding byte of the hash.

We compile and run the code by executing the command:

$ gcc /home/tis/1.42/C_Examples/skein_verification/*.c && ./a.out

and get the following output:

215
78
61
246
0
0
0
0

Note

You might get different output because, as we will see, there is a bug in the test file.

A first value analysis

We will start by launching a simple value analysis with the following command:

$ tis-analyzer-gui -64 -val /home/tis/1.42/C_Examples/skein_verification/*.c

Note

The -64 option specifies that we are using a 64-bit architecture. Other possible values are -16 and -32. If no value is specified, the default target architecture is 32-bit.

Warning

For the following analysis it is extremely important to provide the correct architecture. Please make sure you are providing the correct value.

Next, we will launch the GUI by opening the link or pointing the browser to http://localhost:8080 (the port number may change if you have other analyses running at the same time).

_images/tis-qs-skein_02.png

Once you have opened the GUI, go to the Properties widget and click on the Only alarms button:

_images/tis-qs-skein_05.png

In the Properties widget we can see that there are about 20 alarms.

Note

If the analyzer finds that an error occurs for every possible execution path, then it will stop the analysis. If a problem occurs in one possible execution path, the analyzer will raise an alarm and continue the analysis on the other execution paths. This is why you can get multiple alarms.
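
As an illustration, in the hypothetical snippet below only one execution path may divide by zero; the analyzer reports an alarm for that path and keeps analyzing the others:

int choose(int a, int b, int flag)
{
    if (flag)
        return a / b;  /* alarm if b may be 0 on this path */
    return 0;          /* this path is still analyzed */
}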

As the analyzer is not precise enough, we will start by improving the precision of the analysis in order to reduce the number of alarms. We will show how to do this in the next part of this tutorial.


Searching for bugs

In this second part of the Skein tutorial, we will fine-tune the value analysis and investigate the alarms raised by the analyzer in order to identify and correct bugs.

The estimated time for this lesson is 20 minutes.

Increasing the precision

Note

As we have seen in previous examples, the number of different states that can be propagated at each program point is limited by the semantic unrolling level, which can be tuned with the -slevel option. The default configuration allows only a single state, so the analysis can quickly become imprecise. Setting the -slevel option allows the analyzer to unroll loops and to propagate the branches of conditional statements separately, improving the precision of the analysis in most cases.
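
As a minimal sketch of the effect (the domains below are illustrative, not actual analyzer output), consider a small counting loop:

int sum = 0, i;
for (i = 0; i < 10; i++)
    sum += i;
/* with -slevel 10 or more, each iteration is propagated separately,
   so the final value is exact: sum ∈ {45};
   with the default precision, the states are merged at the loop head,
   and sum is over-approximated to an interval */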

A first step to improve the precision of the analysis is to activate the -slevel option. We will try to launch the analysis with an slevel value of 100 and see if there are any improvements:

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.42/C_Examples/skein_verification/*.c

Tip

100 is a good default value for the slevel, because most functions terminate in fewer than 100 disjoint states. If the slevel is not sufficient, you will be able to fine-tune it later, but keep in mind that increasing the slevel can slow down the analysis, so you will need to find a good compromise between precision and speed.

The GUI will give you information about the slevel consumed at different points in the program, making it easier to find the right value to fit your needs.

We open the GUI again


_images/tis-qs-skein_07.png

and notice that there is only one alarm left, so all the other alarms were false alarms.

Note

Unlike many other static analysis tools, TrustInSoft Analyzer never remains silent when there is a potential risk for a runtime error. As we have seen before, when the analysis is not precise enough it can produce false alarms.

On the other hand, the absence of alarms means that you have mathematical guarantees over the absence of all the flaws covered by TrustInSoft Analyzer.

Investigating the alarms

With only one alarm left, we can investigate its cause to see if it is a true positive or not.

Notice that there is some dead code that cannot be reached by any execution. In particular, the main function never terminates, because its return statement cannot be reached.

This means that the execution never leaves the for loop in line 19:

    for (i = 0; i < HASHLEN; i++)
        printf("%d\n", hash[i]);
_images/tis-qs-skein_07b.png

Clicking on the alarm in the Properties widget, we see that the alarm indeed occurs inside the loop. The annotation generated by the analyzer is:

/*@ assert Value: initialisation: \initialized(&hash[i]); */

meaning that we are trying to read a value that might not be properly initialized.

Note

The for loop in the main.c program has been transformed into a while loop by the analyzer. This is part of a normalization process that transforms the program into a semantically equivalent one in order to facilitate the analysis.
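
Schematically, the normalization rewrites the loop along these lines (an illustration, not the analyzer’s exact output):

/* original form */
for (i = 0; i < HASHLEN; i++)
    printf("%d\n", hash[i]);

/* normalized form, semantically equivalent */
i = 0;
while (i < HASHLEN) {
    printf("%d\n", hash[i]);
    i++;
}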

Let’s explore the values inside the hash array by going to the Values widget and clicking on the hash term in the Interactive Code Window.

_images/tis-qs-skein_08.png

By clicking on the values in the before column of the Values widget,

_images/tis-qs-skein_09.png

you will have a more readable output of the contents of the hash array:

[0] ∈ {215}
[1..7] ∈ UNINITIALIZED
This begins like: "�"

The analyzer is telling us that, before entering the loop, hash[0] contains the value 215 (which it tries to interpret as a character, hence the This begins like: "�"), and all the other items in the array are UNINITIALIZED.

Reading an uninitialized value is an undefined behavior, and that is exactly what the program is doing when trying to read past the first element of the array. The analysis stops here as all the execution paths lead to undefined behavior.

It seems that the alarm is indeed a true positive, so we cannot go further in the analysis without correcting the problem.

Finding and correcting the bug

As we have uninitialized values in the array, let’s have a look at the initialization function Skein_256_Init.

You can navigate to the definition of the function by right clicking on the function name:

_images/tis-qs-skein_10b.png

This will update the Interactive Code Window with the code of the function:

_images/tis-qs-skein_10c.png

After clicking on the hashBitLen variable, we can see in the Properties widget that its value is always 8.

This value corresponds to the value of the HASHLEN macro that is passed to the Skein_256_Init function at line 15 of the original code:

    Skein_256_Init(&skein_context, HASHLEN);

Note that in the Interactive Code Window this macro has been expanded to 8.

So the length of the hash is expressed in bits, and we were inadvertently asking for an 8-bit hash. This explains why only the first element of the array was initialized.

As we wanted an 8-byte hash, we can correct the problem by multiplying the value by 8 in the call to Skein_256_Init, modifying line 15 of the main.c file to look like this:

    Skein_256_Init(&skein_context, 8 * HASHLEN);

After saving the changes we run TrustInSoft Analyzer again:

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.42/C_Examples/skein_verification/*.c

and open the GUI:


_images/tis-qs-skein_12.png

Congratulations! There are no more alarms, so our program is guaranteed to be free from all the kinds of bugs covered by TrustInSoft Analyzer.

The “real” hash value

We can see the value of the hash in the shell in which we executed TrustInSoft Analyzer:

_images/tis-qs-skein_13b.png

Let’s compile and run the program again to check that the results are the same:

$ gcc /home/tis/1.42/C_Examples/skein_verification/*.c && ./a.out

We get the following output, which is indeed the same as the one reported by the analyzer:

224
56
146
251
183
62
26
48

Note

The results we got when compiling the program in the first part of this tutorial were different because we were compiling an ill-defined program (it was reading uninitialized variables, which is an undefined behavior). When the behavior of a program is undefined, you can get anything as a result (in our case, most probably garbage read from memory, as we were accessing uninitialized values).

As the program is now guaranteed to be correct, the output should be the same on any standard-compliant installation.

The next step

Our driver tested the Skein implementation on a single message of length 80. We could modify it to test a number of different messages, and thus gain better coverage, as normal unit tests do. But what if we could test it on all the messages of length 80?

This is certainly impossible to achieve by the execution of any compiled test, but it is in the scope of TrustInSoft Analyzer.

In the next part of this tutorial we will show you how to generalize the test to arbitrary messages of fixed length. This will give mathematical guarantees about the behavior of the implementation on any message of the fixed length.


Guaranteeing the absence of bugs

The previous analysis allowed us to find a bug in the test driver. After correcting the program, the analyzer did not raise any alarms, which guaranteed the absence of a large class of bugs when hashing the 80-char message “People of Earth, your attention, please”.

In this third part of the Skein tutorial, we will generalize the result obtained in the second part to all the messages of length 80.

The estimated time for this lesson is 20 minutes.

The limits of software testing

Testing a program on a given set of test cases can allow you to identify some bugs, but it cannot guarantee their absence. This is true even if you test your program on a huge number of cases, because it is very unlikely that you will be able to cover them all.

Even in a simple example like ours (hashing a message of length 80), there are way too many test cases to consider (see the note below to get some detailed calculations).

Furthermore, even if all your cases pass the test, this gives you no guarantee about future behavior: for example, a test case resulting in an undefined behavior might pass unnoticed in the test environment, but trigger a bug on your client’s machine.

Note

Although the number of distinct 80-char arrays is certainly finite, there are 2⁸ = 256 possible values for a char variable, so there are 256⁸⁰ ≈ 4.6 × 10¹⁹² different arrays of 80 chars. As the number of elementary particles in the visible universe is considered to be less than 10⁹⁷, we can conclude that the number of arrays of 80 chars is definitely out of scope for an exhaustive test.

The power of abstract interpretation

The value analysis performed by TrustInSoft Analyzer uses abstract interpretation techniques in order to represent the values of the terms in the program. This allows for the representation of very large or even infinite sets of values in a way that makes sense to the analyzer. That is why we are going to be able to consider all the possible values for a given variable.

If the value analysis raises no alarms, this means that you have mathematical guarantees about the correctness of the program. These guarantees are not limited to one particular execution, but extend to any execution in a standard-compliant environment (one which respects all the hypotheses made during the analysis).

Note

We will use some primitives that have a meaning in the abstract state of the analyzer, but that make no sense for a compiler. For this reason, we would not be able to compile or execute the driver anymore.

One such function is tis_interval. The assignment x = tis_interval(a, b) tells the analyzer that the variable x can hold any value between a and b (both included).
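
For instance (a minimal sketch):

int x = tis_interval(0, 9);  /* the analyzer considers every value x ∈ [0..9] */
if (x > 4) {
    /* this branch is analyzed with x ∈ [5..9] */
}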

Generalizing the analysis to arbitrary messages of fixed length

Our goal is to obtain a result that is valid for any given array of length 80. Let’s start by modifying the main.c program in order to tell the analyzer that each of the elements in the msg array can take any value between 0 and 255.

To accomplish this, it suffices to add a couple of lines to the main.c program

    for(i = 0; i < 80; i++)
        msg[i] = tis_interval(0, 255);

to make it look like this:

/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"

#define HASHLEN (8)
u08b_t msg[80];

int main(void)
{
    u08b_t hash[HASHLEN];
    int i;

    for(i = 0; i < 80; i++)
        msg[i] = tis_interval(0, 255);

    Skein_256_Ctxt_t skein_context;

    Skein_256_Init(&skein_context, 8 * HASHLEN);
    Skein_256_Update(&skein_context, msg, 80);
    Skein_256_Final(&skein_context, hash);

    for (i = 0; i < HASHLEN; i++)
        printf("%d\n", hash[i]);

    return 0;
}

We then launch the value analysis again

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.42/C_Examples/skein_verification/*.c

and open the GUI:


_images/tis-qs-skein_13.png

There are no alarms in the GUI. This means that we have mathematical guarantees that the code is free from all the bugs covered by TrustInSoft Analyzer.

Congratulations! You have just proved that initializing, parsing, and hashing a message of length 80 by calling the three functions

    Skein_256_Init(&skein_context, 8 * HASHLEN);
    Skein_256_Update(&skein_context, msg, 80);
    Skein_256_Final(&skein_context, hash);

will cause no runtime errors, whatever the contents of the 80 char array msg.

Note

As we saw before, this kind of result would be impossible to obtain with case testing, as it would amount to writing a program testing each one of the 256⁸⁰ possible input messages.

A word on variation domains

We will end this part of the tutorial with a few words on the representation of values by the analyzer. At each program point, the analyzer determines a variation domain for each variable or expression. The value of the variable or expression is always guaranteed to be contained in the variation domain but, due to the possible loss of precision, these domains can be over-approximations.

Note

The analyzer can represent the variation domain of an integer variable in three different ways:

  • As an enumeration {v1; …; vn} (if the set is small enough).
  • As an interval [l..u] with lower bound l and upper bound u.
  • As an interval with periodicity information [l..u], r%m, that is the set of values between the lower bound l and the upper bound u, whose remainder in the Euclidean division by m is equal to r.
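
For instance, with purely illustrative values, the three representations may be displayed as:

x ∈ {1; 2; 5}
y ∈ [0..100]
z ∈ [2..42],2%4

where the domain of z contains the values 2, 6, 10, …, 42.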

A -- represents the smallest value that fits within the type of a variable when it occurs in the lower bound, and the biggest value when it occurs in the upper bound. For instance, [--..--] means [0..255] for a variable of type unsigned char.

If you go to the Values widget in the GUI and click on the msg variable in the Interactive Code Window (as shown below), you will be able to see how the analyzer represents integer values contained in the array msg of the program main.c:

_images/tis-qs-skein_14.png

The analyzer shows that, after the initialization loop, msg[0..79] ∈ [--..--]. This means that each of the values msg[0] to msg[79] can take an arbitrary value in the interval [--..--]. As pointed out before, for an unsigned char, [--..--] stands for any value that fits within the type, that is, between 0 and 255.

This is exactly what was achieved by calling the function tis_interval(0, 255).

Analyzing C++ code

If the C++ front-end for TrustInSoft Analyzer is available, it can be used with the command tis-analyzer++. This command accepts the same arguments as tis-analyzer as well as specific options that can be listed using the command tis-analyzer++ -cxx-h. An interpreter version is also available using the command tis-analyzer++ --interpreter.

This tutorial will show you some C++ specific features of TrustInSoft Analyzer, in order to better understand its output and to help track the origin of alarms. It should be read after the Getting Started section. The examples used in this tutorial can be found in the /home/tis/1.42/Cxx_Examples directory.

Identifiers, constructors and calling conventions

We will start by analyzing the program matrix.cpp, which is a test over a matrix manipulation library:

int
main(void) {
    Matrix<2U, 2U> matrix_a {
        2., 1.,
        4., 2. };

    auto id = identity<2>();
    bool has_inverse = is_invertible(id);
    std::cout << "identity is inversible: " << (has_inverse ? "yes\n" : "no\n");

    Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
    Matrix<2, 1> res = solve(matrix_b,  { 6., 10. });
    std::cout << "RESULT IS:\n" << res;

    return 0;
    (void) has_inverse;
}

Start the analysis with the following command:

$ tis-analyzer++ --interpreter -gui /home/tis/1.42/Cxx_Examples/matrix.cpp

and open the GUI. The Interactive Code Window should look like:

_images/tis-cxx-matrix-main.png

The first thing to notice is that some names contain characters that are forbidden in C, like Matrix<2, 2> or std::__1::ostream, and may be prefixed by .... The names of entities in tis-analyzer++ are actually mangled, the Interactive Code Window displays an unmangled version of them to be clearer. The mangled version of names can be viewed by using the option -cxx-keep-mangling, and the mangling used is close enough to existing compiler practice to be unmangled by external tools like c++filt.

When a name is long, a shortened version of it is displayed in the Interactive Code Window with ... as prefix. Clicking on this prefix will display the full qualified name, or its mangled version if the option -cxx-keep-mangling is used.

The first statement that is not a declaration is a call to the function __tis_globinit(). This function represents the dynamic initialization phase of a C++ program [1]. It contains only calls to functions with names similar to X::Y::__tis_init_Z, which are used to initialize the non-local variables X::Y::Z. Looking at the definition of the X::Y::__tis_init_Z function will lead the Source Code Window to the body of the generated function initializing the variable X::Y::Z.

The first statement of the main function in the original code is:

    Matrix<2U, 2U> matrix_a {
        2., 1.,
        4., 2. };

and corresponds in the normalized code to the line:

  Matrix<2, 2>::Ctor<double, double, double, double>(& matrix_a,2.,1.,4.,2.);

Ctor is the generic name that TrustInSoft Analyzer assigns to C++ constructors. You can see that:

  • the constructor templated arguments are made explicit.
  • all initializers of the source code are passed as arguments.
  • there is an additional argument & matrix_a.

All method calls are translated to regular C function calls, and as such they receive an additional argument which stands for the this pointer. In case of constructors, this is the address of the object being initialized.

When looking at the constructor definition, you can see that it is calling the inherited constructor Matrix_base<2, 2>::Ctor<double, double, double, double> with the same arguments, except that the this pointer is shifted to its __parent__Matrix_base<2, 2, Matrix> field.

Matrix<N, M> inherits from Matrix_base<N, M, Matrix>, and its constructor only transfers its arguments to the constructor of the parent class. In tis-analyzer++, a class A inheriting from a class B is represented by a struct A containing a field struct B __parent__B. The initialization of the base B of A is translated into a call to the function B::Ctor(&A.__parent__B). This structure layout can be observed in the example by looking at the definition of the type struct Matrix<2, 2>, and can be pictured schematically as shown below.
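
Schematically, with hypothetical classes A and B, the layout and the base-class initialization look like this (an illustrative sketch in the notation of the normalized code, not actual analyzer output):

struct B { /* members of B */ };
struct A {
    struct B __parent__B;  /* the B base-class subobject */
    /* members of A */
};

/* initializing the B base of an object a of type A becomes: */
B::Ctor(& a.__parent__B);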

The next statement of the main function in the original code is:

    auto id = identity<2>();

and corresponds in the normalized code to the line

  identity<2>(& id);

The first thing to note here is that the id variable has an auto type in the original source but is declared in the normalized code as:

  struct Matrix<2, 2> id;

tis-analyzer++ makes auto types explicit, in the same way it instantiates template parameters.

Another difference is that in the normalized code the identity<2> function takes an additional argument despite being a usual function and not a method. This is a consequence of the fact that, in C++, a non-POD [2] value returned by a function may not live inside the function but inside its caller. To model this, a function returning a non-POD type receives an additional parameter which contains the address of the initialized object.

The next statement of the main function in the original code is:

    bool has_inverse = is_invertible(id);

which, when clicked on, corresponds in the normalized code to:

  {
    struct Matrix<2, 2> __tis_arg;
    _Bool tmp;
    {
      {
        Matrix<2, 2>::Ctor(& __tis_arg,(struct Matrix<2, 2> const *)(& id));
        tmp = is_invertible<2>(& __tis_arg);
      }
      has_inverse = tmp;
    }
  }

In this case, one C++ statement is translated into a block containing multiple declarations and statements. The function is_invertible<2> takes its argument by copy, as seen in its declaration:

template<unsigned N>
bool
is_invertible(Matrix <N, N> m)

and so its parameter has to be initialized with a new object. This is the purpose of the __tis_arg_* family of variables. In the current case, __tis_arg is initialized by calling the copy constructor of Matrix<2, 2> with the address of id as source. Then, the address of the newly built __tis_arg variable is given to the function is_invertible<2>, and the block around it delimits the lifetime of __tis_arg. This is the semantics of passing arguments by copy [3].

This transformation does not happen when calling the copy constructor of Matrix<2, 2> because its argument is a reference. References are converted to pointers, so taking a reference to an object is taking its address, and accessing a reference simply dereferences the pointer.

The next interesting statement is the one at line 37 of the original source:

    Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);

which is translated in the normalized code as:

  {
    struct Matrix<2, 2> __tis_tmp_61;
    {
      {
        operator^(& __tis_tmp_61,(double)5,
                  (struct Matrix<2, 2> const *)(& id));
      }
      ;
      ;
    }
    operator+(& matrix_b,
              (struct Matrix_base<2, 2, Matrix> const *)(& matrix_a.__parent__Matrix_base<2, 2, Matrix>),
              (struct Matrix_base<2, 2, Matrix> const *)(& __tis_tmp_61.__parent__Matrix_base<2, 2, Matrix>));
  }

Again, this statement is decomposed into a block containing multiple statements, this time declaring a variable called __tis_tmp_61. The __tis_tmp_* family of variables corresponds to temporary objects [4] that can be introduced by complex expressions. Such a temporary object is declared inside the block, as its lifetime is that of the full expression, and it has to be destroyed at the end of the expression if needed.

[1] as stated in [basic.start.init].
[2] POD in the sense of [class]p10.
[3] See [expr.call]p4, and in particular: “The initialization and destruction of each parameter occurs within the context of the calling function.”
[4] as defined in [class.temporary].

User Manual

This section describes how to use TrustInSoft tools.

  • The TrustInSoft Analyzer Manual explains how to analyze an application.
  • The TrustInSoft Analyzer GUI Manual explains how to use the GUI to explore the results.
  • The tis-aggregate Manual explains how to extract information about multiple analyses (such as coverage).
  • The tis-info Manual explains how to extract information about the analyzed source code (functions, statements, types…).
  • The tis-mk-main Manual explains how to generate a main function to be used as an entry point of the analysis when the original main function uses command line arguments.
  • The tis-mkfs Manual explains how to deal with functions that interact with the file system.
  • The tis-modifications Manual presents a tool that helps to track the source code modifications made on an application under analysis.

TrustInSoft Analyzer Manual

Introduction

The aim of this document is to help the user study a C or C++ application with TrustInSoft Analyzer.

The main goal of this kind of study is to verify that none of the detected software weaknesses listed in the CWE-subset are present in the application. This document explains step by step how to achieve that goal.

The definitions of some of the words and expressions used in this document are given in the Glossary.

Analysis Overview

The main steps of an analysis are:

  • First, to prepare the source code in order to make it understandable by the tool. This mainly consists of finding out which options are used to compile it (see Prepare the Sources). At the end of this step, the tool should be able to run the analysis without errors (see Check Preparation).
  • Then, to define the perimeter to study (see Prepare the Analysis). This is a user choice depending on the part of the application that has to be verified. The goal of this step is to build the context in which the application will be analyzed.
  • To run value analysis in order to detect alarms related to the software weaknesses listed in the CWE-subset. Value analysis can be tuned to increase the precision, either globally or locally, which helps in reducing the number of alarms (see Study the Alarms).
  • Each remaining alarm must be examined to find out whether it is relevant or not (i.e. whether it can actually happen). Some annotations can be added to help the analyzer remove it if it appears to be a false alarm (see ACSL Properties; a small example follows this list).
  • Finally, it is important to check all the annotations as they are used as hypotheses at the previous step. Some of them can be formally proven using the WP tool that involves theorem prover techniques (see Prove Annotations with WP). The others have to be carefully reviewed since all the results of the analysis rely on these hypotheses.
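
For instance, such a hypothesis can be recorded as a simple ACSL assertion placed before the statement it protects (an illustrative sketch):

/*@ assert b != 0; */  /* hypothesis to be reviewed, or proved with WP */
quot = a / b;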

Throughout the study, it is also useful to extract information, either to understand the results or to produce a report. This can be done by combining options and scripts, as explained in Get the Information.

Although the tool can be used purely from the command line interface, it also provides a GUI (see the TIS GUI Manual) that is very convenient for exploring the computed results.

Prepare the Sources

The purpose of this chapter is to explain how to prepare the source code of an application before starting to analyze it. The main steps to perform are:

  • Finding out the preprocessing options.

    This step can either be manual (Manual Preparation) or automatic (Automatic Preparation).

    The manual preparation is the easiest way to start with if you already know the commands necessary to compile the source files. Otherwise, start instead with the automatic preparation.

  • Dealing with the external libraries.

Manual Preparation
Preprocessing
Usual pre-processing options

In the simplest cases, all the source files need the same preprocessing command. The default preprocessing command of tis-analyzer is:

clang -C -E -nostdinc -isystem TIS_KERNEL_SHARE/libc

Some more options can be added to this command with the -cpp-extra-args option. The whole command can also be specified directly with the -cpp-command option, for instance in order to use another preprocessor.

The -I and -isystem (to add include paths), -D (to add macro definitions), and -U (to remove macro definitions) options are provided as shortcuts to the -cpp-extra-args option.

For example, the following command can be used to run the analyzer on the f1.c, f2.c, and f3.c source files, taking the included files from the incl_dir directory:

$ tis-analyzer -I incl_dir f1.c f2.c f3.c

A specific preprocessing command can be given to a set of specific files with the option -cpp-command-file "f1.c:clang -C -E,f2.c:gcc -C -E" (or -cpp-command-file "f1.c:clang -C -E" -cpp-command-file "f2.c:gcc -C -E"). More options can be added to a preprocessing command for a set of files in the same way with the option -cpp-extra-args-file "f1.c:-Idir/".

Any file not listed in -cpp-command-file (resp. -cpp-extra-args-file) will use the global command (resp. additional options) of the -cpp-command option (resp. -cpp-extra-args option).
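
For instance, a hypothetical invocation combining a file-specific command with global options might look like:

$ tis-analyzer -I incl_dir -cpp-command-file "f1.c:clang -C -E" f1.c f2.c f3.c

Here f1.c is preprocessed with its specific command, while f2.c and f3.c use the global command with the -I incl_dir option.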

If most of the source files need to have a specific preprocessing command, it is recommended to use the Automatic Preparation.

The exact pre-processing command in use can be shown by adding the command line option -kernel-msg-key pp when running the analyzer.

Advanced pre-processing options

In some applications, the source code is split in modules that require different preprocessing commands.

Warning

First of all, an important recommendation is to tackle the software in as small-sized chunks as possible. This makes most of the pre-processing problems go away.

If a particular source file needs a different preprocessing command, it is better to preprocess it first. The resulting file has to be named with a .i or .ci extension, so the analyzer knows that it does not need to be preprocessed. The difference between the two extensions is that .i files are not preprocessed at all by the tool, whereas the macro definitions are expanded in the annotations of .ci files, which is most of the time the intended behavior. So, except in some special cases, the .ci extension is to be preferred.

Source files and preprocessed files can be mixed in the command line. For instance, if the f3.c file needs some special options, f3.ci can be generated beforehand, and then used in the command line:

$ tis-analyzer -I incl_dir f1.c f2.c f3.ci

This will give the same result as the previous command, provided that f3.c has already been preprocessed into f3.ci.

Here is a synthetic example with two files h1.c and h2.c that use the same macro M which needs to have a different definition in each file.

File h1.c:
 int x = M;
 extern int y;

 int main(void) {
   return x + y;
 }
File h2.c:
 int y = M;

If M is supposed to be 1 in h1.c and 2 in h2.c the recommended command lines for this example are:

$ clang -C -E -nostdinc -DM=1 -o h1.tis.ci h1.c
$ clang -C -E -nostdinc -DM=2 -o h2.tis.ci h2.c

Then, the generated files can be provided to the analyzer:

$ tis-analyzer -val h1.tis.ci h2.tis.ci

And the obtained result shows that M has been correctly expanded:

...
[value] Values at end of function main:
    __retres ∈ {3}
...

In more complex cases, it is better to use the Automatic Preparation.

About Libraries

Most applications use some libraries, at least the standard libc. The analyzer needs to have information about the functions that are used by the application, at least the ones that are called in the part of it which is being studied.

For the libc library, some header files come with the tool and provide specifications for many of its functions. These header files are included by default when preprocessing the source files. However, if the preprocessing is done beforehand, the following option has to be employed in order to find the instrumented files:

-I$(tis-analyzer -print-share-path)/libc

The tool also provides implementations to some libc functions that are automatically loaded. They are either C source code or internal built-in functions. But the -no-tis-libc option may be used to completely ignore the tool’s library functions and header files. It can be useful when analyzing code with custom libc functions for instance.

Another intermediate solution is to use the --custom-libc <file> option. In that case, the given source file is analyzed before the tool runtime files. It gives the opportunity to overload some of the provided C implementations. The built-in functions cannot be individually overloaded at the moment.
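
As a sketch, assuming one of the C implementations provided with the tool needs to be replaced (memset is used here purely as an illustration), a custom file can be written and loaded with --custom-libc:

File my_libc.c:
 #include <stddef.h>

 /* Custom implementation overloading the one provided with the tool. */
 void *memset(void *s, int c, size_t n) {
     unsigned char *p = s;
     for (size_t i = 0; i < n; i++)
         p[i] = (unsigned char)c;
     return s;
 }

$ tis-analyzer --custom-libc my_libc.c ...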

To overload some header files in case something is missing, the --custom-isystem <path> option can be used. Then the given include path is used before the tool ones. In that case, the custom headers xxx.h may include the tool headers with:

File <path>/xxx.h:
 #include <tis-kernel/libc/xxx.h>

 // some more declarations and/or specification for <xxx.h>

If other external functions are used, one has to provide some properties concerning each of them: at a minimum, which pieces of data they may modify. See Check the External Functions to know which functions have to be specified and Write a Specification to learn how to do it.

First Run

At this point, the source files and the preprocessing commands should have been retrieved. It is time to try the tool for the first time, for instance by running:

tis-analyzer -metrics <..source and preprocessed files..> <..preprocessing options>

The preprocessing options are only used when source files are provided. In complex cases, it can be easier to analyze only the already preprocessed files.

Automatic Preparation

This section describes how to automatically produce a compile_commands.json file that contains instructions on how to replay the compilation process independently of the build system.

Description of the format

A compilation database is a JSON file, which consists of an array of “command objects”, where each command object specifies one way a translation unit is compiled in the project.

Each command object contains the translation unit’s main file, the working directory where the compiler ran, and the actual compile command.
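
For illustration, a database containing a single command object typically looks like this (the paths and flags are hypothetical):

[
  {
    "directory": "/home/user/project/build",
    "command": "cc -DNDEBUG -Iinclude -c -o f1.o ../src/f1.c",
    "file": "../src/f1.c"
  }
]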

See the online documentation for more information.

How to produce a compile_commands.json
  • CMake (since 2.8.5) supports generation of compilation databases for Unix Makefile builds with the option CMAKE_EXPORT_COMPILE_COMMANDS.

    Usage:

    cmake <options> -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <path-to-source>
    
  • For projects on Linux, an alternative is to intercept compiler calls with a more generic tool called bear.

    Usage:

    bear <compilation_command>
    

Tip

It is recommended to use bear. It can be installed with the package manager, typically:

sudo apt install bear
Using the compile_commands.json

In order to use the produced compilation database, run TrustInSoft Analyzer with the following command:

tis-analyzer -compilation-database path/to/compile_commands.json ...

Also, if a directory is given to the -compilation-database option, it will scan and use every compile_commands.json file located in the given directory and its sub-directories.

tis-analyzer -compilation-database path/to/project ...

It is also possible to use compilation databases in a tis.config file for the analysis.

A possible generic template for the tis.config file is given below (see Configuration files for more information about tis.config files).

{
    "compilation_database":
    [
        "path/to/compile_commands.json"
    ],
    "files":
    [
        "path/to/file_1",
        "path/to/file_2",
        "path/to/file_N"
    ],
    "machdep": "gcc_x86_64",
    "main": "main",
    "val": true,
    "slevel-function":
    {
        "function_name": 10
    }
}

To use the tis.config file, run TrustInSoft Analyzer with the following command:

tis-analyzer -tis-config-load tis.config

Note

The tis.config file uses a strict JSON syntax. A typical mistake would be to leave a trailing comma after the last element of an array or object, e.g. after the line "path/to/file_N", which would lead to an error.
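
For instance, the following fragment is invalid JSON because of the trailing comma after the last array element:

"files":
[
    "path/to/file_1",
    "path/to/file_N",
]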

Check Preparation

At this point, whatever method was chosen for the preparation step, you should, for instance, be able to execute:

tis-analyzer -metrics <... arguments...>

With the appropriate arguments, the analyzer should run with no errors. Using the command tis-analyzer-gui with the same arguments starts the GUI which lets you browse through the source code, but not yet see the analysis results, since nothing has been computed at this point.

It is often useful to save the results of an analysis with:

tis-analyzer ... -save project.state > project.log

This command puts all the messages in the file project.log and saves the state of the project itself to the file project.state, so that it can be loaded later on. For instance, we can load it now in the GUI by executing:

tis-analyzer-gui -load project.state

In case the application includes some special features (assembler code, etc.) and/or requires to be studied for a specific hardware target and/or with specific compiler options, please refer to Dealing with Special Features.

Prepare the Analysis

This chapter explains how to specify which part of the source code of an application will be studied and in which context. Moreover, it also shows how the overall goal can be split into several separate analyses if needed. The main objective is to be able to run the value analysis, implemented by the Value plug-in, in order to obtain the alarms concerning the software weaknesses listed in the CWE-subset.

Define the Perimeter

The study perimeter could be the whole program, or only some functions of a library, or a single use case scenario. Explaining how to decide which part of the source code should be studied is very difficult, since it depends a lot on the particular application, the amount of time available, and mostly on how one looks at the problem… Adopt an incremental approach: begin with a small study, in order to understand how to apply the tools in the given situation, and then enlarge the perimeter later on.

Prepare the Entry Point

In order to run a value analysis, an entry point to the program has to be provided. The body of the entry point function defines the studied perimeter. It is usually the main function which establishes the context verified by the analysis, but other functions can be used to this end as well.

  • To analyze a library function, the entry point function should build values for the given library function’s input arguments, and then call the library function itself using the arguments prepared in this way (see Write an Entry Point).
  • To analyze a scenario, preparation of input values may be needed as well, then followed by calling some functions in a sequence. This sequence of function calls defines the scenario.
  • In some cases, the actual main function of an application can be used directly. However, if it takes options and arguments, it still has to be called from an entry point that builds values for them. The tis-mk-main utility can help in doing so (see tis-mk-main Manual). Be aware though, that if main is a complex function that parses options and needs many string manipulations, it is probably a better idea to write a smaller entry point from scratch in order to define a more precise context of analysis.

It is important to mention here the difference between dynamic test execution and static value analysis. As the code is not executed in the latter, each of the built inputs provided to the analyzed function does not need to have a single value. It means that a function taking a single integer parameter x can, for instance, be analyzed for all the possible input values, or for all the values from a given set (e.g. 3 < x < 150). So when we mention “a value” here, we do not actually mean “a single concrete value”, but rather “a set of abstract values”.

Write an Entry Point

Basically, the entry point function has to call the functions to analyze, providing them with appropriate input values (i.e. function arguments) that correspond to the studied perimeter. Some built-in functions are available to build these input values:

  • for an integer interval:

    x = tis_interval(l, u);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from any value between l and u being returned. Several other functions are also provided for other types, for instance tis_double_interval(l, u) for floating-point values and tis_unsigned_long_long_interval(l, u) for wide integers, which behave the same way for the types double and unsigned long long.

  • to initialize addr[0 .. len-1]:

    tis_make_unknown (addr, len);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from having any arbitrary len bytes in memory starting from addr.

    The tis_make_unknown function is also useful to initialize a simple variable:

    tis_make_unknown (&x, sizeof (x));
    

    This is equivalent to x = tis_interval(l,u); when l is the minimum value of the type and u is the maximum value of the type.

  • for a non-deterministic choice between two integers:

    x = tis_nondet (a, b);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from the value of x being a or b. These are only two cases, but they combine with the other possibilities resulting from the calls to the other builtin functions.

  • for a non-deterministic choice between two pointers:

    p = tis_nondet_ptr (&x, &y);
    

    This one is similar to the previous one, but for pointers.

Example: the main function below shows a valid entry point to test a compute function that takes a buffer, its length, and a pointer to store a result:

#include <stdio.h>
#include <tis_builtin.h>

int compute (char * buf, size_t len, char * result);

int main (void) {
  char buf[100];
  tis_make_unknown (buf, 100);
  size_t len = tis_interval (0, 100);
  char x;
  char * result = tis_nondet_ptr (NULL, &x);
  int r = compute (buf, len, result);
}
  • the builtin tis_init_type can be used to initialize a simple pointer, such as int * p, or a pointer to a recursive data structure, such as struct list * p. It takes five arguments:

    tis_init_type(str_type, ptr, depth, width, valid)
    
    • the first argument const char * str_type should be a string representing a valid type of the memory to initialize.
    • the second argument void * ptr should be a pointer to the memory area to initialize.
    • the third argument unsigned long depth should be an integer that exactly mirrors the behavior of the option -context-depth during the initialization.
    • the fourth argument unsigned long width should be an integer that exactly mirrors the behavior of the option -context-width during the initialization.
    • the last argument should be either 0 or 1, and it exactly mirrors the behavior of the option -context-valid-pointers during the initialization.

Example:

#include<tis_builtin.h>
struct list {
    int data;
    struct list * next;
};
int main(){
    int *p0, *p1, *p2;
    struct list * p3;
    tis_init_type("int *", &p0, 1, 1, 1);
    tis_init_type("int *", &p1, 1, 10, 1);
    tis_init_type("int *", &p2, 1, 1, 0);
    tis_init_type("struct list *", &p3, 3, 1, 1);
    tis_dump_each();
}

The code above calls tis_init_type to initialize pointers p0, p1, p2 and p3. More specifically:

  • The call tis_init_type("int *", &p0, 1, 1, 1) allocates an array of size 1, given by the width argument, initializes the array element to any possible integer: S_p0[0] ∈ [--..--], and then assigns the array address to pointer p0: p0 ∈ {{ &S_p0[0] }}.
  • The call tis_init_type("int *", &p1, 1, 10, 1) allocates an array of size 10: S_p1[0..9] ∈ [--..--] and assigns its address to pointer p1: p1 ∈ {{ &S_p1[0] }}.
  • The call tis_init_type("int *", &p2, 1, 1, 0) sets the last argument to 0, which allows p2 to possibly be a NULL pointer: p2 ∈ {{ NULL ; &S_p2[0] }}.
  • The call tis_init_type("struct list *", &p3, 3, 1, 1) allocates a list of length 4 (the index of the last list element corresponds to the depth argument), and assigns the list head address to pointer p3.

The call to tis_dump_each then prints the following state:
p0 ∈ {{ &S_p0[0] }}
S_p0[0] ∈ [--..--]

p1 ∈ {{ &S_p1[0] }}
S_p1[0..9] ∈ [--..--]

p2 ∈ {{ NULL ; &S_p2[0] }}
S_p2[0] ∈ [--..--]

p3 ∈ {{ &S_p3[0] }}
S_p3[0].data ∈ [--..--]
    [0].next ∈ {{ &S_next_0_S_p3[0] }}
S_next_0_S_p3[0].data ∈ [--..--]
             [0].next ∈ {{ &S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
                      [0].next ∈ {{ &S_next_0_S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
                               [0].next ∈ {0}

In order to obtain more details about the available functions which allow building imprecise values, refer to the Abstract values section, or browse the file:

more $(tis-analyzer -print-share-path)/tis_builtin.h
Generate an Entry Point

Some tools are also available and may help to build the entry point for specific situations (see tis-mk-main Manual).

Check the External Functions

Now that the main entry point is ready, it is time to run the value analysis for the first time using the -val option.

An important thing to check is the nature of the external functions. More precisely, look for this message in the log file:

$ grep Neither proj.log
[kernel] warning: Neither code nor specification for function ...

This message indicates that the given function is undefined. In order to progress with the value analysis, it MUST be defined by either:

  • adding missing source files,
  • or writing C stubs,
  • or specifying it with ACSL properties.

The libc library functions should not appear in these messages since most of them are already specified in provided library files (see About Libraries).

Writing C Stubs

Writing C stubs for functions for which no code is available is the recommended way to go. The standard functions and the builtins presented above (see Write an Entry Point) may be used to abstract the implementation details.

To illustrate how to write stubs using standard functions and analyzer builtins, suppose that the code we want to analyze is the main function below, and that we do not have the code for the function mystrdup.

char *mystrdup(char *s);

int main(void) {
  char c, *p;
  int x;
  p = mystrdup("abc");
  if (p)
    c = p[0];
  x = c - '0';
}

There is currently no good way to write a specification indicating that mystrdup allocates a new block and makes it contain a zero-terminated string. Instead, the recommended method is to abstract it with a stub that may look as follows:

#include <string.h>
#include <stdlib.h>
#include <tis_builtin.h>

char *mystrdup(char *s) {
  size_t l = strlen(s);
  char *p = malloc(l+1);
  if (p) {
    tis_make_unknown(p, l);
    p[l] = 0;
  }
  return p;
}

The files can be analyzed with:

$ tis-analyzer -val -slevel 10 main.c mystrdup.c

As shown in the trace, the analyzer correctly detects that the main function may use c uninitialized:

tests/val_examples/stub_main.c:13:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
tests/val_examples/stub_main.c:13:[kernel] warning: completely indeterminate value in c.
Specifying External Functions

When specifying an external function with ACSL properties, only the assigns properties are mandatory: they give the tool an over-approximation of what can be modified. However, also providing the function’s post-conditions can help the analyzer and yield more precise results (see Write a Specification).
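
For instance, a minimal sketch for a hypothetical external function get_status, which returns 0 and writes a small status code through out on success, or a negative value on failure:

/*@ assigns \result \from indirect:dev;
    assigns *out \from dev, indirect:out;
    ensures (\result == 0 && \initialized(out)) || \result < 0;
*/
int get_status(int dev, int *out);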

Tune the Precision

Performing value analysis with no additional options (like in all the cases above) makes it run with a rather low precision. It should not take too long to get the results that indicate where the alarms were found. When using the GUI, the list of alarms can be selected using the Kind filter of the Properties panel, and a summary of the number of alarms can be found in the Dashboard panel.

The global precision can be changed using the -slevel n option. The greater n is, the more precise the analysis is (see About the Value Analysis for more details). The alarms that can be formally verified by increasing the precision in this way will disappear. Those which remain are the difficult part: they require further attention.

Value analysis takes longer and longer when the precision increases. Thus it can be profitable to fine tune the precision locally on certain functions in order to benefit from the higher precision level where it is advantageous (so that more alarms are formally verified) while keeping it lower where it matters less (so that the analysis runs faster).

For the same reason (fast analysis to find bugs earlier) it can also be useful to reduce (temporarily) the size of the arrays (when the source code is structured to allow this easily).

Using the command line option -tis, one can examine the evolution of the analysis. More precisely, this option provides an interactive view of what the analyzer is doing and, for readability reasons, it should always be used in a large terminal. The insight into the progress of the analysis gained in this way helps greatly in choosing appropriate global and local precision levels:

  • If it appears that a huge amount of time is spent in a low level function f, its precision can be decreased using the -slevel-function f:n option with n smaller than the global precision.
  • In the opposite case, if the global precision has to be kept low for performance reasons, the local precision can be increased in some functions only, as illustrated below.
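
For instance, a low global precision can be combined with a higher local precision in a hypothetical function parse:

$ tis-analyzer -val -slevel 10 -slevel-function parse:1000 main.c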

The same information is shown when starting the value analysis from the GUI and the final analysis information can be found in the Dashboard panel.

Note that at this point the goal is not to study the alarms precisely, but rather to get a rough idea of the amount of work needed in order to be able to decide which part to study.

To Split or Not to Split

Experience suggests that if the size of the analyzed source code is large and/or there are many alarms, it is usually worthwhile to split the study into smaller, more manageable sub-components. The idea here is to write a precise specification for every sub-component and then analyze each of them independently toward its particular specification. Afterwards, the main component can be studied using those specifications instead of using the sub-components’ source code directly.

It is quite easy to decide which part should be split if some main features are identifiable and clearly match a given function. Otherwise, a first overview of the number of alarms may help to isolate a part that seems difficult for the analyzer. However, as the separated function must be specified, it is much easier if it has a small and clear interface (in order to study a function, it must be called in the intended context, and this context might be difficult to build if it corresponds to a large and complex data structure).

To split the analysis one must write:

  • a main function for the main component,
  • a main function and an ACSL specification for each of the API functions which is supposed to be studied independently.

Then, when performing the analysis for the main component, the -val-use-spec option should be used in order to provide the list of the API specified functions. For each of the functions from this list the value analysis will use the function’s ACSL specifications instead of the function’s body.

For instance, the commands below can be used to split the study into the main analysis with two sub-components corresponding to the f1 and f2 functions:

$ tis-analyzer -val $SRC main.c -val-use-spec f1,f2 \
                                -acsl-import f1.acsl,f2.acsl \
                                -save project.state
$ tis-analyzer -val $SRC main_f1.c -acsl-import f1.acsl -save project_f1.state
$ tis-analyzer -val $SRC main_f2.c -acsl-import f2.acsl -save project_f2.state

In the commands above:

  • the files main.c, main_f1.c and main_f2.c should hold the entry points for the main component, and the f1 and f2 functions respectively;
  • the files f1.acsl and f2.acsl should hold the ACSL specifications of the, respectively, f1 and f2 functions (see Write a Specification to learn how to write a specification).
Multi-Analysis

There is another case where studying an entry point may require several separate analyses: when there is a parameter in the program that has to be attributed a value (e.g. using a macro) and when it is difficult to give it an arbitrary value beforehand (e.g. it is a parameter defining the size of an array). In such situations it is better to write a loop in an external script to attribute different values to that parameter and run as many analyses as necessary.

The following script runs an analysis for every N being a multiple of 8 from 16 to 128:

#!/bin/bash

for N in $(seq 16 8 128) ; do
  tis-analyzer -D N=$N -val $SRC main.c
done

Of course, it supposes that N is used somewhere in main.c.

Write a Specification

Writing a specification for a function is useful in two cases:

  • When some splitting is done.

    A certain function can be studied independently, as a separate sub-component of the analysis. The function is verified toward a certain specification and then that specification can be used (instead of using directly the function’s body when analyzing function calls) in the main component’s verification (To Split or Not to Split).

  • When using some unspecified external functions.

If an external function, which is not part of the subset of the libc library functions provided with the tool (those are already specified), is used in the program, then it needs an explicit specification. The provided specification has to indicate at least which of the concerned data may possibly be modified by the function. Pre- and postconditions are not mandatory in that case.

The specification is written in ACSL and is mainly composed of:

  • preconditions (requires properties),
  • modified data descriptions (left part of assigns properties),
  • dependencies (\from part of assigns properties),
  • postconditions (ensures properties).

The ACSL properties can be either inlined directly in the source files or written in separate files and loaded (as explained in ACSL Properties). An analysis will use the specification instead of the function’s body to process a function call when either the body is not provided or an appropriate -val-use-spec option has been set in the command line.

When analyzing a function call using the specification, the tool:

  • first verifies that the pre-state satisfies the preconditions,
  • then assumes that:
    • the post-state only differs from the pre-state for the specified modified data,
    • and that the post-state satisfies the postconditions.
Pre-conditions
Pre-conditions of an External Function

In the specification of an external function the preconditions are not mandatory. If some are provided though, the analyzer checks whether they are satisfied at each call. Therefore adding preconditions in that case makes the verification stronger.

Pre-conditions of a Defined Function

When a defined function is analyzed separately from the main application, its preconditions define the context in which it is studied. This does not have to be the most general context imaginable, but it has to include at least all the possible usage contexts that can be found in the application.

For example, suppose that the function f has been selected to be studied independently, and that it takes a single parameter x. If f is always called from the application with positive values of x, it is possible to study it only in that context. This property must be then specified explicitly by the precondition:

requires r_x_pos: x > 0;

Also, the corresponding main_f function - i.e. the one that is written and used as an entry point in order to study f individually - must call f with all the positive values for x. If the preconditions specify a context smaller than the context defined implicitly by the main function, it will be detected by the analysis since some of the preconditions will be then invalid. But the opposite case (i.e. if the specified context is larger than the studied input context) would not be detected automatically.

In other words, in the example above, if main_f calls f with (x >= 0), it will be detected since (x == 0) does not satisfy the precondition. However, if it calls f only with (0 < x < 10), the precondition will be formally satisfied, but the function behavior for (x >= 10) will not be studied. If, for instance, f is then called in the application with (x == 20), the problem will not be detected since the precondition is valid for this value.
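
As a minimal sketch (assuming f takes a single int parameter and that INT_MAX is the intended upper bound), such a main_f entry point could be:

#include <limits.h>
#include <tis_builtin.h>

int f(int x);

int main_f(void) {
    /* Build all the positive int values for x, matching the r_x_pos
       precondition exactly. */
    int x = tis_interval(1, INT_MAX);
    return f(x);
}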

Warning

When verifying a function toward a specification that is then used to verify another component, it is very important to make sure that the context defined by the specified preconditions and the studied input context represent the same thing.

Note that:

  • in most cases each of the function’s parameters should be specified in at least one precondition;
  • if a parameter can hold any value, but has to be initialized, the \initialized precondition should be included in the specification;
  • in case of pointers, it is important to specify whether they have to be \valid (meaning that they point to an allocated memory zone); a short sketch combining these points is given below.
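
For illustration, a sketch combining these points for a hypothetical checksum function taking a buffer and its length:

/*@ requires \valid(buf + (0 .. len - 1));
    requires \initialized(buf + (0 .. len - 1));
    requires 0 < len <= 100;
*/
int checksum(char *buf, unsigned len);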
Modified Data and Data Dependencies

The assigns properties are composed of two parts, which specify the modified data and its dependencies:

//@ assigns <left part> \from <right part>;
Modified Data

Each assigns property specifies the modified data on the left side of the \from keyword.

The union of the left parts of all the assigns properties in a given function’s specification is an over-approximation of the data modified by this function. Hence the data that is not in this set (i.e. the set defined by the union of the left parts) is expected to have the same value in the pre-state and the post-state.

The information about the modified data is used:

  • by the WP plug-in, for any function call;
  • by the Value plug-in, when the specification of a called function is used instead of its body.
Data Dependencies

Each assigns property specifies the data dependencies on the right side of the \from keyword.

The output value of the modified data is expected to depend only on the value of its data dependencies. In other words, if the value of the dependencies is equal in two input states, then the value of the modified data should be equal in the two output states.

There are two kinds of dependencies:

  • the direct dependencies are those which are used to compute the value;
  • the indirect dependencies are those which are used to choose between the executed branches or the address of the modified data (through pointers, indexes, etc.).

The indirect dependencies have to be explicitly marked with an indirect: label. All the other dependencies are considered as direct.

Here are some examples of correctly defined dependencies:

//@ assigns \result \from a, b, indirect:c;
int f (int a, int b, int c) { return c ? a : b; }

int t[10];
//@ requires 0 <= i < 10; assigns t[..] \from t[..], a, indirect:i;
void set_array_element (int i, int a) { t[i] = a; }

The dependency information is:

  • Not used by the WP plug-in.

  • Very important for many analysis techniques that require knowledge about the data dependencies (such as the Show Defs feature in the GUI, slicing, etc.), but only when the function body is not used, since if the body is available the dependencies can be computed by the From plug-in.

  • Employed in the value analysis of the pointers: the output value of the modified pointers can only be among the specified direct dependencies.

    Note that an intermediate pointer is needed when a pointer is assigned to the address of a variable. This property is not valid:

    assigns p \from &x; // NOT valid.
    

    One must rather declare T * const px = &x; in the code and then write the correct property:

    assigns p \from px;
    

    It means exactly that the output value of p may be based on &x and on no other existing variables.

Remember that the assigns properties specify an over-approximation of the modified data. For instance, the following properties only say that nothing except B is modified by the function:

assigns B \from \nothing;
assigns \result \from B;

In order to specify that B is surely initialized after the function call, one has to add a post-condition:

ensures \initialized(&B);

When the function result is used to return an error status, it is often the case that the post-condition rather looks like:

ensures (\result == 0 && \initialized(&B)) || (\result < 0);
Post-conditions

It is not mandatory to specify ensures properties, either when splitting a defined function or when specifying an external function. However, some information about the values in the returned state might be needed in the analysis of the caller function.

Post-conditions of an External Function

In the specification of an external function, the provided post-conditions cannot be checked since the source code is not available. Hence they are used as hypotheses by the analysis and cannot be formally verified themselves.

Warning

As the post-conditions of external functions cannot be verified by the tool, they must be checked with extra care!

Post-conditions of a Defined Function

If ensures properties are specified, it is usually good to keep them as simple as possible. They have to be verified during the function’s body analysis and over-specification only increases the amount of work necessary to achieve that.

Check the Coverage

Before going any further, it is often advantageous to check the code coverage in order to verify whether all the dead code in the application (i.e. the parts of the code composed of functions and branches that are not reachable by the analysis) is indeed intended. To learn how to do that, see Information about the Coverage.

Dead code can be spotted in the GUI by looking for statements with a red background. If some dead code seems strange, it can be explored and investigated using the value analysis results. Clicking on variables and expressions makes it possible to inspect their computed values.

As long as the analysis did not degenerate, the code classified as dead by the tool is a conservative approximation of the actual dead code. It is guaranteed that, in the context defined by the input values, the concerned statements cannot be reached whatever happens. When relying on this guarantee, one should however keep in mind the two important assumptions it depends on: it applies only if the analysis did not degenerate, and dead code is always considered in the context defined by the input values. Indeed, most of the time, if some code has been marked as dead when it should not have been, the reason is that the context of analysis was defined too restrictively (i.e. it does not include all the input values that can happen in the real execution). Another common reason is that the analysis has simply stopped computing a given branch:

  • either because the information was too imprecise to go further (in that case a degeneration warning message has been emitted),
  • or because a fatal alarm has led to an invalid state (e.g. accessing the content of a pointer which is NULL for sure renders the state invalid).
Summary

At this point, one should have some analyses (one or several) which cover the intended parts of the code and end without any degeneration. The results most likely include alarms. The chapter Study the Alarms explains how to deal with the alarms and, before that, the chapter Get the Information explains how to extract more information from the analysis results.

If, due to splitting, there are several analyses, there is no preferred order for the verifications. In any case, however, modifying the existing specifications invalidates the results obtained so far.

Get the Information

This chapter explains how to extract some information from the analyzed project using the tis-info plug-in and other external scripts. The tis-info plug-in provides options to generate textual files containing information about functions, variables, properties and statements. Filters can be used to extract specific information from these files.

Some pieces of information are purely syntactic while some others are of semantic nature. The semantic information is only available if the project which was used to generate the files holds the value analysis results.

For an exact and up-to-date description of each generated piece of information, please refer to the tis-info Manual.

Generate the CSV Files

The tis-info plug-in can be used to generate CSV files. The main options allow us to extract the information concerning:

  • functions: -info-csv-functions functions.csv
  • variables: -info-csv-variables variables.csv
  • properties: -info-csv-properties properties.csv
  • statements: -info-csv-statements statements.csv

For instance, in order to get the information about functions from a previously saved project project.state, the command line would be:

tis-analyzer -load project.state -info-csv-functions functions.csv

As mentioned before, the kind of obtained information (i.e. either purely syntactic or also semantic) will depend on whether the saved project includes the value analysis results or not.

In the generated CSV files, the information about each element is printed on a single line (with comma separated fields). Hence, the files can be opened in a spreadsheet tool for easy selection of elements. Moreover, this format can easily be filtered using the grep utility. For instance, the following command returns all the information about the function funname:

grep funname functions.csv

In order to filter on a specified column, the awk tool is also very practical. For instance, the following command returns only the lines where the word valid appears in the fifth column:

awk -F, '! ($5~"valid") {print}' properties.csv

Also, awk can be used to easily extract only some of the columns:

awk -F, '{print $4 $5}' properties.csv
Information about the Functions

The generated file functions.csv provides information about the functions. It contains the list of both defined and declared functions appearing in the analyzed source code, including their locations, whether they are called or not, whether they are reachable in the analyzed context, etc. The most useful piece of information here concerns the coverage and is detailed just below.

Information about the Coverage
Coverage from a Single Analysis

The coverage of each function can be found in the appropriate column of the functions.csv file. Note that this information is semantic in nature and thus only available if the value analysis results have been computed.

At this point, the internal functions are usually not interesting and they can be filtered out with:

grep -v TIS_KERNEL_SHARE

The easiest approach then might be to check first the completely unreachable functions:

grep ", unreachable,"

And then the reachable ones that are not completely covered:

grep -v ", unreachable," | grep -v "100.0%"

Then the GUI can be used to explore the dead code of the functions that are not totally covered in order to verify if this is intended or not.

Coverage from Several Analyses

If the information about the code coverage comes from several separate analyses, the generated functions.csv file is not sufficient anymore to measure the real coverage of functions, since it represents only the results extracted from one project out of many. For this reason, the tis-aggregate tool provides a coverage command to extract all the relevant information from the functions.csv files and compile it into overall coverage results that can be presented in the CSV format:

tis-aggregate coverage project.aggreg  >  coverage.csv

Here, project.aggreg is a file that gives the base name of the analyses to consider. For instance:

path_1/proj_1
path_2/proj_2
...
path_n/proj_n

The tool then processes information from the path_i/proj_i_functions.csv files.

This tool also provides some more options, such as presenting the results in HTML format (see the Tis-aggregate Manual).

Information about the Properties

The location and status of each property can be found in the properties.csv file. If the names given to the user annotations follow some naming conventions (see Naming the Annotations), it is quite easy to use grep to extract more precise information from that file.

For instance, if the names of the annotations that should be proved by the WP plug-in all have a _wp suffix, it is easy to check if they are all verified with the following command:

grep "_wp," properties.csv | grep -v ", valid$"
Information about the Statements

The generated file statements.csv provides information about certain kinds of statements in the analyzed program.

For instance, it contains information about the function calls, in particular whether a specific call is direct or not. Moreover, if an indirect call has been encountered during the value analysis, it provides the list of all the possibly called functions. Extracting this information can be done with:

grep ", call," statements.csv | grep -v DIRECT

Some useful information concerning the condition values can also be found here, especially whether a condition happens to be always true or always false. This kind of situation is often also observable through the dead code, although not in all cases, since an if condition might be always true but have no else branch (which, obviously, would be dead if it existed).

Information about the Variables

The information about all the variables is available in the generated variables.csv file. The exceptions are the global variables that are never accessed or modified, since they are removed from the analysis results. This information can also be used, for instance, to easily find the location of the definition of a variable or to list all the static or volatile variables.

Study the Alarms

Understand and Sort the Alarms

The list of all the existing alarms is given in Value Analysis Alarms.

Most of the time understanding the meaning of alarms is relatively easy since the generated assertions, messages, and warnings tend to be quite clear. The matter that requires more effort is understanding whether a given alarm is relevant: can the problem that it indicates actually happen or not. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.

False alarms are often a result of a too high level of approximation in the analysis. It is recommended to treat the alarms starting from the first one in order to detect an imprecision as soon as possible.

The list of the generated assertions can easily be extracted from the properties.csv file (see Information about the Properties). Then, for instance, these assertions can be counted in order to track the evolution of their total number during the working process. (Note, however, that this particular measure is not necessarily very pertinent, because the relation between problems and emitted alarms is not really one-to-one. Losing precision at one point of the analysis can lead to several alarms which have the same origin. Moreover, solving one problem may cause many unrelated new alarms, as several problems might have been hiding behind the solved one.)
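
For instance, a rough count of the lines mentioning assertions (a sketch; see the tis-info Manual for the exact field layout) can be obtained with:

$ grep -c assert properties.csv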

The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties panel of the GUI, and there is a button which allows selecting the alarms among all the properties. Start investigating from the first emitted alarm by sorting the alarms by their emission rank.

About the Value Analysis

Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.

Value analysis uses abstract interpretation techniques that propagate the information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between the variables and sets of values (keep in mind though that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.

See Tune the Precision for explanations concerning tuning the precision level with the -slevel option. The precision level is related to the number of states that can be stored for each program point. The smaller this number is, the coarser the approximation is, as more computed states are merged together.

Example:

  //@ assert 0 < x < 10;
  if (x < 5)
     y = 5;
  else
     y = 10;
L:

Computing a unique state at label L only gives that x ∈ [1..9] and y ∈ [5..10]. But if the slevel is larger, then two states can be stored at L, giving exactly that either y == 5 when x ∈ [1..4] or y == 10 when x ∈ [5..9].

Notice that the assertion assert 0 < x < 10 above reduces the possible values for x. It is important to remember that this works in the same way for the assertions automatically generated from alarms. For instance, if a statement a = b / c; is reached with c ∈ [0..100], an alarm is emitted in the form of an appropriate assertion:

/*@ assert Value: division_by_zero: c ≢ 0; */

Then, the analysis continues with a context where c ∈ [1..100].

Propagate values separately

Besides the conditions in the code that automatically split the states as above (when there is enough slevel), some builtins are available to generate more than one state.

The builtin tis_variable_split (a, sz, n); splits the state on the data whose address is a and size is sz if it holds fewer than n values. For instance, if f returns x ∈ [1..5] in the following code, the call to tis_variable_split generates five states, one for each value of x:

int x = f();
tis_variable_split (&x, sizeof(x), 10);

Moreover, the builtin tis_interval_split(l, u) does the same thing as tis_interval(l, u), but it automatically causes the individual values between l and u inclusive to be propagated separately. The slevel option must then be set high enough to keep the precision of the analysis.

In the following example, since all values of n are propagated separately, the analyzer is able to guarantee that the x_pos assertion holds.

#include <tis_builtin.h>

int G;

void foo(int n)
{
    int x = G;
    /*@ assert x_pos: 1 <= x <= 10; */
}

int main(void)
{
    int n = tis_interval_split(-10, 10);
    G = n;
    if (n > 0)
        foo(n + 5);
    return 0;
}
Remove Alarms by Tuning the Analysis

Tuning the -slevel and -slevel-function options to control the precision of the analysis, using the information obtained from the -tis option’s output, has already been explained in Tune the Precision.

In the GUI, statements where the precision level (i.e. the slevel value) has been reached are indicated by a colored margin in source view windows (see Interactive Code Panel). Moreover, statements can be sorted by the reached value in the statements window (Statements).

The analysis can also get imprecise for other reasons than reaching the defined precision level. This is especially the case when the log trace includes messages about garbled mix values. It is very likely that if such messages appear, the analysis will not produce any interesting results.

Tip

The analysis can be stopped automatically using the command line option -val-stop-at-nth-garbled.

This option can also be set from the GUI.

The analysis can also be stopped when it reaches the maximum memory consumption set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM. If TIS_KERNEL_MAX_MEM is set, the analyzer becomes more conservative in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2 GiB of memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM GiB of memory. On a single-user TIS Box with 64GiB of memory, a good value for this variable is 62.

There are several other options that can be used to fine tune the value analysis.

Some of them work similarly to the -slevel option, i.e. they control the maximum number of states that can be kept separated. For instance:

  • -val-split-return-function f:n: split the return states of function f according to \result == n and \result != n;

  • -val-split-return auto: automatically split the states at the end of each function according to the function return code (a sketch follows this list);

  • -val-split-return-function f:full: keeps all the computed states at the end of function f in the callers;

  • -val-slevel-merge-after-loop <f | @all>: when set, the different execution paths that originate from the body of a loop are merged before entering the next execution of the loop.

    The default behavior applies to all functions (-val-slevel-merge-after-loop=@all). It can be removed for some functions (-val-slevel-merge-after-loop=@all,-f), deactivated for all functions (-val-slevel-merge-after-loop=-@all), or deactivated for all and activated for some (-val-slevel-merge-after-loop=-@all,+f).

  • -wlevel n: do n loop iterations before widening.
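
As a minimal sketch of why this splitting helps (the get function and its error convention are assumptions, not tool API), with -val-split-return auto the \result == 0 states, in which v is initialized, are kept separate from the failure states, so no spurious initialization alarm is emitted when v is used:

#include <tis_builtin.h>

/* Hypothetical helper: returns 0 and initializes *out on success,
   -1 on failure. */
int get(int *out) {
    if (tis_nondet(0, 1)) {
        *out = tis_interval(0, 9);
        return 0;
    }
    return -1;
}

int main(void) {
    int v;
    if (get(&v) == 0)
        return v; /* with split return states, v is known to be initialized here */
    return 0;
}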

Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:

  • -val-ilevel n: Sets the precision level for integer representation to n: each integer value is represented by a set of up to n enumerated values; above this number, intervals (with congruence information) are used.
  • -plevel n: Sets the precision level for array accesses to n: array accesses are precise as long as the interval for the index contains less than n values. See Tuning the precision for array accesses for more information about this option.

There are also options that allow enabling or disabling certain alarms. Most of these options are enabled by default, and it is usually safer to leave them this way (unless you really know what you are doing).

Of course not all existing options have been enumerated here. The full list of the available options is given by -value-help.

Tuning the precision for array accesses

As explained in Tune the Precision, temporarily reducing the size of the arrays may be a first step during the interactive phase to make the analysis time shorter. But when analyzing large arrays, the -plevel option can be used to increase the precision level for array accesses. This option sets how hard the analyzer tries to be precise for memory accesses that, considering the imprecision on the indexes involved, can be at any of many offsets of a memory block. The default is 200 and may not be sufficient for an access at unknown indices inside a large or nested array to produce a precise result. This is illustrated by the example below:

#include <tis_builtin.h>

char c[20];
struct s { int m[50]; void *p; };
struct s t[60];

void init(void) {
  for (int i = 0; i < 60; i++)
    {
      for (int j = 0; j < 50; j++)
        t[i].m[j] = (i + j) % 10;
      t[i].p = c + i % 20;
    }
}

int main(void) {
  init();
  int x = tis_interval(5, 45);
  int y = tis_interval(2, 56);
  t[y].m[x] = -1;

  x = tis_interval(5, 45);
  y = tis_interval(2, 56);
  int result = t[y].m[x];

  int *p = &t[y].m[x];
  int result2 = *p;
}

With the default value of -plevel (200):

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

With higher plevel:

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ [-1..9]
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

Note that result2 is not precise even with the higher plevel. Handling the lvalue t[y].m[x] directly allows the analyzer to be optimal as long as the value of the plevel option allows it, but forcing the analyzer to represent the address as the value of the variable p produces this set of offsets:

{{ &t + [428..11604],0%4 }}

This set of offsets contains pretty much the addresses of everything in t, including the p pointer members, so when dereferencing p it appears that the result can be an address.

Trading memory for analysis speed

As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.

The environment variable TIS_KERNEL_MEMORY_FOOTPRINT can be used to set the size of caches used during the value analysis, speeding up some slow analyses in which the caches were getting thrashed. The default is 2 and each incrementation doubles the size of caches. Only use this variable if you are not already low on memory.
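
For instance, to make the caches four times larger than the default (two increments above the default value of 2):

$ TIS_KERNEL_MEMORY_FOOTPRINT=4 tis-analyzer -val ...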

Another useful option which helps in reducing the computation time is -memexec-all. If this option is set, when analyzing a function, the tool tries to reuse results from the analysis of previous calls when possible.

Control the Analyzer
Enhance the Trace

It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.

First of all, the standard printf function can be used to output constant messages or messages involving only precise values. However, printing the computed values when they are imprecise is not possible using printf; the tis_show_each function should be used instead. The name of this function can be extended with any string, so that it is easier to distinguish between different calls (as the full function name is printed each time it is called). For instance:

tis_show_each_iteration (i, t[i]);

The statement above will output messages like this (one for each analyzed call):

[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])

Another useful function, which properly handles imprecise values, is tis_print_subexps. When given any number of expressions as arguments, it will print the values of all the sub-expressions of each provided expression. The first argument of this function must be always a string literal, which will be printed out in order to help distinguish between different calls. For instance:

tis_print_subexps ("simple sums", x + y, y + z, z + x);

Such a statement will output messages like this:

[value] Values of all the sub-expressions of simple sums (expr 1):
    int x + y ∈ {3}
    int x ∈ {1}
    int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
    int y + z ∈ {5}
    int y ∈ {2}
    int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
    int z + x ∈ {4}
    int x ∈ {1}
    int z ∈ {3}

Moreover, tis_print_abstract_each allows the contents of structured variables to be observed. For instance:

tis_print_abstract_each(&i, &t);

This statement will output messages like this (one for each analyzed call):

[value] Called tis_print_abstract_each:
     i ∈ {0; 1; 2; 3; 4}
     t[0..4] ∈ [0..10]
      [5..99] ∈ UNINITIALIZED

Note that tis_print_abstract_each in general takes addresses of variables as parameters. This applies as well to an array, such as t in the example above. Contrary to popular belief, when t is an array, &t is not the same as t, and the user should call tis_print_abstract_each(&t); to see the whole array (pointer decay only shows the first element).

To get even more information, the tis_dump_each function can be used to print the whole state at the program point where it is called. But it may be easier to call the tis_dump_each_file function to print the state to a file. The name of the file is computed from the first argument of the call (which must be a string literal), an incremented number, and an optional directory given by the -val-dump-directory option. The -val-dump-destination option allows choosing which kind of output is expected among txt or json (all for both, none for no output). For instance, when calling tis_dump_each_file ("state", *(p+i)) in a test.c file, and analyzing it with the command:

$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp

these messages are shown in the trace:

test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'

The two generated files hold both the whole state computed the first time the program point is reached and the possible values for *(p+i).

For instance, the JSON file may look like:

{
  "file": "test.c",
  "line": 11,
  "args": "([0..10])",
  "state": [
    {
      "base": "t",
      "values": [
        { "offset": "[0..4]", "value": "[0..10]" },
        { "offset": "[5..9]", "value": "UNINITIALIZED" }
      ]
    },
    { "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
  ]
}

To better understand the results, see the Value Analysis Data Representation section.

Stop the Analyzer when Alarms Occur

In order to avoid wasting time analyzing the application in a wrong context, the analysis can be stopped as soon as some alarms are generated thanks to the -val-stop-at-nth-alarm option. With the argument equal to 1, it aborts the analysis at the first alarm. To ignore a certain number of alarms, the argument can be increased. Although there is no strict relation between the given argument and the number of alarms generated before the analysis stops (i.e. these two values are not necessarily equal), one thing is guaranteed: providing a larger number will lead to skipping more alarms.
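
For instance, to abort the analysis at the very first alarm:

$ tis-analyzer -val -val-stop-at-nth-alarm 1 main.c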

Stop the Analyzer When Taking Too Long

The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:

$ kill -USR1 12345

The analyzer can also be stopped through the GUI (see Disconnect/Kill server).

When the analyzer receives this signal, it stops the Value analysis, but still continues with the other tasks. So, for instance, it still saves the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there are no more tasks to do, it can still take some time to properly stop the analysis.

The --timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:

$ tis-analyzer --timeout 5m -val ... -save project.state
Watchpoints

Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:

  • tis_watch_cardinal: stop the analysis when the number of different values that a given memory location may possibly contain (because of imprecision) is greater than a certain maximal amount.
  • tis_watch_value: stop the analysis when a given memory location may possibly contain a value of the provided set of forbidden values.
  • tis_watch_address: stop the analysis when a given memory location may possibly contain an address.
  • tis_watch_garbled: stop the analysis when a given memory location may possibly contain a garbled mix value.
  • tis_detect_imprecise_pointer: stop the analysis when any expression is evaluated to an imprecise pointer which contains a given base address.

These functions are available using #include <tis_builtin.h>. The arguments of the four tis_watch_* functions follow the same logic:

  • The first two arguments define the memory location to watch: its address p and its size s.
  • The last argument n is the number of statements during which the condition may remain true before the analysis is stopped:
    • if n == -1, the analysis never stops, but messages are printed each time the condition is reached;
    • if n == 0, the analysis stops as soon as the condition is reached;
    • if n > 0, the analysis continues through the first n occurrences where the condition is reached (and prints messages for each of them) and stops at the next occurrence.
  • The meaning of the (potential) arguments in between varies depending on the particular function.

The function tis_detect_imprecise_pointer only takes a pointer as argument.

Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:

 #include <limits.h>
 #include <stddef.h>
 #include <tis_builtin.h>

 int main(void)
 {
     int x = 0;            /* the memory location to watch */
     void *p = (void *)&x; /* its address */
     size_t s = sizeof(x); /* its size */
     int y[10];

     /* The analysis stops when x is not exact (i.e. not a singleton value). */
     int maximal_cardinal_allowed = 1;
     tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);

     /* The analysis stops the fourth time x may be negative. */
     int forbidden_values = tis_interval(INT_MIN, -1);
     int exceptions = 3;
     tis_watch_value(p, s, forbidden_values, exceptions);

     /* The analysis stops when x may be an address. */
     tis_watch_address(p, s, 0);

     /* The analysis stops when x may be a garbled mix value. */
     tis_watch_garbled(p, s, 0);

     p = y;

     /* The analysis starts to detect whether an expression evaluates to an
        imprecise pointer starting at base address &y. */
     tis_detect_imprecise_pointer(p);

     /* The analysis stops because the expression below evaluates to the
        imprecise pointer &y + [0..3]. */
     *((char *)p + tis_interval(0, 3)) = 3;

     return 0;
 }
Remove Alarms by Adding Annotations

Tuning the precision using the various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is to add assertions to the program. Other kinds of properties can also be introduced, but assertions are by far the most frequently used for this purpose.

If you do not know how to add ACSL properties to your project, first read ACSL Properties.

Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When it is not possible, they should be verified manually.

Warning

Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.

Some examples are given below.

If source code modifications are needed, and the source code is in a git repository, the tis-modifications tool may be helpful to track them in order to check that they are correctly guarded. See tis-modifications Manual.

Assertion to Provide Missing Information

As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately, some relevant information concerning variables just cannot be represented in this form, and thus cannot be taken into account by the analyzer when it would make a difference. However, well-placed assertions make it possible to compensate for this limitation by explicitly providing the missing information.

Example:

     int T[10];
     ...
L1:  if (0 <= x && x < y) {
       ...
L2:    if (y == 10) {
L3:      ... T[x] ...
         }
       ...
       }

When the T[x] expression is encountered at the label L3, the analyzer tries to check that the T array is accessed correctly (i.e. inside the array bounds), namely that the (0 <= x < 10) condition holds. It already knows that the (0 <= x) part holds, due to the conditional statement at label L1 (assuming that x has not been modified since then). Whatever values x might have had before, at L1 they have been restrained to non-negative ones, and this fact has been stored in the internal state (the interval of values for x was modified) and thus is visible at L3. For example, if before L1 the value of x was [--..--] (i.e. nothing is known about x except that it is initialized), then after L1 it would be [0..--] (i.e. the interval spanning from zero to positive infinity).

Now, the analyzer still needs to verify that (x < 10) also holds. This is obvious for anybody who reads the code: the condition at L1 assures that (x < y) and the condition at L2 assures that (y == 10), therefore (x < 10) must be true. Unfortunately, because of the limitations of the adopted value representation, the analyzer cannot deduce this by itself. The fact that (x < y) holds just cannot be expressed in the internal state (nor, actually, can any abstract relation between variables). And, supposing that the value of y at L1 is [--..--], the (x < y) condition does not help to restrain the values of x, so its upper bound remains as before. Hence, this important piece of information is lost and the analyzer simply cannot connect it with (y == 10) at L2 in order to correctly restrain the value of x to [0..9]. So at L3 it will consider that the value of x is [0..--] and it will emit an alarm about a potential out-of-bounds access to the array T.

To help the analyzer, the appropriate assertion can be added explicitly:

at L3: assert ax: x < 10;

Then the alarm will disappear. Of course the ax assertion still needs to be verified by other means. For example, this particular assertion can be easily proven using WP (see Prove Annotations with WP).

State Splitting

State splitting is another assertion-based technique that makes it possible to guide the value analysis. It can be used to obtain the same results in the above example by splitting the internal state at L1 into two states, by introducing the following assertion:

at L1: assert ay: y <= 10 || 10 < y;

The analyzer can store multiple memory states at each program point (as explained in About the Value Analysis), and the maximal number of states that can be stored per program point in the internal representation is related to the precision level (i.e. slevel). So, provided that the -slevel option has set the precision level high enough to permit splitting the state here, the assertion above will lead to a case analysis:

  • The y <= 10 case: As y <= 10 is assumed, together with the x < y condition at L1, it leads to deducing x < 10, and this time this can be represented in the internal state.
  • The 10 < y case: If 10 < y is assumed, then the condition at L2 is false, and therefore the execution branch where the array T is accessed at L3 is not reached at all.

Thanks to this assertion, the alarm disappears. Moreover, the analyzer is able to check on its own that ay is always true, so there is nothing more to verify.

It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.

Tip

Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).

Store Relational Information

In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program in order to make it easier to analyze. This method should usually be avoided if possible, since it is intrusive to the application’s code. It should thus be used only when other solutions are not good enough, or when you do not mind modifying the code.

For example, if the program contains a test (0 <= i+j < 10) and then several uses of T[i+j] follow, it may be convenient to add a temporary variable representing the i+j sum:

int tmp = i + j;
if (tmp < 10) {
  ...
  //@ assert a_tmp: tmp == i + j;
  ... T[tmp] ...
}

If neither i nor j are modified in the meantime, the assertion that validates the code substitution should be trivial to verify, and the value analysis is now able to know that (tmp < 10).

Loop Invariant

Besides assert, requires and ensures, the loop invariant properties are also useful to enhance the analysis precision.

For instance, this function generates an alarm:

int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;

This is because the value of i is too imprecise when leaving the loop, so the analysis doesn’t know if the access to T[i] in the last assignment is valid or not.

Adding this loop invariant removes the alarm:

/*@ loop invariant i <= 50; */

Moreover, the value analysis is able to check that this property is always valid.
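For reference, here is the annotated function (loop annotations are placed just before the loop they apply to):

int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    /*@ loop invariant i <= 50; */
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i; /* i is now at most 50, so the access is valid */
}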

Bounded Buffers

A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.

buffer problem

Buffer problem: there is a relation between cur and len, but it cannot be represented.

A typical function to handle this buffer is:

void process (char * cur, size_t len) {
  char * p = cur;
  for (size_t i = 0 ; i < len ; i++, p++) {
    *p = ...
  }
}

The validity of the pointer p has to be checked to avoid an alarm on p access, but also to get precise results later on. It is especially important when the pointer points to an array that is part of a larger structure. For instance:

struct data {
  char buffer[BUFFER_LEN];
  unsigned current_index;
  unsigned state;
};

//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
  char * current = data->buffer + data->current_index;
  size_t length = BUFFER_LEN - data->current_index;
  if (n > length) n = length;
  process (current, n);
  ...
  }

If the analysis is not able to know that p does not go beyond the end of the buffer field, the value of the other fields current_index and state might be modified as well and might be too imprecise for the analysis to give interesting results later on.

So the process function needs a precondition to give the constraint on cur and len to ensure the validity of the pointer. This precondition could simply be:

//@ requires \valid (cur + 0 .. len-1);

Unfortunately, the value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable form when one of the two pieces of data is precise enough to reduce the other:

  • if the data to reduce is the pointer:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
  • if the data to reduce is the length:
//@ requires len <= (\block_length (cur) - \offset (cur)) / sizeof (*cur);

Notice that the ACSL functions \base_addr, \block_length and \offset only provide the expected information when cur is a pointer to an array allocated on its own. If the array is a field in a structure, \base_addr(cur) returns the base address of the structure.

base of a pointer in a structure

Structure with a .buf field: ACSL functions are related to the allocated block, not the internal array.
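As a small illustration of this caveat, using the struct data type from above (where buffer is the first field):

struct data d;
char *cur = d.buffer + 3;
/* For this cur:
   \base_addr(cur)    is (char *)&d          -- the whole allocated block
   \block_length(cur) is sizeof(struct data) -- not BUFFER_LEN
   \offset(cur)       is 3                   -- counted from &d, since buffer
                                                is the first field */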

Anyway, in some cases, even if the analyzer computes the optimal information, cur and len both have unknown values within intervals, and the relation between the two variables has been lost. The memory access to (*p) then raises an alarm, since it cannot be checked that the sum of the upper bounds of the two intervals is smaller than (buf + BUFFER_LEN). Moreover, if buf is in a structure as explained above, buf and BUFFER_LEN may be unknown in the function.

A trick can be to modify the original function by adding a parameter that gives a pointer into the object beyond which the function is not expected to access:

/*@ requires f_r_buf: val: cur <= bound;
    requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
  char * p = cur;
  //@ loop invariant f_l_p: val: p <= bound;
  for (size_t i = 0 ; i < len ; i++, p++) {
    if (p >= bound) break;
    *p = ...
  }
}

In the previous example, the call to process would have to be changed to:

process_bounded (current, n, data->buffer + BUFFER_LEN);

As long as the preconditions are true, this modified function is equivalent to the original one. The first precondition is often checked by the value analysis; when it is not, the value analysis reduces the range of cur. The value analysis can use the second precondition to reduce the length.

Two annotated functions with such bounds, tis_memset_bounded and tis_memcpy_bounded, are provided to be used instead of memset and memcpy when this problem occurs with these libc functions.

ACSL Properties

This chapter explains how to introduce ACSL properties to a project.

ACSL is the specification language employed in TrustInSoft Analyzer. ACSL properties can be used to specify functions (as seen in Write a Specification) and to guide the analysis by adding local annotations, such as assertions or loop invariants, which may help in removing false alarms (as seen in Remove Alarms by Adding Annotations).

There are two ways to insert ACSL properties in a project:

  • through inlining: putting ACSL properties directly in the analyzed application’s source code,
  • through importing: writing ACSL properties into external files and then loading these files from the command line when running the analysis.
Inline ACSL Properties

One way to add ACSL annotations to a project is to write them directly in the source code in special comments:

  • either standard C style comments: /*@ ... */,
  • or one line C++ style comments: //@ ....

There are several kinds of properties and they all need to be placed in an appropriate place in the source code:

  • Function specifications (described in Write a Specification) must be inserted either before the function itself or before the function’s declaration.
  • Local properties (described in Remove Alarms by Adding Annotations):
    • assertions must be inserted at the program point where they apply,
    • loop properties must be inserted just before the loop they apply to.

For more information about the ACSL language, please refer to the ACSL Documentation.

Import ACSL Properties

For many reasons it is usually preferable to avoid modifying the source code which is analyzed, as introducing changes to the application’s code can lead to difficulties in comparing it with the original version. For example, adding new properties alters the line numbering in a file, which makes it impossible to report problems with the original source line number.

The ACSLimporter plug-in makes it possible to write the ACSL properties into separate files and then import them for the analysis. The syntax of such files looks like this:

function <function-name>:
  contract:
    requires <pre-name-1>: <pre-definition-1>;
    assigns <assigns-definition-1>;
    ensures <post-name-1>: <post-definition-1>;

  at L1: assert <assert-name-1a>: <assert-definition-1a>;
  at L1: assert <assert-name-1b>: <assert-definition-1b>;
  at L2: assert <assert-name-2>: <assert-definition-2>;

  at loop 1:
    loop invariant <inv-name-1a>: <inv-definition-1a>;
    loop invariant <inv-name-1b>: <inv-definition-1b>;

Of course, the <...> parts should be substituted by specific names and definitions.

Depending on the organization of the project, it might be better to put all the properties in a single ACSL file or to split them throughout several files. If the properties concerning the same function appear in different files, the specifications are merged.

To load the property files, so that they are taken into account during the analysis, the -acsl-import <file.acsl> option has to be specified for each concerned ACSL file.
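For instance, with two property files (the file names are illustrative):

$ tis-analyzer -val main.c -acsl-import f1.acsl -acsl-import f2.acsl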

Naming the Annotations

Giving a unique name to each annotation permits referring to it easily later on. Moreover, it makes the result files a lot clearer and more readable: when mentioning a particular annotation they will use its name instead of the corresponding file name and line number.

Using standard naming conventions is highly recommended. Some tools require particular naming of assertions to properly check that everything has been verified at the end of the analysis.

The proposed naming conventions are:

  • To choose a unique short prefix for annotations concerning each function (example: add_).
  • To include a letter that indicates the kind of each property (e.g. the first requires property of function add could then be named add_r1). (However, this is not really necessary if the name is always used together with the corresponding keyword, like for example: requires add_r1, ensures add_e2, etc.)
  • To use a suffix informing about the way each property is verified. For instance:
    • add_e2_val: if the property is found always valid by Value;
    • add_e2_wp: if the property is proved by WP;
    • add_e2_sc: if the property could have been removed as redundant by Scope (note: it could be necessary to keep this property anyway because it still might be useful for Value or WP computations);
    • add_e2_rv: if the property has been manually reviewed.

These naming conventions might seem to be quite cumbersome to use (especially the verification method suffix). However, as mentioned before, they make the automatic generation/verification possible, so they are highly recommended.

Prove Annotations with WP

WP refers both to a method to formally verify properties of the analyzed code and the name of the analyzer’s plug-in that implements this method. WP is a static analysis technique, like the value analysis, but involving theorem proving. For a short introduction describing how it works see Short Introduction to WP Computation.

The purpose of this chapter is mainly to explain in which cases WP can be used with a minimal amount of manual work required. This does not mean that it cannot be used in more complex situations, but then it requires more knowledge about the WP computation and/or competences in performing manual proofs using proof assistants such as Coq.

How to Run WP

The easiest way to run WP is to do it in the GUI by selecting the annotation to prove, right-clicking to open the pop-up menu, and choosing the Prove Property by WP option.

However, when the analysis evolves, it is usually more practical to run WP from the command line, save the project, and extract the status of the properties from the information file to just check that the results are still the same.

The command line to run WP and save the project looks like:

tis-analyzer -load project.state -wp-fct f1 -wp-prop f1_p1_wp,f1_p2_wp \
             -then -wp-fct g,h -wp-prop f_pre_wp \
             ...
             -then -save project.wp.state

This command line:

  • opens a previously saved project project.state: it doesn’t need to include value analysis results, and doesn’t even have to include an entry point. All it needs is the body of the functions containing the properties to verify, and probably some specifications for the functions called from these functions.

  • tries to verify the properties named f1_p1_wp and f1_p2_wp in the f1 function,

  • then tries to verify the property f_pre_wp in g and h.

Notice that f_pre_wp is supposed to be a precondition of f, and that it is checked in g and h, which are supposed to be some of f’s callers;

  • saves the results in the project.wp.state project.

Notice that a _wp suffix is used in the names of the properties that are checked with WP. See Naming the Annotations to understand why naming conventions are useful.

This is an example of how to use WP, but the plug-in provides many other options if needed. Please use the -wp-help option to list them, and refer to the documentation for more details.

To know how to extract the status of the properties from the project.wp.state project, see Information about the Properties.

Short Introduction to WP Computation

Let us give a very simple explanation of WP for readers who know nothing about it, since it may be necessary to understand how it works in order to use it when suitable.

To verify that a property is true at a program point, the WP principle is to propagate it backward and to compute a formula such that, if the formula can be proved to be true, then the initial property is true as well. The computed formula is then sent to one or more automatic provers. For instance, tis-analyzer comes with alt-ergo, but more provers can be added.

An example is easier to understand:

const int X = 10;

void f1(int x)
{
    int y = x + 1;
    int z = 2 * y;
L:  //@ assert y_val: y > X;
    ...
}

To ensure that y_val is true at L, WP computes that one has to prove (x+1 > X) when entering the function. Notice that the z assignment has no effect, since WP knows that it doesn’t modify the value of y. This can be automatically proved if a precondition gives:

//@ requires r_x_ge_X: x >= X;

This is because the final computed formula is:

x >= X ==> x+1 > X;

which is easily proved by any automatic prover.

It doesn’t work with the precondition:

//@ requires r_x_ge_15: x >= 15;

This is because WP only works on the function source code, which means that it has no information about the value of X. To solve this kind of problem, one can add:

//@ requires r_X_val: X == 10;

This precondition is easily validated by the value analysis and can be used by WP to finish the proof with r_x_ge_15.
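Putting the pieces together, a sketch of the fully annotated function looks like:

const int X = 10;

/*@ requires r_X_val: X == 10;
    requires r_x_ge_15: x >= 15;
*/
void f1(int x)
{
    int y = x + 1;
    int z = 2 * y;
L:  //@ assert y_val: y > X;
    ...
}

Here r_X_val is validated by the value analysis, r_x_ge_15 has to be checked at each call site, and WP can discharge y_val since x >= 15 together with X == 10 implies x + 1 > X.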

In this simple case, the initial property and the computed formula are equivalent, but this is not always the case. WP just ensures that if the computed formula is true, then the property is true each time its program point is reached.

WP on Loops

To prove a loop invariant property, the WP computation is very similar, but decomposed into two goals:

  • establishment: the property must be true when reaching the loop. This is similar to an assert before the loop.
  • preservation: if the property is true at the beginning of an iteration, it is still true at the end.

Example:

int T[100];

//@ requires n < 100;
int main(int n)
{
    int i; int * p;
    for (i = 0, p = T; i < n; i++, p++) {
        *p = 3;
    }
    ...
}

The following property removes the alarm about the validity of the (*p) assignment in the loop:

//@ loop invariant li_1: p == T+i;

Moreover it can be proved by WP:

  • the establishment has to be proved before entering the loop, but after the initialization part. So the proof obligation is:

    T == T + 0
    
  • the preservation formula is similar to:

    p == T + i  ==>  p + 1 == T + (i + 1)
    

Both formulas are trivially true.

WP for Indirect Memory Access

In the first example in Short Introduction to WP Computation, the z assignment has no effect on the WP computation since WP knows that it doesn’t modify y value. But it is different when pointers are involved. For instance:

void f(int * px, int x, int * py, int y)
{
    *px = x;
    *py = y;
    //@ assert a_px: *px == x;
    //@ assert a_py: *py == y;
    ...
}

WP is able to prove a_py, but not a_px. The reason is that it doesn’t know whether the assignment to (*py) modifies the value of (*px) or not. a_px can be proved only with the precondition:

//@ requires \separated (px, py);

It states that there is no intersection between the (*px) and (*py) locations in memory.

In the context of adding annotations to remove alarms, except in very simple cases, it is not recommended to use WP when possibly overlapping pointers are involved since it may take some time to provide enough information.

WP for a Function Call

The other problem is when there are function calls between the property and the statements that make it true. Remember that WP only works on the source code of the function containing the property, and on the specifications of the called functions.

extern int X;
void g (void);
void f(int x, int y)
{
    if (x > y && x > X) {
        g ();
        //@ assert ax1: x > y;
        //@ assert ax2: x > X;
        ...
    }
    ...
}

WP is able to prove ax1 since there is no way for g to modify either x or y, but ax2 cannot be proved since g may modify X.

There are two ways to solve the problem:

  • add an assigns property for g to specify the modified data.

    For instance, ax2 is proved when adding:

    //@ assigns \nothing;
    

    This is not the preferred method, since assigns clauses are difficult to prove: it requires knowing the modified data for each statement of g. The computed dependencies may help to justify the assigns property, but beware that this information is context dependent.

  • add a postcondition about the involved data. For instance:

    • specifying that X is not modified by g:

      //@ ensures X == \old (X);
      
    • or specifying that X decreases:

      //@ ensures X < \old (X);
      

Both solutions make it possible to prove ax2.

WP is Useful Even for Trivial Properties

WP could seem useless outside of complex cases, but this is not true: even when properties look trivial, it is useful to formally prove them, since it is so easy to make a mistake.

Let us look at an example:

//@ ensures e_res_ok: min <= \result <= max;
int bounds(int min, int x, int max)
{
    int res = x;
    if (x < min) res = min;
    if (x > max) res = max;
    return res;
}

The postcondition seems reasonably easy to justify, but WP is unable to prove it. WP computes a proof obligation equivalent to:

if (x > max)      then min <= max /\ max <= max
else if (x < min) then min <= min /\ min <= max
     else              min <= x   /\   x <= max

After simplifying the formula, it appears that the information (min <= max) is missing, so this postcondition cannot be proved without a precondition. One then has to be added, and checked in every context where the function is called, to ensure that the postcondition is verified.
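For instance, the missing hypothesis can be supplied as a precondition (the name r_min_le_max is illustrative):

/*@ requires r_min_le_max: min <= max;
    ensures e_res_ok: min <= \result <= max; */
int bounds(int min, int x, int max)
{
    int res = x;
    if (x < min) res = min;
    if (x > max) res = max;
    return res;
}

With this precondition, the missing (min <= max) hypothesis is available and WP can discharge the postcondition.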

WP Conclusion

The advice here is to use WP only in simple cases, because complex cases need expertise and require a lot of time. But as we have seen, even for properties that look trivial, it is better to formally prove them, since it is so easy to make a mistake. Moreover, manual justification of trivial properties may look a little silly.

One must be especially careful when it seems that WP should be able to prove something, and doesn’t, since it may hide a problem somewhere. It is always better to understand if it is really a WP weakness, or something else.

Conclusion

Now that you know how to analyze an application, it is important to insist on putting things together and checking all the hypotheses.

If there is only one analysis, it is quite easy to check. The results rely on:

  • the context defined by the entry point,
  • the assigns and ensures properties of the external functions because they cannot be checked,
  • all the other annotations that are neither valid according to the value analysis nor proved by WP.

If there are several analyses, the results of each one rely on the same hypotheses as above, but there are more things to check:
  • the WP proofs don’t depend on the context, so they can be done only once, IF the functions are the same across the analyses (not modified through a macro, for instance),
  • to be valid according to the value analysis, a property must have this status in ALL the analyses,
  • when splitting, remember that the context of the function analysis must represent all its precondition (see figure in Pre-conditions of a Defined Function).

To be fully verified in the given context, every hypothesis above that lacks a formal verification must have a clear justification.

Dealing with Special Features

This section gives some details about how to deal with special features needed to analyze some applications:

File System

The tis-mkfs utility helps to build C files that give information about the file system in which the application is supposed to run. For more information, please refer to the tis-mkfs Manual.

Environment variables

The default initial environment for the analysis of a program is empty. In order to perform an analysis in a specific environment, it has to be populated from the user code using one of the two methods below.

Setting environment variables

The user may set some variables by calling setenv (or putenv) in the analysis entry point.

Example:

#include <stdlib.h>
extern int main (int argc, char * argv[]);

int tis_main (int argc, char * argv[]) {
  int r;
  r = setenv ("USER", "me", 1);
  if (r != 0) return 1;
  r = setenv ("HOME", "/home/me", 1);
  if (r != 0) return 1;
  r = setenv ("SHELL", "/bin/sh", 1);
  if (r != 0) return 1;
  return main (argc, argv);
}
Initializing the whole environment

Alternatively, the user may initialize the environ standard variable. This variable:

  • is a pointer to an array of pointers to valid C strings,
  • each of these strings must be a valid C string of the form "variable=value",
  • the element following the last environment variable must be NULL (elements past this NULL value, if any, will not get accessed).

Example:

#include <stdlib.h>
extern int main (int argc, char * argv[]);
extern char **environ;

int tis_main (int argc, char * argv[]) {
  char *custom_environ[] = {
    "USER=me",
    "HOME=/home/me",
    "SHELL=/bin/sh",
    NULL
  };
  environ = custom_environ;
  return main (argc, argv);
}

Using the environment

Using one of the two methods above to initialize the environment, the following program can be analyzed:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char * argv[]) {
  char * user = getenv ("USER");
  if (user)
    printf ("USER value is '%s'\n", user);
  return 0;
}

The command line would be:

$ tis-analyzer tis_main.c main.c -val -main tis_main -slevel 10

And the following output can be observed:

...
USER value is 'me'
...
Advanced usage

The initial size of the array pointed to by the environ variable is controlled by the TIS_NB_INIT_ENV_ELEMENTS macro. Its default value is 100, but it may be changed by the user (using the -D option as usual) to avoid reallocations if the application needs more than 100 variables.
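For instance, to allow up to 200 environment variables (a sketch using the usual -D option):

$ tis-analyzer -D TIS_NB_INIT_ENV_ELEMENTS=200 -val tis_main.c main.c -main tis_main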

Moreover, to avoid losing precision, a local slevel is used inside the implementation of the environment related functions. It is controlled by the TIS_ENV_INTERNAL_SLEVEL macro. Its default value is already very large, but it can be increased by the user if it is still too low for a specific usage.

Recursive calls

Without any option, the analysis stops on recursive function calls and lets the user decide how to handle them by choosing either -val-clone-on-recursive-calls or -val-ignore-recursive-calls.

The -val-clone-on-recursive-calls option tells the analyzer to process calls to recursive functions exactly as if they were calls to normal functions. The function body is copied and the copy is renamed with the prefix __tis_rec_<n>, where <n> is the depth of the recursion. This means that the recursive call is analyzed precisely. This works up to the limit defined with the option -val-clone-on-recursive-calls-max-depth. When the limit is reached (or when -val-clone-on-recursive-calls is not set), the contract of the recursive function is used. Usually an assigns clause is enough to get the expected semantics. If no contract is provided, the analyzer generates a simple one, but it may be an incorrect contract: it is therefore strongly recommended to provide a contract to analyze such cases.
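As a minimal sketch, here is a recursive function with a contract the analyzer can fall back on once the maximal depth is reached (the names and the bound are illustrative):

/*@ requires 0 <= n <= 12;
    assigns \nothing;
    ensures \result >= 1;
*/
int fact(int n)
{
    return n <= 1 ? 1 : n * fact(n - 1);
}

It could be analyzed, for instance, with:

$ tis-analyzer -val -val-clone-on-recursive-calls -val-clone-on-recursive-calls-max-depth 5 fact.c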

The -val-ignore-recursive-calls option tells the analyzer to use the contract of the function to handle the recursive calls. The contract may be generated as-if the max-depth option was reached as explained above.

Note that when the --interpreter option is used, the -val-clone-on-recursive-calls option is automatically set.

Memory leaks

TrustInSoft Analyzer provides two ways to detect memory leaks.

  • The first way is to use the built-in function tis_check_leak to print the list of the memory blocks that are allocated but not referenced by any other memory block anymore at the program point where the built-in is called.

    Example (leak.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  char * f(int v) {
     5    char *p = malloc(v);
     6    return p;
     7  }
     8
     9  int main() {
    10    f(42);
    11    tis_check_leak();
    12    return 0;
    13  }
    

    The program above can be analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations leak.c
    

    we get the following result:

    tests/val_examples/leak.c:5:[value] allocating variable __malloc_f_l5 of type char [42]
            stack: malloc :: tests/val_examples/leak.c:5 (included from tests/val_examples/leak_test.c) <-
                   f :: tests/val_examples/leak.c:10 (included from tests/val_examples/leak_test.c) <-
                   main
    tests/val_examples/leak.c:11:[value] warning: memory leak detected for {__malloc_f_l5}
    

    Indeed, 42 bytes are allocated at line 5 in f, but since the pointer returned by f is lost in main, a memory leak is detected at line 11.

    In addition, the analyzer also prints the list of possibly leaked memory blocks (memory blocks that might not be referenced by any other memory block anymore), as shown in the following example.

    Example (leak_weak.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  char * f() {
     5      char *p = malloc(1);
     6      char *q = malloc(1);
     7      return tis_interval(0, 1) ? p : q;
     8  }
     9
    10  int main() {
    11      char *r = f();
    12      tis_check_leak();
    13      return 0;
    14  }
    

    When the program is analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations leak_weak.c
    

    we get the following result:

    $ tis-analyzer -val -val-show-allocations leak_weak.c
     [...]
     leak_weak.c:5:[value] allocating variable __malloc_f_l5 of type char
              stack: malloc :: leak_weak.c:5 <- f :: leak_weak.c:11 <- main
     leak_weak.c:6:[value] allocating variable __malloc_f_l6 of type char
              stack: malloc :: leak_weak.c:6 <- f :: leak_weak.c:11 <- main
     [value] using specification for function tis_interval
     leak_weak.c:12:[value] warning: possible memory leak detected for {__malloc_f_l5, __malloc_f_l6}
    

    Indeed, when tis_check_leak is called at line 12, the value of variable r is { NULL ; &__malloc_f_l5 ; &__malloc_f_l6 }, which means the value of r can either be NULL, or the address of the memory block allocated at line 5, or the address of the memory block allocated at line 6. Thus, the memory blocks allocated at line 5 and line 6 might be leaked.

    In order to improve the precision, the program can be analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
    

in which case the analyzer propagates separately the states where r points to NULL, __malloc_f_l5, and __malloc_f_l6, and we get the following analysis result:

    $ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
      [...]
      leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l5}
      leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l6}
    

    It shows that in one path, __malloc_f_l5 is leaked, and in another path __malloc_f_l6 is leaked.

  • The second way of detecting memory leaks requires the user to be able to identify two points in the target program such that, when the execution reaches the second point, all the memory blocks that have been allocated since the execution was at the first point have been freed. The procedure is then to insert, at each of the two points, a call to a builtin that lists all the dynamically allocated blocks. If the lists printed by the two builtin calls match, it means that every block that was allocated after the first point was freed before the second point was reached.

    In “interpreter mode”, in which the analyzer follows a single execution path, the tis_show_allocated builtin can be used to print the lists of allocated blocks.

    Example (leak_interpreter.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  void f (void) {
     5      char * p1 = malloc (10);
     6      char * p2 = malloc (10);
     7      char * p3 = malloc (10);
     8      char * p4 = malloc (10);
     9      p1 = p4;
    10      p4 = p2;
    11      free (p2);
    12      free (p1);
    13  }
    14  int main (void) {
    15      char * p = malloc (10);
    16      tis_show_allocated ();
    17      /* all the memory blocks allocated in function f should be freed */
    18      f ();
    19      tis_show_allocated ();
    20      free(p);
    21  }
    

    When the program above is analyzed with the following command line:

    $ tis-analyzer --interpreter -val leak_interpreter.c
    

    we get the following result:

    $ tis-analyzer --interpreter -val leak_interpreter.c
      [...]
      leak_interpreter.c:16:[value] remaining allocated variables:
         __malloc_main_l15
      leak_interpreter.c:19:[value] remaining allocated variables:
         __malloc_main_l15, __malloc_f_l5, __malloc_f_l7
    

The second call to tis_show_allocated at line 19 shows that, after the call f () at line 18, two more memory blocks, allocated at line 5 and line 7 respectively, exist in the memory state compared to the first call to tis_show_allocated. Thus, we know that function f causes a memory leak.

In “analyzer mode”, the first point may be visited by the analyzer several times, for different memory states corresponding to different execution paths in the program. The memory states that reach the second point should be matched to the memory state at the first point that they correspond to. For this purpose, the tis_show_allocated_and_id and tis_id builtins can be used. The tis_id builtin gives a unique “id” to each memory state, and the tis_show_allocated_and_id builtin prints, in addition to the list of allocated blocks, the value of this “id”, so that states can be identified.

    Example (tis_show_allocated_and_id.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  int main(){
     5
     6      char *t[7];
     7      int n = tis_interval_split(0, 1);
     8
     9      // "before" point
    10      unsigned long long my_id = tis_id();
    11      tis_show_allocated_and_id("before", my_id);
    12
    13      t[n] = malloc(1);
    14      if (!t[n]) goto leave1;
    15      t[n+1] = malloc(1);
    16      if (!t[n+1]) goto leave2;
    17      t[n][0] = 'a';
    18      t[n+1][0] = 'b';
    19  leave2:
    20      free(t[n]);
    21  leave1:
    22
    23      // "after" point
    24      tis_show_allocated_and_id("after", my_id);
    25  }
    

    When the program above is analyzed with the following command line:

    $ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
    

    we get the following result:

    $ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
      [...]
      tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:13:[value] allocating variable __malloc_main_l13 of type char
       stack: malloc :: tis_show_allocated_and_id.c:13 <- main
      tis_show_allocated_and_id.c:15:[value] allocating variable __malloc_main_l15 of type char
       stack: malloc :: tis_show_allocated_and_id.c:15 <- main
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:__malloc_main_l15
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:__malloc_main_l15
    

    The statement at line 7 causes two separate states, in which variable n has value 0 and 1 respectively. Then, the statement at line 10 assigns variable my_id to 0 in one state and to 1 in the other. The call to tis_show_allocated_and_id("before", my_id) at line 11 shows that in both “before” states, where my_id is 0 and 1, there is no allocated memory block. The memory allocation statements at line 13 and line 15 give rise to three kinds of paths:

      • in one path, the allocation at line 13 fails and goto leave1 is taken at line 14, so when tis_show_allocated_and_id("after", my_id) at line 24 is reached, there is no allocated memory block in either “after” state;
      • in another path, the allocation succeeds at line 13 but fails at line 15, and goto leave2 is taken at line 16; the memory allocated at line 13 is then freed at line 20, so when line 24 is reached, there is no allocated memory block left in either “after” state;
      • in the last path, both allocations at line 13 and line 15 succeed, but only the memory block allocated at line 13 is freed at line 20, so when line 24 is reached, the memory block allocated at line 15 remains in both “after” states: a memory leak.

    This example also shows that there may be several “after” states for each “before” state: in this case each “after” state must be checked against its corresponding “before” state for memory leaks.

Target Architecture

TrustInSoft Analyzer faithfully emulates the hardware features of the targeted platform: endianness, size of integer types, and alignment constraints.

The option -machdep help lists the supported architectures, and -machdep verbose shows a brief summary of the main characteristics of each supported architecture.

$ tis-analyzer -machdep help
[kernel] supported machines are: aarch64 aarch64eb apple_ppc_32 arm_eabi armeb_eabi
         gcc_aarch64 gcc_aarch64eb gcc_arm_eabi gcc_armeb_eabi gcc_mips_64
         gcc_mips_n32 gcc_mips_o32 gcc_mipsel_64 gcc_mipsel_n32 gcc_mipsel_o32
         gcc_ppc_32 gcc_ppc_64 gcc_rv32ifdq gcc_rv64ifdq gcc_sparc_32 gcc_sparc_64
         gcc_x86_16 gcc_x86_16_huge gcc_x86_32 gcc_x86_64 mips_64 mips_n32 mips_o32
         mipsel_64 mipsel_n32 mipsel_o32 ppc_32 ppc_64 rv32ifdq rv64ifdq sparc_32
         sparc_64 x86_16 x86_16_huge x86_32 x86_64 x86_win32 x86_win64
         (default is x86_32).

The supported platforms are:

  • x86_16
    An x86 processor in 16-bit mode, in “tiny” or “small” memory model (using 16-bit “near” pointers). The long double type is the 80387 80-bit native format.
  • x86_16_huge
    An x86 processor in 16-bit mode, in “large” or “huge” memory model (using 32-bit “far” pointers). The long double type is the 80387 80-bit native format.
  • x86_32
    An x86 processor in 32-bit mode. The long double type is the 80387 80-bit native format.
  • x86_win32
    An x86 processor in 32-bit mode, using the Win32 ABI.
  • x86_64
    An x86-64 processor in 64-bit mode. The long double type is the IEEE754 quad-precision type.
  • x86_win64
    An x86 processor in 64-bit mode, using the Win64 ABI. The long type is 32 bits.
  • ppc_32
    A 32-bit PowerPC processor. The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • ppc_64
    A 64-bit POWER processor. The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • apple_ppc_32
    Same as ppc_32, but forcing the “char” type to be signed, as done by the Apple toolchain (as opposed to the default PowerPC use of unsigned char type), and allowing gcc language extensions.
  • arm_eabi, armeb_eabi
    A 32-bit ARM processor, using the extended ABI (EABI). The char type is unsigned.
  • aarch64, aarch64eb
    A 64-bit ARMv8 processor (AArch64). The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • sparc_32
    A 32-bit SparcV8 processor.
  • sparc_64
    A 64-bit SparcV9 processor. The long double type is the IEEE754 quad-precision type.
  • mips_o32, mipsel_o32
    A 32-bit MIPS processor, using the old ABI (o32).
  • mips_n32, mipsel_n32
    A 64-bit MIPS processor, using the new 32-bit ABI (n32). The long double type is the IEEE754 quad-precision type.
  • mips_64, mipsel_64
    A 64-bit MIPS processor, using the 64-bit ABI (64 or n64). The long double type is the IEEE754 quad-precision type.
  • rv32ifdq
    A RISC-V processor using the RV32I ISA, with D, F and Q floating-point extensions. The long double type is the IEEE754 quad-precision type.
  • rv64ifdq
    A RISC-V processor using the RV64I ISA, with D, F and Q floating-point extensions. The long double type is the IEEE754 quad-precision type.

Unless otherwise specified in the list above, the characteristics of the fundamental types are:

  • char
    is 8 bits, and defaults to signed.
  • short
    is 16 bits.
  • int
    is 16 bits on 16-bit machdeps (x86_16, x86_16_huge), 32 bits otherwise.
  • long
    is 64 bits on 64-bit machdeps, 32 bits otherwise.
  • long long
    is 64 bits.
  • float
    is 32 bits (IEEE754 single-precision type).
  • double
    is 64 bits (IEEE754 double-precision type).
  • long double
    is identical to double.

With the exception of apple_ppc_32, x86_win32 and x86_win64, all these machdeps may be specified with the gcc_ prefix, in which case gcc language extensions are allowed, and the __int128 integer type is available on 64-bit machdeps.
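For instance, to target a 64-bit x86 platform with gcc extensions enabled:

$ tis-analyzer -machdep gcc_x86_64 -val test.c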

The endianness of the supported architectures is specified as:

  • little endian
    • x86_16
    • x86_16_huge
    • x86_32
    • x86_64
    • x86_win32
    • x86_win64
    • arm_eabi
    • aarch64
    • mipsel_o32
    • mipsel_n32
    • mipsel_64
    • rv32ifdq
    • rv64ifdq
  • big endian
    • apple_ppc_32
    • ppc_32
    • ppc_64
    • armeb_eabi
    • aarch64eb
    • sparc_32
    • sparc_64
    • mips_o32
    • mips_n32
    • mips_64

To switch to another architecture quickly, one of the following options may be used:

  • -16 for gcc_x86_16
  • -32 for gcc_x86_32
  • -64 for gcc_x86_64
$ tis-analyzer -64 ...
New Architecture on the Fly

When the targeted architecture is not supported out-of-the-box, a new architecture corresponding to a specific target may be defined and dynamically loaded by TrustInSoft Analyzer.

A new plug-in is defined with the values of the types for the targeted architecture. Create a new file custom_machdep.ml with the following content and edit the necessary values:

open Cil_types

let mach =
  { version = "foo";
    compiler = "bar";
    sizeof_short = 2; (* __SIZEOF_SHORT *)
    sizeof_int = 4; (* __SIZEOF_INT *)
    sizeof_long = 4; (* __SIZEOF_LONG *)
    sizeof_longlong = 8; (* __SIZEOF_LONGLONG *)
    sizeof_int128 = 0;
    sizeof_ptr = 4; (* related to __INTPTR_T *)
    sizeof_float = 4;
    sizeof_double = 8;
    sizeof_longdouble = 12;
    sizeof_void = 1;
    sizeof_fun = 1;
    size_t = "unsigned long"; (* __SIZE_T *)
    char16_t = "unsigned short";
    char32_t = "unsigned int";
    wchar_t = "int"; (* __WCHAR_T *)
    ptrdiff_t = "int"; (* __PTRDIFF_T *)
    max_align_t = "long double"; (* __MAX_ALIGN_T *)
    alignof_short = 2;
    alignof_int = 4;
    alignof_long = 4;
    alignof_longlong = 4;
    alignof_int128 = 0;
    alignof_ptr = 4;
    alignof_float = 4;
    alignof_double = 4;
    alignof_longdouble = 4;
    alignof_str = 1;
    alignof_fun = 1;
    alignof_aligned= 16;
    pack_max = 16;
    char_is_unsigned = false;
    char_bit = 8; (* __CHAR_BIT *)
    const_string_literals = true;
    little_endian = false; (* __TIS_BYTE_ORDER *)
    has__builtin_va_list = true;
    __thread_is_keyword = true;
    has_int128 = false; }

let () =
  Stage.run_after_loading_stage
    (fun () ->
       Core.result "Registering machdep 'mach' as 'custom'";
       ignore (Machdeps.register_machdep
               ~short_name:"custom" ~cpp_target_options:[] mach))

Define a new header containing the values of the types. Create a new file __fc_custom_machdep.h with the following content:

/* skeleton of a real custom machdep header. */
#ifndef __TIS_MACHDEP
#define __TIS_MACHDEP

#ifdef __TIS_MACHDEP_CUSTOM

// __CHAR_UNSIGNED must match mach.char_is_unsigned
#undef  __CHAR_UNSIGNED
#define __WORDSIZE 32

// __CHAR_BIT must match mach.char_bit
#define __CHAR_BIT 8
// __SIZEOF_SHORT must match mach.sizeof_short
#define __SIZEOF_SHORT 2
// __SIZEOF_INT must match mach.sizeof_int
#define __SIZEOF_INT 4
// __SIZEOF_LONG must match mach.sizeof_long
#define __SIZEOF_LONG 4
// __SIZEOF_LONGLONG must match mach.sizeof_longlong
#define __SIZEOF_LONGLONG 8

// __TIS_BYTE_ORDER must match mach.little_endian
#define __TIS_BYTE_ORDER __BIG_ENDIAN

#define __TIS_SCHAR_MIN (-128)
#define __TIS_SCHAR_MAX 127
#define __TIS_UCHAR_MAX 255
#define __TIS_SHRT_MIN (-32768)
#define __TIS_SHRT_MAX 32767
#define __TIS_USHRT_MAX 65535
#define __TIS_INT_MIN (-__TIS_INT_MAX - 1)
#define __TIS_INT_MAX 8388607
#define __TIS_UINT_MAX 16777215
#define __TIS_LONG_MIN (-__TIS_LONG_MAX -1L)
#define __TIS_LONG_MAX 2147483647L
#define __TIS_ULONG_MAX 4294967295UL
#define __TIS_LLONG_MIN (-__TIS_LLONG_MAX -1LL)
#define __TIS_LLONG_MAX 9223372036854775807LL
#define __TIS_ULLONG_MAX 18446744073709551615ULL

#define __INT8_T signed char
#define __TIS_INT8_MIN __TIS_SCHAR_MIN
#define __TIS_INT8_MAX __TIS_SCHAR_MAX
#define __UINT8_T unsigned char
#define __TIS_UINT8_MAX __TIS_UCHAR_MAX
#define __INT_LEAST8_T __INT8_T
#define __TIS_INTLEAST8_MIN __TIS_INT8_MIN
#define __TIS_INTLEAST8_MAX __TIS_INT8_MAX
#define __UINT_LEAST8_T __UINT8_T
#define __TIS_UINTLEAST8_MAX __TIS_UINT8_MAX
#define __INT_FAST8_T __INT8_T
#define __TIS_INTFAST8_MIN __TIS_INT8_MIN
#define __TIS_INTFAST8_MAX __TIS_INT8_MAX
#define __UINT_FAST8_T __UINT8_T
#define __TIS_UINTFAST8_MAX __TIS_UINT8_MAX

#define __INT16_T signed short
#define __TIS_INT16_MIN __TIS_SHRT_MIN
#define __TIS_INT16_MAX __TIS_SHRT_MAX
#define __UINT16_T unsigned short
#define __TIS_UINT16_MAX __TIS_USHRT_MAX
#define __INT_LEAST16_T __INT16_T
#define __TIS_INTLEAST16_MIN __TIS_INT16_MIN
#define __TIS_INTLEAST16_MAX __TIS_INT16_MAX
#define __UINT_LEAST16_T __UINT16_T
#define __TIS_UINTLEAST16_MAX __TIS_UINT16_MAX
#define __INT_FAST16_T __INT16_T
#define __TIS_INTFAST16_MIN __TIS_INT16_MIN
#define __TIS_INTFAST16_MAX __TIS_INT16_MAX
#define __UINT_FAST16_T __UINT16_T
#define __TIS_UINTFAST16_MAX __TIS_UINT16_MAX

#define __INT32_T signed int
#define __TIS_INT32_MIN __TIS_INT_MIN
#define __TIS_INT32_MAX __TIS_INT_MAX
#define __UINT32_T unsigned int
#define __TIS_UINT32_MAX __TIS_UINT_MAX
#define __INT_LEAST32_T __INT32_T
#define __TIS_INTLEAST32_MIN __TIS_INT32_MIN
#define __TIS_INTLEAST32_MAX __TIS_INT32_MAX
#define __UINT_LEAST32_T __UINT32_T
#define __TIS_UINTLEAST32_MAX __TIS_UINT32_MAX
#define __INT_FAST32_T __INT32_T
#define __TIS_INTFAST32_MIN __TIS_INT32_MIN
#define __TIS_INTFAST32_MAX __TIS_INT32_MAX
#define __UINT_FAST32_T __UINT32_T
#define __TIS_UINTFAST32_MAX __TIS_UINT32_MAX

#define __INT64_T signed long long
#define __TIS_INT64_MIN __TIS_LLONG_MIN
#define __TIS_INT64_MAX __TIS_LLONG_MAX
#define __UINT64_T unsigned long long
#define __TIS_UINT64_MAX __TIS_ULLONG_MAX
#define __INT_LEAST64_T __INT64_T
#define __TIS_INTLEAST64_MIN __TIS_INT64_MIN
#define __TIS_INTLEAST64_MAX __TIS_INT64_MAX
#define __UINT_LEAST64_T __UINT64_T
#define __TIS_UINTLEAST64_MAX __TIS_UINT64_MAX
#define __INT_FAST64_T __INT64_T
#define __TIS_INTFAST64_MIN __TIS_INT64_MIN
#define __TIS_INTFAST64_MAX __TIS_INT64_MAX
#define __UINT_FAST64_T __UINT64_T
#define __TIS_UINTFAST64_MAX __TIS_UINT64_MAX

#define __INT_MAX_T __INT64_T
#define __TIS_INTMAX_MIN __TIS_INT64_MIN
#define __TIS_INTMAX_MAX __TIS_INT64_MAX
#define __UINT_MAX_T __UINT64_T
#define __TIS_UINTMAX_MAX __TIS_UINT64_MAX

// __INTPTR_T  must match mach.sizeof_ptr
#define __INTPTR_T __INT32_T
#define __TIS_INTPTR_MIN __TIS_INT32_MIN
#define __TIS_INTPTR_MAX __TIS_INT32_MAX
#define __UINTPTR_T __UINT32_T
#define __TIS_UINTPTR_MAX __TIS_UINT32_MAX

// __PTRDIFF_T must match mach.ptrdiff_t
#define __PTRDIFF_T int
// __MAX_ALIGN_T  must match mach.max_align_t
#define __MAX_ALIGN_T long double
// __SIZE_T must match mach.size_t
#define __SIZE_T unsigned int
#define __SSIZE_T int
#define __TIS_PTRDIFF_MIN __TIS_INT_MIN
#define __TIS_PTRDIFF_MAX __TIS_INT_MAX
#define __TIS_SIZE_MAX __TIS_UINT_MAX
#define __TIS_SSIZE_MAX __TIS_INT_MAX

// __WCHAR_T  must match mach.wchar_t
#define __WCHAR_T int
#define __TIS_WCHAR_MIN __TIS_INT_MIN
#define __TIS_WCHAR_MAX __TIS_INT_MAX
#define __WINT_T long long int
#define __TIS_WINT_MIN __TIS_LLONG_MIN
#define __TIS_WINT_MAX __TIS_LLONG_MAX
#define __WCTRANS_T long long int
#define __WCTYPE_T long long int

#define __SIG_ATOMIC_T volatile int
#define __TIS_SIG_ATOMIC_MIN __TIS_INT_MIN
#define __TIS_SIG_ATOMIC_MAX __TIS_INT_MAX

// Common machine specific values (PATH_MAX, errno values, etc) for Linux
// platforms, usually applicable anywhere else.
#include "__fc_machdep_linux_gcc_shared.h"

#else
#error "I'm supposed to be included with __TIS_MACHDEP_CUSTOM macro defined"
#endif
#endif

NB: The content above does not correspond to a real architecture; it is given as an example of the possibilities.

Warning

It is important for the data defined in the two files above to have compatible values.

The new architecture may now be tested:

#include "limits.h"

int main(void)
{
    return INT_MAX;
}

To analyze it with TrustInSoft Analyzer, load the plug-in that defines the custom machdep and then add the option -D __TIS_MACHDEP_CUSTOM.

$ tis-analyzer -load-script custom_machdep.ml -I . -D __TIS_MACHDEP_CUSTOM -machdep custom -val test.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  __retres ∈ {8388607}
Additional target architecture tuning for floating-point behavior

The command-line options -all-rounding-modes-constants and -all-rounding-modes change the interpretation of floating-point constants and computations. The default, when both options are unset, is to assume a strict IEEE 754 platform where float is mapped to the IEEE 754 binary32 format, double is mapped to binary64, and the rounding mode is never changed from its round-to-nearest-even default. The following deviations from strict IEEE 754 behavior can be taken into account by TrustInSoft Analyzer:

  • The C99 and C11 standards allow excess precision for floating-point computations when FLT_EVAL_METHOD is defined to a value other than 0 by the compiler.
  • Some compilers do not follow the C99 standard’s specifications with respect to excess floating-point precision: they set FLT_EVAL_METHOD to 2 or 0, but actually produce floating-point results inconsistent with these settings.
  • The C99 and C11 standards allow floating-point expression contractions with #pragma STDC FP_CONTRACT ON. Some C compilers are taking the path of enabling this by default (ref: https://reviews.llvm.org/D24481).
  • Some target architectures flush subnormals to zero.

The user who applies TrustInSoft Analyzer to C programs containing significant floating-point computations is invited to open a ticket in the support site with details of the compiler and architecture.

Additional tuning for the typing of integer constants

An unfortunate choice in the C89 standard has led, when the C99 standard was published, to incompatibilities in the types assigned to integer constants. The problem is compounded by some compilers’ eagerness to provide extended integer types. The type of integer constants can, through the usual arithmetic conversions, influence the results of integer computations.

By default, TrustInSoft Analyzer types integer constants the same way a C99-compliant compiler would, and invites the user to pick a choice if a signed integer constant that cannot be represented as a long long is encountered in the target program. Since long long is at least 64-bit, this does not happen unless an integer constant in the program is larger than 9,223,372,036,854,775,807.

The command-line option -integer-constants c89strict can be used to select the C89 behavior. On an ILP32 compiler following the C89 standard, the following program sets res to 2 and returns 2 (the constant 3000000000 does not fit in a 32-bit long, so it gets an unsigned type and negating it yields a large positive value), whereas it sets res to 1 and returns 1 when compiled with a C99 compiler (the constant gets type long long):

int main(void)
{
    int res;
    if (-3000000000 < 0)
        res = 1;
    else
        res = 2;
    return res;
}
$ tis-analyzer -val integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  res ∈ {1}
$ tis-analyzer -val -integer-constants c89strict integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  res ∈ {2}
Assembler Code

Assembly code is accepted by the tool, but most of the analyzers are not able to interpret it.

The assembly statements appear in the statements.csv information file (see Information about the Statements) with the keyword asm in the kind column. It can be the case that these statements are not reachable in the studied context, so they can be safely ignored. If they are reachable, the value analysis ignores them, but warns when it has to do so with this message:

[value] warning: assuming assembly code has no effects in function xxx

It is fine if the assembly statements have no effect, or if their effect can be ignored, but most likely this is not the case!

The simplest way to handle assembly code in the value analysis is to write a specification for the functions that enclose such statements and use the option -val-use-spec to use them instead of the function bodies.

If a function is composed of other statements besides the assembly ones, WP can be used to check the specification. Because WP does not understand assembly either, the assembly statements have to be specified using statement contracts. These contracts cannot be verified against the assembly itself, but they are used to verify the rest of the function.

Example:

//@ requires pre: n >= 0; ensures post: \result == n * 3 * x + 10;
int asm_fun(int x, int n)
{
    int res = 0;

    /*@ loop assigns i, res;
        loop invariant l1: i <= n;
        loop invariant l2: res == i * 3 * x;
    */
    for (int i = 0; i < n; i++) {
        int y;
        //@ assigns y; ensures y == 3 * x;
        asm ("leal (%1,%1,2), %0" // x + x * 2 --> == 3 * x
             : "=r" (y)
             : "r" (x)
            );
        res += y;
    }
    return res + 10;
}

The specification of asm_fun has to be used during the value analysis through the -val-use-spec option. This specification can then be verified with WP, assuming the statement contract on the assembly is correct, which is a smaller hypothesis.

Physical Addresses

By default, reading or writing data located at some memory addresses is only valid if that part of the memory has been allocated, otherwise, the operation is invalid, and raises an alarm. But in some cases, for instance to have access to hardware resources, one needs to read or write to absolute addresses, without having to allocate that part of the memory.

Example:

#define SZ 32
#define BASE_ADDR ((char*)0x400000)

void main (void) {
  for (int i = 0; i < SZ; i++)
    BASE_ADDR[i] = 0;
}

Analyzing such code raises alarms about invalid memory accesses:

void main(void)
{
  int i;
  i = 0;
  while (i < 32) {
    /*@ assert
        Value: pointer_arithmetic: \inside_object((void *)((char *)0x400000));
    */
    *((char *)0x400000 + i) = (char)0;
    i ++;
  }
  return;
}

There are two solutions to this problem:

  • either use the -absolute-valid-range option,
  • or slightly change the code (for the analysis only) to declare an array and use it instead of the absolute addresses.

Both solutions are explained below, but the second one is to be preferred when possible.

With the option

Use the option -absolute-valid-range <min-max> to tell the tool that all absolute addresses between min and max are valid. The min and max values are integers in decimal, hexadecimal (with 0x or 0X prefix), octal (0o) or binary (0b) notation and must fit in 64 bits. Note that passing inverted bounds (min > max) resets the option, making all absolute addresses invalid again.

On the previous example, adding -absolute-valid-range 0x400000-0x40001F to the command line solves the problem.
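
For instance, assuming the example above is saved in a file named mmio.c (a hypothetical name), the full command line could look like:

$ tis-analyzer -val -absolute-valid-range 0x400000-0x40001F mmio.c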

With code modification

The idea is to declare an array that represents the memory chunk and to use it instead of the absolute addresses:

#define SZ 32

#ifdef __TRUSTINSOFT_ANALYZER__
extern char tis_array[SZ];
#define BASE_ADDR tis_array
#else
#define BASE_ADDR ((char*)0x400000)
#endif

void main (void) {
  for (int i = 0; i < SZ; i++)
    BASE_ADDR[i] = 0;
}

Notice that, because we do not want to modify the original code, this modification should be protected by a test on the __TRUSTINSOFT_ANALYZER__ macro, which is automatically set by tis-analyzer. Since the array is only declared (and not defined), the analyzer makes no hypotheses on the initial values of its elements.

This solution is better because it ensures that the addresses in this range are always computed in a controlled way (through the BASE_ADDR macro for instance in this example) and not by an unrelated computation.

Moreover, if there are several such ranges, using different arrays ensures that the zones are separated, i.e. that an address in the second zone cannot be computed through pointer arithmetic on a pointer into the first zone. Such separation hypotheses may make the code easier to analyze.
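
For example, two hypothetical device ranges can be mapped to two distinct arrays (the second range at 0x500000 is invented for the sake of the example):

#ifdef __TRUSTINSOFT_ANALYZER__
extern char tis_zone1[32];   /* stands for the range at 0x400000 */
extern char tis_zone2[16];   /* stands for a second, hypothetical range */
#define BASE_ADDR1 tis_zone1
#define BASE_ADDR2 tis_zone2
#else
#define BASE_ADDR1 ((char*)0x400000)
#define BASE_ADDR2 ((char*)0x500000)
#endif

Since tis_zone1 and tis_zone2 are distinct objects, the analyzer knows that no pointer arithmetic on an address of the first zone can produce an address in the second one.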

Physical addresses of global variables

In some low-level applications, instructions are given to the linker to put symbols at precise addresses in memory. To provide this information to the analyzer, the -absolute-address option may be used.

Set address bits

By default, the address size of a variable depends on the architecture chosen by the user: for example, 64-bit pointers for x86_64 and 32-bit pointers for x86_32. However, some low-level code may take advantage of the fact that some code is guaranteed to reside within the first 4 GiB of memory, and therefore store the addresses of some variables or functions in 32-bit integers. In order to analyze such programs precisely, TrustInSoft Analyzer provides the attribute tis_address_width that can be used to specify the size of a specific address.

Example (addr_32bits.c):

1  #include <stdint.h>
2
3  int main (){
4      char c __attribute__((tis_address_width(sizeof(uint32_t))));
5      uint64_t high = (uint64_t)&c >> 32;
6      uint64_t low = (uint64_t)&c & 0xFFFFFFFF;
7      char *p =(char *) (high << 32UL | low);
8      *p = 42;
9  }

With the attribute __attribute__((tis_address_width(sizeof(uint32_t)))), the address of variable c is specified to have the size of type uint32_t. When the program is analyzed, we get the following result:

$ tis-analyzer -val -val-print addr_32bits.c
  [...]
  [value] ====== VALUES COMPUTED ======
  [value] Values at end of function main:
  c ∈ {42}
  high ∈ {0}
  low ∈ {{ (unsigned long long)&c }}
  p ∈ {{ &c }}
  __retres ∈ {0}

The analyzer makes use of the fact that &c is a 32-bit pointer and computes precise values for the variables high, low and p; thus variable c is set to 42 by the statement at line 8.

Compiler Specific Keywords

Some compilers have extensions with non standard keywords. One easy way to remove these keywords from the source code is to define them as empty macros. For instance:

-cpp-extra-args="-Dinterrupt="

Beware that it is up to you to check if removing these keywords might change the verification relevance.
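
For example, given a source file using a hypothetical interrupt keyword:

interrupt void timer_isr(void)
{
    /* handler body */
}

with -cpp-extra-args="-Dinterrupt=" the preprocessor erases the keyword, and the analyzer sees the plain declaration void timer_isr(void).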

Multi-Threading

The Mthread plug-in makes it possible to verify multi-thread programs.

Because it relies on the same value analysis, it provides the same alarm detection, but it takes into account all the possible concurrent behaviors of the program by analyzing all the possible interleavings of the threads. As before, this represents an over-approximation of the possible behaviors of the program.

Moreover, Mthread can provide an over-approximation of the memory zones that are accessed concurrently by more than one thread. For each zone and thread, Mthread also returns the program points at which the zone is accessed, whether the zone is read or written, and the callstack that leads to the statement.

Using the plug-in requires adding a stubbed version of the concurrency library in use to the analyzed source files. For some concurrency libraries (currently pthread, VxWorks and Win32), this file is provided with the tool.
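
As an illustration, here is a minimal pthread program of the kind Mthread targets (a hypothetical example): both threads write the shared variable counter, so Mthread would report the zone counter as accessed concurrently and analyze all interleavings of the two increments.

#include <pthread.h>

int counter;                 /* shared, written by both threads */

static void *worker(void *arg)
{
    (void)arg;
    counter++;               /* concurrent read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}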

Please ask for more information if needed.

Strict Aliasing

The Strict Aliasing plug-in detects the violation of the strict aliasing rule as defined in the C99 and the C11 standards.

The references taken from the C11 standard for the strict aliasing rule are:

  • section 6.5p6 defining what is an effective type
  • section 6.5p7 defining how an object can be accessed
  • note 87 defining a rule specific to allocated memory

The strict aliasing analysis is currently in beta.

The analysis

The strict aliasing analysis is available using the parameter -sa when starting an analysis with TrustInSoft Analyzer. Using this option automatically launches the value analysis.

If a violation of the strict aliasing rule is detected during an analysis, a warning is displayed. However, this warning does not stop the analysis.

Example:

 1  int foo(int *p, float *q)
 2  {
 3      *p = 42;
 4      *q = 1.337;
 5      return *p;
 6  }
 7
 8  int main(void)
 9  {
10      int x;
11      return foo(&x, (float *)&x);
12  }

Given the previous C file foo.c, the strict aliasing analysis can be launched using the following command:

$ tis-analyzer -sa foo.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
foo.c:4:[sa] warning: The pointer q has type float *. It violates strict aliasing rules by
              accessing a cell with effective type int.
              Callstack: foo :: foo.c:11 <- main
[value] done for function main

A violation of the strict aliasing rule is detected by the analyzer, which provides details about the violation: the pointer has type float * while the accessed cell has effective type int, and these types are incompatible.

Strict Aliasing Analysis Options

Several options exist to parametrize the strict aliasing analysis.

The option -sa-strict-enum
Default Value: not set by default
Opposite: -sa-no-strict-enum

By default, the strict aliasing analysis uses the integer representation of the enum type: an enum cell may be accessed through a pointer to its integer representation. The -sa-strict-enum option restricts this behavior: an enum cell may only be accessed through a pointer to the same enum type. For example:

1  enum E { a, b };
2
3  int main(void)
4  {
5      enum E e = a, *p = &e;
6      *(int *)p = 42;
7      *p = b;
8      return e;
9  }

The access at line 6 is accepted by default by the strict aliasing analysis because the example uses a pointer to a correct integer representation, as shown by the following output:

$ tis-analyzer -sa enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main

When using the -sa-strict-enum option, the strict aliasing analysis detects a violation at line 6, because it does not accept the integer representation.

$ tis-analyzer -sa -sa-strict-enum enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
enum.c:6:[sa] warning: The pointer (int *)p has type int *. It violates strict aliasing rules by
              accessing a cell with effective type enum E.
              Callstack: main
[value] done for function main
The option -sa-strict-struct
Default Value: not set by default
Opposite: -sa-no-strict-struct

When taking the address of a structure member, the strict aliasing analysis keeps track of the structure and the member, in order to check future pointer uses. By default, the analyzer allows a memory location whose effective type is a structure member to be accessed through a pointer having the same type as the member. For example:

1  struct s { int a; };
2
3  int main(void)
4  {
5      struct s s = { 0 };
6      int *p = &s.a;
7      *p = 42;
8      return s.a;
9  }

The access at line 7 is allowed by the analyzer because the pointer p has the same type as the member a of the structure s.

$ tis-analyzer -sa struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main

When using the -sa-strict-struct option, this access is reported as non-conformant because the member must be accessed with the same effective type (i.e. only through a pointer to the whole structure).

$ tis-analyzer -sa -sa-strict-struct struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
                a cell with effective type (struct s).a[int].
                Callstack: main
[value] done for function main
The option -sa-strict-union
Default Value: set by default
Opposite: -sa-no-strict-union

When taking the address of a union member, the strict aliasing analysis keeps information about the whole union, not only the referenced member. The analyzer only allows a memory location that has a union type to be accessed through a pointer to the same union type.

1  union u { int a; };
2
3  int main(void)
4  {
5      union u u = { 0 };
6      int *p = &u.a;
7      *p = 42;
8      return u.a;
9  }

The access at line 7 is not valid according to the analyzer because the pointer p does not have the expected union u type, even though the union includes a member having the same type as the one pointed to by p.

$ tis-analyzer -sa union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
union.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
               a cell with effective type (union u)[int].
               Callstack: main
[value] done for function main

When using the opposite option -sa-no-strict-union, the access is allowed, because the union u includes a member of type int.

$ tis-analyzer -sa -sa-no-strict-union union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
Volatile Variables

The C volatile keyword is a qualifier that may be specified when declaring a variable. Such a qualifier tells the compiler that the value of the variable may change without any action being taken by the code the compiler sees nearby. In particular, a variable qualified as volatile can be modified outside the program. This means that even if it has just been assigned, its value is unpredictable.
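
For example, two successive reads of a volatile variable may yield different values (hw_status is a hypothetical hardware status register):

volatile int hw_status;

int poll(void)
{
    int a = hw_status;
    int b = hw_status;
    /* a == b cannot be assumed: each read may return a new value */
    return a == b;
}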

The value analysis handles volatile variables, but WP does not at the moment.

Warning

WP MUST NOT be used on functions using volatile data.

The option -volatile-globals

By default, TrustInSoft Analyzer treats as volatile only those variables that are declared as such in the program. The -volatile-globals option indicates a set of global variables that the value analysis has to consider volatile even though they are not declared as such in the program.

Providing the option with the special name NULL forces the value analysis to consider a given absolute memory range, typically specified by the option -absolute-valid-range, as volatile.

Functions With Variable Arguments
The C11 standard references

Our references from the C11 standard here are:

  • the section 7.16: Variable arguments <stdarg.h>,
  • and the appropriate parts, i.e. those concerning variadic macros, of section J.1 (Undefined behavior) of appendix J (Portability issues).
Limitations: what does not satisfy the standard?
The va_list type

The type va_list, declared in the <stdarg.h> header, should be, according to the C11 standard, a complete object type. TrustInSoft Analyzer has some limitations in its handling of this type and of its correct use, described below.

Only in local variables

Objects of the va_list type are handled correctly only if they appear as local variables, formal arguments, etc. We do not support global va_list variables, arrays of va_list objects, or va_list as the type of fields of complex types (i.e. unions or structures). A fatal error will occur if va_list objects are used in such a way (however, if they are just declared and then not actually used, there is no error).

Assignments

It is not entirely clear (in the C11 standard) whether performing assignments on variables of va_list type (i.e. ap1 = ap2;) is permitted or not. However, as both the gcc and clang compilers refuse to compile such operations, we assume that they are not permitted and thus we do not handle them. Note, though, that TrustInSoft Analyzer does not verify whether va_list assignments appear in the code: if they do, the program will not be rejected, even though the behavior is undefined.

Casting
Casts to the va_list type

Casting from other data types to the va_list type is implicitly forbidden and this rule is enforced by TrustInSoft Analyzer: an appropriate error will occur if such a cast is encountered and the program will be rejected.

Casts from the va_list type

Casting from the va_list type to other data types is also forbidden, though we do not enforce it: programs containing such casts are incorrect, but they will not be rejected by TrustInSoft Analyzer.

Passing va_list objects to other functions

The va_list objects can be passed as arguments to other functions. However, if the called function invokes the va_arg macro on this object, its value in the calling function becomes indeterminate (i.e. it cannot be used anymore for va_arg in the calling function). This rule does not apply if va_list is passed by a pointer. See subsection 7.16.1 (of the C11 standard).

We do not enforce this rule in TrustInSoft Analyzer. All va_list objects passed to other functions are treated as if they were passed by pointer. Each invocation of the va_arg macro on such a va_list object will simply return the subsequent argument, without considering where the previous va_arg invocations happened.

This approach is similar to what is implemented in gcc and clang.
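
As an illustration of this treatment, consider the following program (a hypothetical example). The helper consumes the first variadic argument through a pointer to the va_list, which is well-defined C; with TrustInSoft Analyzer, passing the va_list by value would behave the same way, and the next va_arg in the caller continues with the following argument:

#include <stdarg.h>

static int first(va_list *ap)
{
    return va_arg(*ap, int);     /* consumes one argument */
}

static int sum2(int n, ...)
{
    va_list ap;
    va_start(ap, n);
    int a = first(&ap);          /* passed by pointer: well-defined */
    int b = va_arg(ap, int);     /* continues with the next argument */
    va_end(ap);
    return a + b;
}

int main(void)
{
    return sum2(2, 40, 2) == 42 ? 0 : 1;
}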

Returning va_list objects from functions

It is not completely clear (in the C11 standard), but it seems that functions returning va_list objects are allowed. However, as both gcc and clang compilers refuse to compile such functions, we assume that using them is not permitted. TrustInSoft Analyzer rejects programs which declare functions returning va_list objects.

Undefined behavior cases not completely verified

Some undefined behavior cases enumerated in the C11 standard, see section J.1 (Undefined behavior) of appendix J (Portability issues), are not verified by TrustInSoft Analyzer.

Invoking va_arg on va_list objects passed to other functions
The macro va_arg is invoked using the parameter ap that was passed to a function that invoked the macro va_arg with the same parameter (7.16).

Status: Not verified at all (as stated above).

Suppressing definitions of variadic macros
A macro definition of va_start, va_arg, va_copy, or va_end is suppressed in order to access an actual function, or the program defines an external identifier with the name va_copy or va_end (7.16.1).

Status: Not verified at all.

The type parameter of the va_arg macro
The type parameter to the va_arg macro is not such that a pointer to an object of that type can be obtained simply by postfixing a * (7.16.1.1).

Status: Not verified at all.

The parmN parameter of the va_arg macro
The parameter parmN of a va_start macro is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions (7.16.1.4).

Status:

  • We do not check whether the parameter parmN is declared with the register storage class.
  • The function type case is unclear (are function pointers concerned?).
  • The array type case is also unclear (when we are passing arrays their type changes to pointers, so it is impossible to check).
  • We verify correctly the condition concerning compatibility with the type resulting after default argument promotions.
Old-style variadic macros

TrustInSoft Analyzer does not properly handle the old-style variadic macros from the obsolete <varargs.h> header. It does not reject programs containing such macros, but upon encountering them its behavior is undefined.

Functions from <math.h>

Many C standard functions declared in <math.h> are automatically bound to TrustInSoft Analyzer built-ins (see the Built-ins section).

For some functions, like sin for instance, two built-ins are provided: tis_sin and tis_sin_precise. There is no default choice, because choosing one or the other depends very much on how the functions are used in the application. Typically, the tis_sin_precise built-in is more precise for analyses that do not involve widening; using it otherwise may lead to less precise results than the naive version tis_sin.
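
For instance, assuming the -val-builtin option for mapping a function to a built-in is available in your version (check the Built-ins section for the exact mechanism; both the option and the file name test.c are assumptions here), selecting the precise variant could look like:

$ tis-analyzer -val -val-builtin sin:tis_sin_precise test.c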

Floating-point

In order to analyze programs that contain floating-point numbers, instead of using the default floating-point abstract values, it is possible to use more precise ones in which all the possible floating-point values can be represented, including infinities and NaNs.

This feature is available by setting the environment variable TIS_ADVANCED_FLOAT=1.

Data Representation

An abstract floating-point value can be represented either as a single floating-point value {0.125} or as distinct intervals for negative and positive values and flags indicating the presence of infinities, NaNs and zeros.

For example, the abstract value {±0.,+∞,NaN} [-8.2 .. -1.3] {1.35} indicates the presence of negative and positive zeros, positive infinity, all the NaNs, a negative interval of finite floating-point values and a single positive finite floating-point value.

Building an abstract float

It is possible to build an abstract floating-point value with the following builtins:

  • double tis_abstract_double(double __nmin, double __nmax, double __pmin, double __pmax, int __nzero, int __pzero, int __ninf, int __pinf, int __nan);
  • float tis_abstract_float(float __nmin, float __nmax, float __pmin, float __pmax, int __nzero, int __pzero, int __ninf, int __pinf, int __nan);

where __nmin and __nmax are the bounds of the negative interval, and __nmin > __nmax indicates an empty negative interval. The bounds __pmin and __pmax work similarly for a positive interval. The other parameters are flags indicating the presence of zeros, infinities and NaNs.
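
For instance, the abstract value {±0.,+∞,NaN} [-8.2 .. -1.3] {1.35} shown above could be built as follows (a sketch, assuming a nonzero flag argument means the corresponding values are present):

double v = tis_abstract_double(
    -8.2, -1.3,   /* negative interval [-8.2 .. -1.3] */
    1.35, 1.35,   /* positive interval reduced to {1.35} */
    1, 1,         /* negative and positive zeros present */
    0, 1,         /* no negative infinity, positive infinity present */
    1);           /* NaNs present */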

TrustInSoft Analyzer++ Specificities
Identifiers
Mangling

The identifiers in a C++ program are mangled to match C identifiers. The mangling scheme used in TrustInSoft Analyzer is a variation of Itanium mangling. The differences are:

  • Class, union and enum names are also mangled, even if this is not required by Itanium. The grammar entry used for these types is _Z<name>. As such, the class:

    struct Foo {
      int x;
    };
    

    is translated as:

    struct _Z3Foo { int x; };
    
  • Local variables and formal parameter names are also mangled, to avoid shadowing extern "C" declarations. The grammar entry used for a local variable is _ZL<unqualified-name>. As such, the local variable bar in:

    int main() {
      int bar = 2;
    }
    

    is mangled as _ZL3bar. The keyword this is not mangled.

  • The virtual method table and the typeinfo structure for a class Foo are mangled as extra static fields named __tis_class_vmt and __tis_typeinfo in this class. As such, the class:

    struct Foo {
      virtual void f() {}
    };
    

    leads to the generation of two variables with mangled names _ZN3Foo15__tis_class_vmtE and _ZN3Foo14__tis_typeinfoE.

Demangling

To make reading the identifiers easier, TrustInSoft Analyzer displays by default a demangled version of each identifier. In the GUI, the mangled name can be obtained by right-clicking on an identifier and selecting Copy mangled name.

Signatures are ignored when demangling function names. As such, the assignment in:

void func(int) {}

void
test()
{
    void (*ptr)(int) = &func;
}

is displayed as:

void (*ptr)(int);
ptr = & func;

even if the mangled name of func is _Z4funci. This can lead to ambiguity when there are multiple overloads for the named function; a solution is to look at the mangled name.

Constructors and destructors are demangled as Ctor and Dtor. If the constructor or destructor is a constructor for a base class and is different from the constructor for the most derived object, the suffix Base is added. If the constructor is a copy constructor, the suffix C is added. If the constructor is a move constructor, the suffix M is added. Therefore, the demangled name Foo::CtorC stands for the copy constructor of the class Foo. If the destructor is virtual, it will be demangled as DeletingDtor.

The option -cxx-filt can be used to print the demangled version of an identifier, as demangled by the analyzer. If the identifier is a function name its signature will also be printed. For example, the command tis-analyzer++ -cxx-filt _Z3fooii displays {foo(int, int)}.

When displayed, function return types are preceded by a -> symbol and are displayed after the formal parameter types. For example, the instance of the function show in the following code:

struct Foo {
    void f(int) {}
};

template <typename T>
void show(const T&) {}

int
main()
{
    show(&Foo::f);
}

is printed as show<{(int) -> void} Foo::*>.

Template parameter packs are printed enclosed by [ and ]. As such, the command tis-analyzer++ -cxx-filt _Z1fIJ3Foo3FooEEvDpRKT_ displays {f<[Foo, Foo]>(const [Foo, Foo]&) -> void}: f is a function templated by a parameter pack, which is instantiated with Foo, Foo. Note also that in this case the const and & are applied to the whole pack.

Names displayed in the GUI can be prefixed by .... These names are shortened versions of qualified names. Clicking on this prefix will display the full mangled or demangled name, depending on the command line options.

Functions and methods
Argument passing

When calling a function, TrustInSoft Analyzer uses different transformations to initialize the function’s arguments depending on the type of the argument. These transformations match Itanium calling convention.

Scalar types

Scalar types are kept as is.

Reference types

Reference types are translated as pointers to the referenced types. The initialization of an argument of reference type is translated as taking the address of the initializer. If this initialization requires the materialization of a temporary object, this step is done by the caller. For example, with the following original source code:

void f(int &&, int &);

void g() {
    int x;
    f(2, x);
}

the translated declaration for the function f is void f(int *a, int *b) and the call to f is translated as:

int x;
int __tis_temporary_0;
__tis_temporary_0 = 2;
f(& __tis_temporary_0,& x);
Class types

The passing of a class type depends on whether the class is non-trivial for the purposes of calls. A class type is non-trivial for the purposes of calls if:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.

If the type is non-trivial for the purposes of calls, a variable of the class type is defined in the caller and the function receives a pointer to this variable. Such variables are named __tis_arg_##. For example, in the following code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

void f(Obj x, Obj y);

void g() {
    f( {}, {} );
}

the translated function f has the signature:

void f(struct Obj *x, struct Obj *y);

and its call is translated as:

struct Obj __tis_arg;
struct Obj __tis_arg_0;
{
  Obj::Ctor(& __tis_arg_0);
  Obj::Ctor(& __tis_arg);
}
f(& __tis_arg,& __tis_arg_0);

If the function returns a class that is non-trivial for the purposes of calls, then it is translated as a function returning void but with an additional argument. This argument is a pointer to a variable in the caller that will receive the function return. If the caller does not use the function return to initialize a variable, a variable named __tis_cxx_returnarg_## is created for this purpose.

For example, with the following original source code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

Obj f();

void g() {
    Obj o = f();
    f();
}

the translated function f has the signature:

void f(struct Obj *__tis_cxx_return)

and the body of the function g is translated as:

struct Obj o;
f(& o);
{
  struct Obj __tis_cxx_returnarg;
  f(& __tis_cxx_returnarg);
}
return;

If the type is trivial for the purposes of calls, no transformation is applied and the object is passed by copying its value. For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f(Obj o);

void g() {
    f( {} );
}

the signature of the translated function f is

struct Obj f(struct Obj o)
Unknown passing style

Sometimes, TrustInSoft Analyzer cannot decide if a class is trivial for the purposes of calls in a translation unit. In such cases, it will assume that the type is non-trivial for the purposes of calls and emit a warning like:

[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.

If the user knows that the type is trivial for the purposes of calls, the option -cxx-pass-by-value can be used to force this.

For example, with the following original source code:

struct Foo;

void f(Foo x);
  • with no particular option set, TrustInSoft Analyzer will produce the following warning and declaration for f:

    [cxx] warning: Unknown passing style for type 'Foo'; assuming
    non-trivial for the purpose of calls. Use the option
    '-cxx-pass-by-value _Z3Foo' to force the opposite.
    
    void f(struct Foo *x);
    
  • with the option -cxx-pass-by-value _Z3Foo, TrustInSoft Analyzer will produce the following declaration for f without warning:

    void f(struct Foo x);
    

Using an incorrect passing style can lead to errors like:

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct _Z3Foo * vs. struct Foo
                   First declaration was at file1.cpp:7
                   Current declaration is at file2.c:7

or

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct Foo vs. void
                   First declaration was at file.c:7
                   Current declaration is at file.cpp:7
Method transformations

Methods do not exist in C, and are translated as functions by TrustInSoft Analyzer++. The following additional transformations are applied to non-static methods:

  • the name of the function is the qualified name of the method.
  • if the method is a non-static method, the function gets an additional this argument. Its type is a pointer to the class enclosing the method. There is an additional const qualifier if the method is const-qualified.
  • if the method is a non-static method, the this argument is initialized with the address of the calling object.

For example, with the following original source code:

struct Obj {
    Obj();
    static void bar(int x);
    void foo(int x) const;
};

void
f(void)
{
    Obj o;
    o.foo(1);
    Obj::bar(0);
}

two function declarations are produced:

void Obj::bar(int x);
void Obj::foo(const struct Obj *this, int x);

and the calls to foo and bar are translated as:

Obj::foo(& o,1);
Obj::bar(0);
Constructor elision

By default, constructor elision is enabled and TrustInSoft Analyzer++ will omit some calls to copy or move constructors to temporary objects, as allowed by C++ standards from C++11 onwards.

Constructor elision can be disabled with the -no-cxx-elide-constructors option.

For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f();

void g() {
    Obj y = f();
}

when constructor elision is enabled, the call to f is translated as:

f(& y);

However, when constructor elision is disabled with the option -no-cxx-elide-constructors, it is translated as:

struct Obj __tis_temporary_0;
f(& __tis_temporary_0);
Obj::CtorM(& y,& __tis_temporary_0);

In this case, the result of the call to f is written to the temporary object __tis_temporary_0 and this temporary object is then moved to the initialized variable y.

Objects memory layout

TrustInSoft Analyzer uses its own memory layout to represent C++ objects. In order to preserve as much useful information as possible, the analyzer defines multiple well-typed data structures, and uses more than one extra pointer field in polymorphic classes. As a result of this choice, the numeric value of sizeof(Class) will differ between the compiled code and the analyzed code.

Objects, whether class or struct, are translated as C structures; unions are translated as C unions.

The inline declaration of a static field is translated as a declaration of a global variable with the same qualified name. The out-of-line definition of a static field is translated as a definition of a global variable with the same qualified name.

Non-static fields are translated as fields in the translated structure. The fields are emitted in the source code order.

Empty classes

Empty classes are translated as a structure with one field char __tis_empty;. This enforces that the size of an empty class is not zero.
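
For example, the empty class:

class Foo { };

would be translated as:

struct Foo {
   char __tis_empty ;
};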

Non-virtual inheritance

Non-virtual non-empty base classes are translated as fields in the derived class. Such fields are named __parent__ followed by the name of the base class.

For example, with the following original source code:

class Foo {
    int x;
};

struct Bar: Foo {
    int y;
    int z;
};

the structures produced for the class Foo and Bar will be:

struct Foo {
   int x ;
};

struct Bar {
   struct Foo __parent__Foo ;
   int y ;
   int z ;
};

Non-virtual empty base classes do not appear in the translated C structure. For example, with the following original source code:

class Foo { };

struct Bar: Foo {
    int y;
    int z;
};

the structure produced for the class Bar is:

struct Bar {
   int y ;
   int z ;
};

In this case, a reference to the base Foo of an object of type Bar binds to the original object. In other words, the assertion in the following program is valid in the model used by TrustInSoft Analyzer:

class Foo {};

struct Bar: Foo {
    int y;
    int z;
};

int
main()
{
    Bar b;
    Foo &f = b;
    void *addr_b = static_cast<void *>(&b);
    void *addr_f = static_cast<void *>(&f);
    //@ assert addr_b == addr_f;
}
Polymorphic classes

If a C++ class is polymorphic, its corresponding C structure contains two additional fields:

  • struct __tis_typeinfo const *__tis_typeinfo; holding a pointer to the type_info of the most derived object of the current object.
  • struct __tis_vmt_entry const *__tis_pvmt; holding a pointer to the virtual method table of the current object.

As an example, the class:

struct Foo {
    int x;
    virtual void f() {}
};

is translated as:

struct Foo {
   struct __tis_typeinfo const *__tis_typeinfo ;
   struct __tis_vmt_entry const *__tis_pvmt ;
   int x ;
};

These additional fields are set by the constructors of the polymorphic class.

Virtual inheritance

If a class has a virtual base, its translation produces two different C structures: the regular C structure as well as a base version of the class.

The regular structure is used when the object is the most derived object. In this case:

  • the structure gets an additional field long const * __tis_vbases_ptr;. This is an array holding the offset of each virtual base of the object.
  • all virtual bases of the class are translated as fields in the C structures but their name is prefixed by __tis_vbases_ to distinguish them from non-virtual bases.

The base version of the object has its name prefixed by __vparent__ and is used when the object is used as a base for another object. In this case:

  • the structure gets an additional pointer __tis_vbases_ptr of type long const *. This is an array holding the offset of each virtual base of the object.
  • the structure does not contain fields related to the virtual base classes.

As an example the following class:

struct Baz: Bar, virtual Foo {
    int z;
};

produces the following two structures:

struct Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
   struct Foo __vparent__Foo ;
};

struct __tis_base_Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
};

Accessing a virtual base is always done by shifting the address of the current object with the offset of the virtual base in the __tis_vbases_ptr array.

As an example, with the following code:

struct Foo {
    int x;
};

struct Bar: virtual Foo {
    int y;
};

int
main()
{
    Bar bar;
    Foo &foo = bar;
}

the body of the main function is translated as:

int __retres;
struct Bar bar;
struct Foo *foo;
Bar::Ctor(& bar);
foo = (struct Foo *)((char *)(& bar) + *(bar.__tis_vbases_ptr + 0));
__retres = 0;
return __retres;

The virtual base Foo of the class Bar has index 0, so the offset used to go from Bar to Foo is *(bar.__tis_vbases_ptr + 0).

Layout summary

The full layout for objects is the following, in increasing address order:

  • For most-derived classes and classes with no virtual base:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields
    • Virtual bases
  • For classes with virtual bases used as base classes:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields
Member pointers
Pointers to member functions declaration

Pointers to a method X Foo::f(A1, A2, ..., An) are translated as a C structure with the following fields:

unsigned long vmt_index ;
X (* __attribute__((__tis_sound_cast__)) ptr)(struct Foo *, A1, A2, ..., An) ;
long shift ;
size_t vmt_shift ;

If Foo::f is a non-virtual method, then:

  • the field ptr is a pointer to the method called when resolving the symbol f in the scope of Foo. This can be the method Foo::f if f is declared in Foo or a method of one of the parent classes of Foo.
  • the field shift is the offset of the base containing the method f. If f is in Foo, then this is 0, otherwise it is the offset of the parent class declaring f.
  • the field vmt_index is 0.

If Foo::f is a virtual method, then:

  • the field ptr is the same as if Foo::f was a non-virtual method.
  • the field shift is the same as if Foo::f was a non-virtual method.
  • the field vmt_index is 1 + the index of the method in the virtual method table of the class containing the final override of f in Foo. This can be different from the index of f in the virtual method table of Foo if the final override of f is declared in a parent class of Foo.

Each pointer to member function type produces a different structure type. The structure type is named __tis_XXXX, where XXXX is the mangled name of the method pointer type.

For example, with the classes:

struct Pack {
    char c[1000];
};

struct Bar {
    int y;
    int f() { return 2; }
};

struct Foo: Pack, Bar {
    virtual void g() {}
};

the following statements:

int (Foo::*x)(void) = &Foo::f;
void (Foo::*y)(void) = &Foo::g;

are translated as:

struct __tis_M3FooFivE x;
struct __tis_M3FooFvvE y;
x.vmt_index = 0UL;
x.ptr = (int (*)(struct Foo *))(& Bar::f);
x.shift = 0L - (long)((struct Foo *)((unsigned long)0 - (unsigned long)(& ((struct Foo *)0)->__parent__Bar)));
x.vmt_shift = 0UL;
y.vmt_index = 2UL;
y.ptr = (void (*)(struct Foo *))(& Foo::g);
y.shift = 0L;
y.vmt_shift = (unsigned long)(& ((struct Foo *)0)->__tis_pvmt);
Program initialization
Dynamic initialization

If a variable v is initialized at dynamic initialization time, it is translated as:

  • a declaration for the variable v.
  • a function void __tis_init_v(). The content of this function is the translation of the initializer of v.

All __tis_init_XXX functions are called by a special function __tis_globinit. The __tis_globinit function is in turn called at the beginning of the main function.

As an example, the program:

int id(int x) { return x; }

int x = id(12);

int
main()
{
    return x;
}

is translated as:

int x;
void __tis_init_x(void)
{
  x = id(12);
  return;
}

 __attribute__((__tis_throw__)) int id(int x);
int id(int x)
{
  return x;
}

 __attribute__((__tis_throw__)) int main(void);
int main(void)
{
  __tis_globinit();
  return x;
}

void __tis_globinit(void)
{
  __tis_init_x();
  return;
}
Static initialization

A variable with a constant initializer is translated as a C variable with an initializer. The initializer is the value of the C++ constant initializer. As an example, with the following code:

constexpr
int
add_one(int x)
{
    return x + 1;
}

const int x = add_one(2);

the definition of the variable x is translated as:

static int const x = 3;

In some special circumstances, one may need to disable static initialization semantics described by the C++ standard. It can be done using the option -no-cxx-evaluate-constexpr. In this case, whenever a variable is initialized with a constant initializer that is not a constant initializer according to the C rules, the initialization of this variable is done at dynamic initialization time and uses the initializer as it was written by the user.

Using this option can lead to unsound results. As an example, with the following program:

constexpr int id(int x) { return x; }

extern const int x;

const int y = x;

const int x = id(1);

int
main()
{
    int a = y;
    int b = x;
    //@ assert a == b;
    return a == b;
}
  • the assertion is valid from the C++ standard’s point of view.
  • the assertion is valid using the command-line tis-analyzer++ --interpreter test.cpp.
  • the assertion is invalid using the command-line tis-analyzer++ --interpreter test.cpp -no-cxx-evaluate-constexpr.
Static local variables

A static local variable x is translated as a triple of:

  • a global variable, corresponding to the translated static local variable. The type of the variable is the translated type of the static variable, and the name of the variable is prefixed by its enclosing function name. This variable is 0-initialized.
  • a global variable __tis_guard_x.
  • a check on __tis_guard_x ensuring the initialization of the static variable is not recursive, followed by a conditional block doing the initialization of the variable once.

The variable __tis_guard_x can have the following values:

  • 0: the variable x has not been initialized yet.
  • 1: the variable x is being initialized.
  • 2: the variable x has been initialized.

As an example, the following function:

int
main()
{
    static Foo f;
    return 0;
}

is translated as:

int main::__tis_guard_f;
struct Foo main::f = {.x = 0};

int main(void)
{
  int __retres;
  tis_ub("Recursive initialization of the static local variable f.",
       main::__tis_guard_f != 1);
  if (! main::__tis_guard_f) {
    main::__tis_guard_f ++;
    Foo::Ctor(& main::f);
    main::__tis_guard_f ++;
  }
  __retres = 0;
  return __retres;
}
Special variable names

TrustInSoft Analyzer++ introduces several special variables while translating code. This section summarizes the different name families used for these variables.

Global variables
  • __tis_ABI::exc_stack_depth: how many exceptions are currently raised.
  • __tis_ABI::exc_stack: all exceptions currently raised.
  • __tis_ABI::caught_stack_depth: how many exceptions are currently caught.
  • __tis_ABI::caught_stack: all exceptions currently caught.
  • __tis_unwinding: whether the program is currently unwinding its stack
  • XXX::__tis_class_vmt: virtual method table for an object of type XXX used as most derived object.
  • XXX::__tis_class_typeinfo: typeinfo for an object with a most derived object being of type XXX
  • XXX::__tis_class_inheritance: inheritance information for an object with most derived object being of type XXX
Intermediate variables
  • __Ctor_guard: guard used to check if the lifetime of an object has started.
  • __tis_alloc: materialization of a space reserved by an allocation function.
  • __tis_arg: materialization of a function argument.
  • __tis_bind: temporary variable used to initialize non-reference structured bindings of arrays.
  • __tis_cast: result of a dynamic_cast.
  • __tis_const: materialization of a function argument that is non-trivial for the purpose of calls.
  • __tis_compound_literal: materialization of a C++ temporary used to initialize a compound literal.
  • __tis_constant_expression: materialization of a C++ constant expression.
  • __tis_cxx_returnarg: materialization of the discarded result of a call whose result type is non-trivial for the purpose of calls.
  • virtual_dtor_tmp: virtual method table cell used when calling a virtual destructor.
  • __tis_deleted_value: address of a deleted value.
  • __tis_dereference: temporary variable used to dereference a member pointer.
  • __tis_dyncast: operand of a dynamic_cast.
  • __tis_exn: materialization of an object of type std::bad_XXX being thrown.
  • __tis_gnu_ternary: shared computation in a GNU ternary expression.
  • __tis_guard: guard controlling the initialization of a static local variable.
  • __tis_implicit_value: materialization of a C++ temporary used to perform an implicit value initialization.
  • __tis_index: index used when destroying an array when its lifetime finishes.
  • __tis_initializer_list: materialization of a C++ temporary used to build an initializer list.
  • __tis_init_size: loop variable used to initialize arrays.
  • __tis_lambda_temp: materialization of a C++ temporary variable in a lambda.
  • __tis_lvalue: materialization of a C++ lvalue that is not an lvalue in C.
  • __tis_mmp: method pointer being called.
  • __tis_mmp_init: temporary variable used to initialize a method pointer.
  • __tis_object_cast: temporary variable used to compute object inheritance casts.
  • __tis_offset: index used to destroy array elements as a consequence of calling delete[].
  • __tis_placement: address of an object initialized by a placement new.
  • __tis_relop: intermediate result when translating relational operators.
  • __tis_temp: materialization of a C++ temporary variable.
  • __tis_typeid: address of an object with a polymorphic typeid being computed.
  • __virtual_this: this pointer computed when calling a virtual method.
  • __virtual_tmp: virtual method table cell used when calling a virtual method. The name of the called function is added to the end of the name of the temporary.
Generated contracts

Function contracts are automatically generated for non-static class methods, in order to require the validity of the this pointer as a precondition. Copy and move constructors also receive separation annotations, since they are expected to operate on separate objects.

These contracts will be added to user-provided contracts, if any.

Computation of these annotations can be disabled with the -no-cxx-generate-contracts option.
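
As an illustration, for a simple non-static method the generated requirement could take roughly the following shape (a sketch in ACSL; the exact annotations emitted by the analyzer may differ):

/*@ requires \valid(this); */
void Obj::f(struct Obj *this, int x);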

Choosing the dialect for the analyzed program
Option enum-restriction

In the C++ standard, enumerations can have trap representations. Specifically, the C++ standard allows an enumerated type to behave as if its underlying representation type were an integer type with the minimum number of bits needed to represent all the enumeration constructors: unsigned if no constructor is negative, and signed if at least one constructor is negative. Consider the following code (test.cpp):

 1  #include <cstdio>
 2
 3  enum e { ZERO, ONE, TWO};
 4  int r;
 5  int main(void) {
 6     enum e e1 = (enum e) 3;
 7     enum e e2 = (enum e) 4;
 8     switch(e2){
 9     case 0: r = 0; break;
10     case 1: r = 1; break;
11     case 2: r = 2; break;
12     case 3: r = 3; break;
13     case 4: r = 4; break;
14     }
15     std::printf("r = %d", r);
16  }

The underlying representation type of enum e is an unsigned integer type of two bits; that is, according to the C++ standard, variable e1 can only have values between 0 and 3. Thus the statement at line 7, which casts 4 to a value of type enum e and assigns it to variable e2, is undefined behavior, while the cast from 3 to enum e at line 6 is allowed by the C++ standard. When compiling the program with:

$ g++ test.cpp

variable r is set to 4 at the end of the program. However, when the program is compiled with:

$ g++ -fstrict-enums test.cpp

the value of r is 0 by the end of the program. The option -fstrict-enums allows the compiler to optimize under the assumption that a value of an enumerated type can only be one of the values of the enumeration as defined in the C++ standard. This assumption is not satisfied because of the cast at line 7.

In C++11, an enumeration can also be declared with a specific underlying representation type. In this case, the enumeration type behaves as the specified type. For example, when declaring the type enum e as enum e : char { ZERO, ONE, TWO}; in the example above, the program has no undefined behavior.
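
The modified declaration would read:

enum e : char { ZERO, ONE, TWO };

Any value representable in char is then a valid value of enum e, so both casts above are defined.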

In order to make sure an enumeration always holds an allowed value, TrustInSoft Analyzer provides the option -enum-restriction with three option values, loose, strict and stricter, that respectively correspond to the C standard, the C++ standard, and a stricter assumption that requires an enumeration value to lie between the minimum and maximum enumeration constructors.

Consider the following code:

1  enum E { X = 1, Y = 13} x;
2
3  int main(void){
4     x = (enum E)2;
5     x = (enum E)14;
6     x = (enum E)16;
7  }
  • The default option value loose corresponds to the enumeration behaviors in C. It means that any value representable in the underlying integer type of an enumeration is allowed, although the choice of underlying integer type to represent an enumeration is implementation-defined. TrustInSoft Analyzer uses the same algorithm as GCC for choosing the underlying type of an enumeration based on the values of the enumeration members. Thus, the program will be successfully analyzed by the analyzer and no alarm will be raised when -enum-restriction loose is set.

  • The option value strict corresponds to the C++ standard: any value of the narrowest integer type that can represent all the enumeration constructors is allowed. For example, the enum E contains only the values 1 and 13, which can be represented by an unsigned integer of 4 bits; that is, only values between 0 and 15 are allowed. Therefore, an alarm will be raised at the statement on line 6:

    $ tis-analyzer -enum-restriction strict -val enum.c
    [...]
    [value] Analyzing a complete application starting at main
    [value] Computing initial state
    [value] Initial state computed
    [value] The Analysis can be stopped with: kill -USR1 816
    enum.c:6:[kernel] warning: Reading a wrong representation ({16}) for type enum E.
                      stack: main
    enum.c:6:[kernel] warning: Unclassified alarm: assert
                        \warning("representation of value of type enum E is [0..15]");
                      stack: main
    [value] done for function main
    
  • The option value stricter only accepts values in the range of the minimum and maximum enumeration constructors. For example, the minimum and maximum constructors of the enum E are 1 and 13 respectively, which means only values between 1 and 13 are allowed. Therefore, an alarm will be raised at the statement on line 5:

    $ tis-analyzer -enum-restriction stricter -val enum.c
    [...]
    [value] Analyzing a complete application starting at main
    [value] Computing initial state
    [value] Initial state computed
    [value] The Analysis can be stopped with: kill -USR1 6977
    enum.c:5:[kernel] warning: Reading a wrong representation ({14}) for type enum E.
                       stack: main
    enum.c:5:[kernel] warning: Unclassified alarm: assert
                         \warning("representation of value of type enum E is [1..13]");
                      stack: main
    
Option -pointer-arguments-to-std-functions

Some standard library functions, like void *memcpy(void *, const void *, size_t), have a pointer p and a size n among their parameters, and operate on the first n bytes of data of the memory area pointed to by the pointer p. The C standard mandates that p is a pointer inside “the address space of the program” (this is clause 7.1.4:1 in the C17 standard, and the same clause exists in earlier versions of the standard):

  • If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.

The clause 7.22.5:1 re-emphasizes that this applies to comparison functions that take a size, even if the passed size is 0:

  • Where an argument declared as size_t nmemb specifies the length of the array for a function, nmemb can have the value zero on a call to that function; the comparison function is not called, a search finds no matching element, and sorting performs no rearrangement. Pointer arguments on such a call shall still have valid values, as described in 7.1.4.

The clause 7.24.1:1 similarly re-emphasizes that 7.1.4 applies to string manipulation functions “unless explicitly stated otherwise”.

These restrictions are gratuitous for any reasonable modern architecture, where if the program has been able to build a pointer without invoking undefined behavior, passing it to a standard function that does nothing with it is never a problem. However, some compilers have taken to optimizing C programs on the basis of them respecting these clauses, so that the generated binary behaves unexpectedly if the program being compiled does not respect the rules above.

For instance, consider the program below:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

__attribute__((noinline)) //imagine the function is in another compilation unit
void f(char *p, int n) {
    if (n > 10) abort();
    memcpy(p,"0123456789",n);
    if (p == NULL) {
        printf("f was called with NULL\n");
    }
    else {
        printf("f was called with a non-null pointer\n");
        printf("The value of p:%p that is:0x%llx\n", p, (unsigned long long)p);
    }
}

int main(void) {
    f(NULL, 0);
}

GCC 10.2 compiles this C program into a binary that, when executed, displays the following:

f was called with a non-null pointer
The value of p:(nil) that is:0x0

(Compiler Explorer link: https://gcc.godbolt.org/z/jYsKEW)

This is not a GCC bug, and incidentally this behavior has existed for several versions. GCC is “in the right” because the program passes NULL to memcpy.

When this program is analyzed with TrustInSoft Analyzer using the default settings, the analyzer warns that the program passes NULL to memcpy and that this may cause the program to behave differently than expected.

By default, TrustInSoft Analyzer requires the pointer p passed to a standard function to be valid to read or write even when the size n is zero.

Example:

#include <string.h>
int main(){
   char c1, c2;
   char *p = &c1;
   char *q = &c2;
   memcpy(p+1, q+1, 0);
}

Analyzing such a program raises alarms about invalid memory write or read:

int main(void)
{
  int __retres;
  char c1;
  char c2;
  char *p;
  char *q;
  p = & c1;
  q = & c2;
  /*@ assert Value: logic_mem_access: \valid(p+1); */
  /*@ assert Value: logic_mem_access: \valid_read(q+1); */
  memcpy((void *)(p + 1),(void const *)(q + 1),(size_t)0);
  __retres = 0;
  __tis_globfini();
  return __retres;
}

If you know that your C compiler will not play such tricks on you despite being allowed to by the standard (ideally, your C compiler explicitly documents that it will not; otherwise it may simply not have gotten round to it yet), the option -pointer-arguments-to-std-functions can be used. The option -pointer-arguments-to-std-functions accepts three values: strict, nonnull, or loose.

  • The option value strict, which is the default, means that only pointers valid for reading or writing are allowed as arguments for standard library functions.

  • The option value nonnull means that a null pointer is not allowed, but any other pointer that is valid to compute is allowed when size n is zero. This includes “one-past” pointers.

    Example:

    /* run.config
       OPT: @TIS_CMD@ -pointer-arguments-to-std-functions nonnull
    */
    #include <string.h>
    int main(){
        char c1, c2;
        char *p = &c1;
        char *q = &c2;
        memcpy(p+1, q+1, 0);
        memcpy(NULL, q, 0);
    }
    

    Analyzing the code with -pointer-arguments-to-std-functions nonnull only raises alarms about null pointers:

    int main(void)
    {
      int __retres;
      char c1;
      char c2;
      char *p;
      char *q;
      p = & c1;
      q = & c2;
      memcpy((void *)(p + 1),(void const *)(q + 1),(size_t)0);
      /*@ assert Value: unclassified: \warning("(void *)0 != (void*) 0"); */
      memcpy((void *)0,(void const *)q,(size_t)0);
      __retres = 0;
      __tis_globfini();
      return __retres;
    }
    
  • The option value loose means that the analyzer will not check the validity of pointer arguments when the size n is zero, so no alarms will be raised about invalid pointer arguments in that case.
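
    Example (a minimal driver sketched along the lines of the nonnull example above):

    /* run.config
       OPT: @TIS_CMD@ -pointer-arguments-to-std-functions loose
    */
    #include <string.h>
    int main(){
        char c1;
        char *p = &c1;
        memcpy(p+1, NULL, 0); // pointer validity is not checked since n == 0
    }

    Analyzing this code with -pointer-arguments-to-std-functions loose should raise no alarm about the pointer arguments, since the size is zero.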

Deterministic libc

When the TIS_DETERMINISTIC_LIBC macro is set (use -cpp-extra-args="-DTIS_DETERMINISTIC_LIBC"), TrustInSoft Analyzer provides deterministic implementations of some libc and POSIX functions. This can prove useful when analyzing the behavior of a program on concrete values, for instance in interpreter mode, to keep the analyses deterministic.
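
For instance, the macro can be set on the command line when launching an analysis (the source file name is a placeholder):

$ tis-analyzer -val -cpp-extra-args="-DTIS_DETERMINISTIC_LIBC" main.c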

Endianness

POSIX.1-2001 defines four functions to convert between packed network bytes (big endian) and proper integer values on the host:

  • htonl(): host to network, 32 bits (long)
  • htons(): host to network, 16 bits (short)
  • ntohl(): network to host, 32 bits (long)
  • ntohs(): network to host, 16 bits (short)

TrustInSoft Analyzer assumes that the host uses a common representation of integers (byte-wise little or big endian), which means that htonl() is equivalent to ntohl() and that htons() is equivalent to ntohs().
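
To make the equivalence concrete, here is a minimal sketch of such a conversion (not the analyzer's actual implementation): on a little-endian host, both htonl() and ntohl() amount to the byte swap below, while on a big-endian host both are the identity, so the two functions coincide either way.

#include <stdint.h>

/* Sketch: reverse the byte order of a 32-bit value. Applying the swap
   twice restores the original value. */
static uint32_t swap32(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}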

This is generalized by the POSIX.1-202x header endian.h, which supports converting between host integer values and either little-endian (“le”) or big-endian (“be”) representations.

#include <arpa/inet.h>

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

#include <endian.h>

uint16_t htobe16(uint16_t host_16bits);
uint16_t htole16(uint16_t host_16bits);
uint16_t be16toh(uint16_t big_endian_16bits);
uint16_t le16toh(uint16_t little_endian_16bits);

uint32_t htobe32(uint32_t host_32bits);
uint32_t htole32(uint32_t host_32bits);
uint32_t be32toh(uint32_t big_endian_32bits);
uint32_t le32toh(uint32_t little_endian_32bits);

uint64_t htobe64(uint64_t host_64bits);
uint64_t htole64(uint64_t host_64bits);
uint64_t be64toh(uint64_t big_endian_64bits);
uint64_t le64toh(uint64_t little_endian_64bits);

The user may also provide their own implementation for any of these functions, which will override the one from TrustInSoft Analyzer.

Note

  • htonl(), htons(), ntohl() and ntohs() are equivalent to htobe32(), htobe16(), be32toh() and be16toh() respectively

Random

The C89 standard defines rand(), which returns a pseudo-random number, and srand(), which sets the random seed to a user-provided value.

POSIX.1-2001 says that the period of rand() should be at least 2^32, but TrustInSoft Analyzer assumes a weaker generator, which yields more conservative guarantees (e.g. a higher chance of collision is taken into account). TrustInSoft's implementation of rand() and srand() is the one given in the C89 standard, with RAND_MAX equal to 32767.
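
For reference, the portable example implementation given in the C89 standard is essentially the following (reproduced here to make the assumed generator concrete):

static unsigned long int next = 1;

int rand(void)   /* RAND_MAX assumed to be 32767 */
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

void srand(unsigned int seed)
{
    next = seed;
}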

The POSIX standard adds the reentrant function rand_r() where a pointer to the seed is passed as an argument and modified.

The POSIX functions random(), srandom(), initstate() and setstate() give much stronger guarantees and refer to a specific generator. The standard talks of a “non-linear additive feedback random-number generator”, which actually refers to a Lagged Fibonacci generator.

The user may also provide their own implementation for any of these functions, which will override the one from TrustInSoft Analyzer.

#include <stdlib.h>

int rand(void);
void srand(unsigned int seed);
int rand_r(unsigned *seedp);

char *initstate(unsigned int seed, char *state, size_t n);
long random(void);
char *setstate(char *state);
void srandom(unsigned int seed);

Time

The C functions time() (defined in C89) and gettimeofday() (defined in POSIX.1-2001) provide information about the current time. When using these functions in the analyzer with the TIS_DETERMINISTIC_LIBC macro defined, time starts at TIS_INIT_TIME_SEC seconds [1] and TIS_INIT_TIME_USEC microseconds after the Unix epoch [2]. Every time either time() or gettimeofday() is called, the clock progresses forward by TIS_INCR_TIME_SEC seconds and TIS_INCR_TIME_USEC microseconds. The values of these macros can be overridden with the -cpp-extra-args option:

  • TIS_INIT_TIME_SEC can be any valid value for long int;
  • TIS_INIT_TIME_USEC can be any integer between 0 and 999999 (included);
  • TIS_INCR_TIME_SEC should be a non-negative valid value for long int;
  • TIS_INCR_TIME_USEC should be a non-negative valid value for long int (can be larger than 999999).
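
For example, the clock can be made to start at the epoch and advance by 2.5 seconds per call with a command line such as the following (the source file name is a placeholder):

$ tis-analyzer -val -cpp-extra-args="-DTIS_DETERMINISTIC_LIBC -DTIS_INIT_TIME_SEC=0L -DTIS_INIT_TIME_USEC=0L -DTIS_INCR_TIME_SEC=2L -DTIS_INCR_TIME_USEC=500000L" main.c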

Warning

Calling time() or gettimeofday() twice in quick succession, or calling one right after the other, will still show a difference of TIS_INCR_TIME_SEC seconds and TIS_INCR_TIME_USEC microseconds.

The user may also provide their own implementation for either of these functions, which will override the one from TrustInSoft Analyzer.

#define TIS_INIT_TIME_SEC 1596453215L // 2020-08-03T13:13:35+02:00
#define TIS_INIT_TIME_USEC 306099L
#define TIS_INCR_TIME_SEC 1L
#define TIS_INCR_TIME_USEC 0L

#include <time.h>

time_t time(time_t* timer);

#include <sys/time.h>

struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};

struct timezone {
    int tz_minuteswest;     /* minutes west of Greenwich */
    int tz_dsttime;         /* type of DST correction */
};

int gettimeofday(struct timeval *tv, struct timezone *tz);

Note

  • gettimeofday() is marked as obsolete by POSIX.1-2008.
  • According to POSIX.1-2001, the behavior of gettimeofday() is not specified when passing a non-NULL value for the tz argument (TrustInSoft Analyzer zeroes the pointed structure).
[1] ignoring leap seconds
[2] 1 January 1970 at midnight UTC (1970-01-01T00:00:00Z)

Verification of properties set at the design stage

This section gives some ideas about how to verify properties that come from the design stage of the application:

Checking the dependencies
Principle

The results of the dependency analysis are useful to confirm that the C code, as written, modifies the variables expected by the design documents, and computes the new values of these variables from the inputs expected by those documents.

The dependency analysis runs on top of an already-configured value analysis. The dependencies shown are valid for the inputs of the target function described by the analysis driver.

The dependency analysis is activated by adding the options -graphdeps -deps to a command line that already launches a value analysis.

Direct dependencies

Consider the function f, for which an analysis driver has already been written to perform a value analysis:

#include <tis_builtin.h>

int i, o;

void f(void) {
  o = 2 * i;
}

// Analysis driver:
int main(void) {
  // set up the inputs:
  i = tis_interval(-100, 100);

  // call the target function
  f();
}

The command line to obtain the dependencies in the log is:

$ tis-analyzer -val -slevel 10 -graphdeps -deps direct.c

Inside the function f, the value of the variable i is used in arithmetic operations to compute the value of the output variable o. This is shown as:

...
[from] Function f:
  o FROM direct: i
...

Verifying that the function is deterministic

One way for a C function to not behave in a deterministic manner is to invoke Undefined Behavior. In this case, the code generated by the C compiler may behave erratically, and there are no guarantees about what can happen when the function is called. The value analysis is used to guarantee that Undefined Behavior does not happen for any of the inputs delimited by the analysis driver.

Beyond this basic guarantee, when the dependency analysis says that the above function f computes o FROM direct: i, it means that every time the function f is called, it places in the variable o a value that is computed from nothing else than the value of the variable i at the time of the call. If the function is called twice with the same value in i, then the same value will be put in o. Every other variable of the program could have a different value and the function f would still compute the same value in o.

If the developer had mistakenly written f in a way such that it had an internal state, and behaved correctly the first time it is called, but not the second one, this internal state would appear in the dependencies computed by the dependency analysis. (Here “mistakenly” means “contrary to the specification documents written during the design phase”. There is nothing wrong with some functions having an internal state, as long as they were designed so.)
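
For instance, a version of f with such an unintended internal state could look like this (hypothetical code); the static variable first would then show up among the dependencies computed for o:

int i, o;

void f(void) {
  static int first = 1;   /* unintended internal state */
  if (first) {
    o = 2 * i;            /* behavior on the first call */
    first = 0;
  } else {
    o = 3 * i;            /* different behavior afterwards */
  }
}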

Similarly, if the developer had mistakenly made f’s behavior depend on some configuration option, this would appear in the computed dependencies.

While trying out the example above, you may have noticed that TrustInSoft Analyzer says that the function main sets the variables i and o, but that the values to which these variables are set depend on a mysterious tis_entropy_source variable:

...
[from] Function main:
  tis_entropy_source FROM direct: tis_entropy_source (and SELF)
  i FROM direct: tis_entropy_source
  o FROM direct: tis_entropy_source
  \result FROM \nothing
...

Generally speaking, it is not interesting to look at the results of the dependency analysis for analysis drivers, such as main here. But to explain what is shown: the function main sets i to an arbitrary value between -100 and 100, by calling tis_interval(-100, 100), so that the analysis results will be valid for this range of inputs. It would be unsound to assume that the function tis_interval always produces the same value when applied to the same input bounds. Instead, tis_interval is modeled as having an internal state tis_entropy_source, which it updates at every call and from which its result is computed.

In the function main, the variable i receives the result of a call to tis_interval, so for the dependency analysis, it sets i to a value that depends on tis_entropy_source. The function main also calls f, which modifies o. Looking at the function f, the new value of o is computed deterministically from the value of i, but looking at main, the value of o is set to a value that depends on tis_entropy_source, since main calls f with a value in i that has been set to something depending on tis_entropy_source.

Indirect dependencies

#include <tis_builtin.h>

int i, o;

void f(int c, int *p) {
  if (c)
    o = 2 * (*p);
  else
    o = 0;
}

// Analysis driver:
int main(void) {
  // set up the inputs:
  i = tis_interval(-100, 100);
  int c = tis_interval(0, 1);

  // call the target function
  f(c, &i);
}

$ tis-analyzer -val -slevel 10 -graphdeps -deps indirect.c
...
[from] Function f:
  o FROM indirect: c; p; direct: i
...

In this example, o still depends on i, but it does so through p (which points to i), and only if c is different from 0. In this case, p and c are said to be indirect dependencies because their values are involved in the value assigned to o, but the value of o is not computed directly from them.

Often, whether a dependency on an input is direct or indirect is a matter of programming style. It is possible to tell TrustInSoft Analyzer to display the results of the dependency analysis without distinguishing direct and indirect dependencies with the option -no-show-indirect-deps. On the above example, using this option gives:

$ tis-analyzer -val -slevel 10 -graphdeps -deps indirect.c -no-show-indirect-deps
...
[from] Function f:
  o FROM i; c; p
...

SELF

#include <tis_builtin.h>

int x, y, z;

void f(int c) {
  if (c) x = y + z;
}

int main(void) {
  y = tis_interval(0, 100);
  z = tis_interval(-50, 50);
  int c = tis_interval(0, 1);
  f(c);
}

$ tis-analyzer -val -slevel 10 -graphdeps -deps self.c
...
[from] Function f:
  x FROM indirect: c; direct: y; z (and SELF)
...

In this example, x has direct and indirect dependencies as before, but it is also said to have a (and SELF) dependency. This means that x may not be updated when executing f. Here, this happens when the input value of c is 0.

Final example

#include <tis_builtin.h>

void do_peak_things (void);
void do_offpeak_things (void);
enum period { PEAK, OFFPEAK, NBPERIOD } current_period;
unsigned total;
unsigned total_during_period[NBPERIOD];
unsigned previous_t;

void update_totals(unsigned t, unsigned instant_pow) {
  unsigned duration = t - previous_t;
  unsigned energy = instant_pow * duration;
  total = total + energy;
  switch (current_period) {
  case PEAK:
    do_peak_things();
    break;
  case OFFPEAK:
    do_offpeak_things();
    break;
  }
  total_during_period[current_period] = total_during_period[current_period] + energy;
  previous_t = t;
}

// Analysis driver
int main(void) {
  // set up the inputs:
  tis_make_unknown(&previous_t, sizeof previous_t);
  current_period = tis_interval(0, NBPERIOD - 1);
  unsigned t;
  tis_make_unknown(&t, sizeof t);
  unsigned instant_pow;
  tis_make_unknown(&instant_pow, sizeof instant_pow);

  // call the target function
  update_totals(t, instant_pow);
}

The command line to obtain the dependencies in the log is still:

$ tis-analyzer -val -slevel 10 -graphdeps -deps update_totals.c

Notice that, since the dependency analysis is based on the results of the value analysis, options that make the value analysis more precise, such as -slevel, can also result in more precise dependency results.

The relevant results are:

...
[from] Function update_totals:
  total FROM direct: total; previous_t; t; instant_pow
  total_during_period[0]
                     FROM indirect: current_period; direct: total_during_period[0];
                           previous_t; t; instant_pow (and SELF)
                     [1]
                     FROM indirect: current_period; direct: total_during_period[1];
                           previous_t; t; instant_pow (and SELF)
  previous_t FROM direct: t
...

Proving functional properties

When implementing an application, one may wish to check that some functional properties from the specification are indeed satisfied by the code.

The ACSL Properties chapter already explained that ACSL properties may be added to a project, and Prove Annotations with WP shows how the WP plug-in may be used to prove them.

Partial specifications

Formal verification does not have to be complete to be interesting. The Checking the dependencies chapter already showed how input/output specifications can be checked, even without using WP. But functional properties may also be checked even if they represent partial specifications.

For instance, the partial specification of a function may just be:

//@ ensures f_ex: x >= 0 ==> \result >= 0;
int f (int x);

It does not say what the result is, only that it is non-negative when the input x is non-negative, and it says nothing about the case where the input is negative. Proving it may require adding some properties to the functions that are called by f, but only those that are strictly needed to prove the property.
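
For instance, if f delegates to a helper g, a similarly partial contract on g may be all that is needed (g and its contract are hypothetical here):

//@ ensures g_ex: x >= 0 ==> \result >= 0;
int g(int x);

//@ ensures f_ex: x >= 0 ==> \result >= 0;
int f(int x) {
    return g(x);
}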

Partial specifications may be used to check a specific goal. For instance, in our PolarSSL verification kit, we only wrote the specifications of the subcomponents that were necessary in order to prove that there were no Undefined Behaviors in the system as a whole.

Complete specifications

Proving complete specifications requires more work. The properties must then describe the complete behavior of each function.

The ACSL by Example case study is a nice example of a collection of C functions and data types whose behavior has been formally specified with ACSL and formally verified with WP.

TrustInSoft Analyzer GUI Manual

The aim of this manual is to explain how to use the TrustInSoft Analyzer GUI.

Starting the GUI

Using the GUI is very similar to running the analyzer in batch mode as explained in TrustInSoft Analyzer Manual.

To launch the GUI, TrustInSoft Analyzer must be called as usual by adding the -gui option. For the TrustInSoft Kernel, the right option is -server:

$ tis-analyzer -gui <options..> <source and preprocessed files..>

or:

$ tis-kernel <options..> <source and preprocessed files..> -server

When the message "Ready <date> port 8080" appears, the GUI can be used by opening a browser at the address localhost:8080. The server port can be chosen with the option -server-port; by default, the port is 8080. See -server-help for more information about the options of this GUI.
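
For instance, to serve the GUI on port 12345 (the source file name is a placeholder):

$ tis-analyzer -gui -server-port 12345 main.c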

Only one session of a GUI is allowed on a port. However, multiple TrustInSoft Analyzer instances can be launched with different ports and opened with different browser pages at the same time.

The supported browsers are:

  • Chromium
  • Google Chrome
  • Firefox
  • Safari
Accessing the GUI through the network

By default, the TrustInSoft Analyzer GUI can only be accessed through a web browser running on the host running the server. There are two options to allow access from other hosts:

  • with the option -server-expose: the GUI will be available at the address http://<ip_address>:<port>, where <ip_address> is an IP address of the computer where TrustInSoft Analyzer is running and <port> is the port used for the wanted TrustInSoft Analyzer instance (the one given to the option -server-port).
  • with a configured reverse-proxy
Example of a basic reverse-proxy

A reverse-proxy can be installed in order to open the access of the TrustInSoft Analyzer GUI through the network.

Here is an example of an installation and configuration of a reverse-proxy:

sudo apt-get install -y nginx

{ cat <<'EOF'
server {
  listen 80;

  location ~ /(\d+)$ {
      return 301 http://$host$request_uri/;
  }

  location ~ /(?<tisport>\d+)/ {
      proxy_set_header X-Forwarded-Host $host;
      proxy_set_header X-Forwarded-Server $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_read_timeout 600s;
      rewrite /\d+/(.*) /$1 break;
      proxy_pass http://127.0.0.1:$tisport;
  }
}
EOF
} | sudo sh -c 'cat > /etc/nginx/sites-available/tis-analyzer.nginx'

Then the reverse-proxy can be enabled with:

sudo ln -s /etc/nginx/sites-available/tis-analyzer.nginx /etc/nginx/sites-enabled/
sudo nginx -s reload

After this configuration, the TrustInSoft Analyzer GUI will be available at the address http://<ip_address>/<port>, where <ip_address> is an IP address of the computer where TrustInSoft Analyzer is running and <port> is the port used for the wanted TrustInSoft Analyzer instance (the one given to the option -server-port).

The reverse-proxy can be disabled with the following commands:

sudo rm -f /etc/nginx/sites-enabled/tis-analyzer.nginx
sudo nginx -s reload

Note

With an advanced reverse-proxy configuration, the HTTPS protocol and authentication can be added to strengthen the security of access to the TrustInSoft Analyzer GUI.

Disabling animations

All animations of the GUI can be disabled by adding the parameter graphics=low in the URL used to access the GUI, such as:

http://localhost:12345?graphics=low

Disabling animations can lower the CPU load with some browsers on low-powered computers. This parameter can also be useful if the GUI is accessed through an SSH tunnel with X11 forwarding enabled.

Overview

After starting the server as explained before, the GUI can be opened in a browser.

GUI overview

The GUI is composed of several zones, as shown by the figure above:

  • the Menu, which gathers the global features of the GUI
  • the Modes, which allow switching between the different modes of the GUI
  • the Source Code Panel, which contains the program’s source code
  • the Interactive Code Panel, which contains a normalized version of the source code with which the user can interact
  • the widget panels, which gather the different available panels of the GUI; these panels are called Widgets

Modes

The Modes Toolbar

The GUI has a lot of widgets and information to display. Modes provide pre-selected sets of widgets to focus on a particular task.

The different modes are:

  • Pre-processing, focused on the parsing phase of the analysis.
  • Value Analysis, focused on the Value Analysis settings and statistics.
  • Exploration, focused on exploring the analyzed program with the analysis results, if any.

Note that any widget can be displayed in any mode with the Workspace menu.

Interactive Code Panel

The Interactive Code Panel

The central Interactive Code Panel shows the normalized code of the program. The normalization is done by the analyzer and preserves the semantics of the program as well as the undefined behaviors existing in the source code.

The normalized code is segmented into Globals. Each tab in the Interactive Code Panel is a global of the program.

The normalized code in each tab is also interactive and reacts to the user clicks:

  • the left-click selects an element in the interactive code (see the Left-clicked elements section)
  • the right-click also selects an element and opens a contextual menu (whose entries are described in the Right-click menu section)

Moreover, the margin at the left of the interactive code is used to put Status bullets.

Left-clicked elements

Six kinds of elements can be clicked in the interactive code:

  • an expression (x, y << 1 + 6, counter == 0, z++, …)
  • a statement (x = y << 1 + 6;, if (counter == 0) { ... }, z++;, …)
  • a declaration (int x;, void foo(int y) { ... }, …)
  • a function call (foo in the expression foo(5), …)
  • a type with a definition (mystruct in any expression or declaration, …)
  • an ACSL annotation (assert x == 0;, ensures \initialized(\result);, …)

Left-clicking on an element in this panel will select it and highlight it in blue.

If the selected element is an expression, then it will be referred to as the current expression (also called the current term in some widgets).

At the same time, the selected element’s wrapping statement will also be highlighted with a rectangular blue mark at the left of the interactive code, near the left margin. This statement will be referred to as the current program point.

Note that some elements have no wrapping statement (such as declarations or some types). In this case:

  • if the element is in a function, the current program point will be the Pre-state or Post-state (shortened to Pre and Post) of this function:
    • Post for ACSL annotation elements that are post-conditions (ensures clauses)
    • Pre for all other cases
  • if the element is not in a function, then a default program point will be chosen:
    • the program’s entry point, if it exists
    • a random function otherwise

The concepts of current expression and current program point are often used in widgets on the left side of the GUI (such as Function Context).

Caution

The current expression and/or the current program point can be automatically changed by performing actions in widgets.

Status bullets

The margin at the left of the interactive code holds bullets that represent the status of the ACSL properties.

The semantics of these bullets is detailed in the Properties section.

Right-click menu

Any left-clickable element in this panel is also right-clickable.

Right-clicking on an element will select it and apply the rules listed in the Left-clicked elements section.

Moreover, it will also open a contextual menu whose items can be extended by widgets. The menu items can differ depending on the kind of the selected element.

There are three categories of menu items.

Basic Research Items

These items allow performing different kinds of semantic searches in the program:

  • Show Defs: Show the last writes to the selected left-value according to the current statement. The list of these statements is shown in the Statement Context widget. This action needs Value to be computed. Only shown on “expression” elements that are left-values.
  • Show Defs [..]: similar to Show Defs, but returns the list of statements defining the contents of the memory to which the selected left-value points.
  • Occurrence: Show all the reachable statements where the selected left-value appears. The list of these statements is shown in the Statement Context widget. This action needs Value to be computed. Only shown on “expression” elements that are left-values.
  • Show impacting conditionals: Show all the conditionals (if and case statements) that impact the selected left-value among all possible execution paths (see Impacting conditionals). This action needs Value to be computed. Only shown on “expression” elements that are left-values.

Widgets Items

These items perform actions related to their associated widget:

  • Track this term is an item of the Values widget. The selected term will be tracked in the widget’s grid. Only shown on “expression” elements.
  • Slice on this left-value is an item of the Slicing widget. A slice request will be done on the selected left-value. Only shown on “expression” elements that are left-values.
  • Slice on this statement is an item of the Slicing widget. A slice request will be done on the selected statement. Only shown on “statement” elements and on all elements inside a “statement” element.
  • Prove annotation with WP is an item of the WP Goals widget. Launch the WP Plugin on the selected annotation. Only shown on “ACSL annotation” elements.
  • Prove function annotations with WP is an item of the WP Goals widget. Launch the WP Plugin on all annotations of the selected function. Only shown on “function declaration” elements.
  • Prove call preconditions with WP is an item of the WP Goals widget. Launch the WP Plugin on the annotations which are pre-conditions of the selected function. Only shown on “function call” elements.

Left Line

The colored vertical line on the left-hand side indicates statements where precision was lost or came close to being lost during the value analysis.

When the vertical line is yellow, no precision was lost, but more than X% of the available slevel was consumed, where X is a percentage set in the Options (50% by default). In this case, the tooltip text shows Slevel counter: <used>/<slevel>, where <used> is the count of separated states used and <slevel> is the slevel.

When the vertical line is red, precision was lost because some separated states were joined. In this case, the tooltip text shows <reasons>: <used>(+<widened>)/<slevel> where <reasons> are the reasons why states were joined, and <widened> is the number of widening states used.

See About Statements for more information about the reasons why some states may be joined, or about the other elements in the tooltip.

Source Code Panel

The Source Code Panel

The panel on the right side of the GUI shows the original source files.

Clicking on a line in this panel will change the current expression and current program point in the Interactive Code Panel to match the associated line of the source code.

The associated line in the interactive code may not be shown for the following constructs:

  • statements in an inline function
  • fields in a struct, enum or union definition

When the cursor is in a source file, some shortcuts are available:

  • Ctrl+F: Search
  • Ctrl+L: Go to line

Tip

If a selected line in the source code has several different locations in the interactive code (for instance with C++ templates), a window will appear listing these different locations; clicking one jumps to that location.

Edition mode

Editing files is allowed only in Edition mode, which can be enabled with the button in the title of the Source Code Panel.

Once the mode is enabled, files can be edited, restored to their original version (button Restore File) or saved (button Save File). When a file is saved, the old content is overwritten and the operation cannot be undone.

We highly recommend using an external source code management system in order to be able to revert these changes.

Caution

The Analyzer must be restarted to take into account the changes done in these files.

Two warning messages can appear above a source file:

  • This file has unsaved changes: some changes have been done in this file but these changes have not been saved yet and will be discarded if the GUI is closed or reloaded.
  • This file may not be synchronized with the current analysis: some changes have been done to this file and the current version is not the one the analyzer uses for the analysis. The Analyzer should be restarted in order to take these changes into account.

File System Panel

The File System Panel

This central panel displays the files located in the analyzer’s working directory and its sub-directories, along with their status. It also allows selecting (or deselecting) the files to parse with the analyzer.

Tip

  • You can only select source files recognized by the analyzer (C and C++ files).
  • Unix hidden files (i.e. filenames that begin with a dot) are not shown by default. They can be displayed with the Show hidden files checkbox located in the toolbar of this panel.
  • Selecting (resp. deselecting) a directory will select (resp. deselect) all source files located in this directory and its sub-directories.

Warning

Actions performed in this panel (such as removing a file) will act on the file system where the analyzer runs and cannot be cancelled.

Warning

It is not recommended to run the analyzer from /, your $HOME, or any valuable directories. Keep your files safe with a version control system.

List of Widgets

A widget is a panel which brings new features to the GUI.

By right-clicking on the widget title, each widget can be moved to the bottom or left side of the GUI, or placed in a floating window (“detach” mode).

The widget list can be found in the Workspace entry of the Menu and allows showing or hiding widgets. Note that some widgets are displayed automatically.

Here is a list of available widgets in the GUI (experimental widgets may not appear in this list):

Analysis Advanced Settings

Analysis Advanced Settings Widget

This widget allows changing more Value Analysis settings than the Analysis Settings widget.

Each setting has its own sub-panel, accessible by selecting its tab on the left of the widget.

Slevel per function

This panel allows adding, changing or removing a slevel limit for a given function. It has the same behavior as the -slevel-function option.
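
On the command line, the same effect would be obtained with something like the following (assuming the f:n argument syntax for the -slevel-function option; the function and file names are placeholders):

$ tis-analyzer -val -slevel 10 -slevel-function f:100 main.c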

Merge after loop

This panel allows enabling or disabling the default merge strategy for all functions, and enabling or disabling the merge strategy for a subset of functions. It has the same behavior as the -val-slevel-merge-after-loop option.

Split return strategy

This panel allows changing the global split strategy (applied by default to all functions) and adding, changing or removing a split strategy for a given function. It has the same behavior as the -val-split-return and -val-split-return-function options.

Analysis Advanced Statistics

Analysis Advanced Statistics Widget

This widget gathers more statistics about the Value Analysis than the Analysis Statistics widget.

It displays:

  • the function currently being analyzed by the Value Analysis during its computation, with its callstack.
  • the number of alarms by category.
  • the N most analyzed functions (with their callstack). The value of N can be changed in the top-right corner of the associated panel.
  • the N most analyzed statements. The value of N can be changed in the top-right corner of the associated panel.

Note

These statistics are updated during the computation of the Value Analysis.

Analysis review

Analysis review Widget

This widget gathers results about the performed analysis, allowing the user to make a precise review of the analysis.

Most of the reported problems, warnings or information shown in the Overview can be found in this widget with more details.

Note

This widget is not shown if the Value Analysis is not computed or if there are no problems, warnings, or information to be displayed.

This widget uses a sub-panel for each category of reported problems, warnings or information.

Each sub-panel has its own help on its associated category. Some of the sub-panels also have hints and useful links to the documentation that can help the user to do the review of the associated category.

Also, the user can add a comment for each item in these sub-panels (with the Comment icon) and mark an item as reviewed (with the Mark as Reviewed icon), allowing them to track their progress in the review. Note: only warning and information items can be marked as reviewed; problems, such as the Potential undefined behaviors, cannot be ignored and should be resolved.

All the detailed information of the review, including user comments and review progress marks, can be exported and downloaded with the Export Analysis Results widget.

Analysis Settings

Analysis Settings Widget

This widget allows changing the most common settings for tuning the Value Analysis. The settings are:

  • Entry point: the entry point to use for the analysis. See also the -main option.

  • Global slevel: the global value of the -slevel option, used for all functions/statements if no other slevel directive is given for the function/statement.

  • Enable interpreter mode: when checked, the analyzer settings are updated to run the next Value Analysis in interpreter mode; other settings may then be overwritten. See also the interpreter profile of the -val-profile option.

  • Malloc plevel: the value of the -val-malloc-plevel option.

  • Stop analysis after finding X alarms: force the Value Analysis to abort after finding X alarms.

    Note:

    • The Value Analysis may be aborted after finding a number of alarms that is not exactly X, due to internal implementation reasons.
    • In this option, the same alarm may be counted several times (if inside a loop or a function called several times).

    So the number given to this setting is not an exact count of alarms, but rather a heuristic to abort the analysis.

  • Stop analysis at first garbled mix of addresses: force the Value Analysis to abort after finding a garbled mix of addresses.

  • Save results by slevel: keep more precise results of the Value Analysis. This allows a more precise evaluation of values in widgets such as the Values one. However, this setting also increases the memory usage.

Analysis Statistics

Analysis Statistics Widget

This widget gathers simple statistics with charts about the last computation of the Value Analysis, such as:

  • the statement coverage
  • the total time used by the Value Analysis
  • the alarms grouped by their category

Note

These statistics are updated during the computation of the Value Analysis.

Call tree

Warning

This widget belongs to the Experimental features. The documentation about this widget may be incomplete or out-of-date.

The described feature below may still contain bugs. We do not recommend the usage of the Experimental mode for operational usage.

This widget offers an alternative view of the callstacks of the current function, as found in the Callstack or Callgraph widgets.

This widget displays the callstacks of the current function as a tree. When the current function changes, the previous callstacks remain displayed alongside the callstacks of the new current function, which are highlighted.

Also, all call sites of the same function are displayed in the same order as in the Interactive Code Panel.

By default, only the callstacks of the visited functions are displayed. However, if, among the displayed call sites, a function has more call sites than the ones displayed in the widget, a ... placeholder is displayed, which shows the hidden call sites when clicked.

Callgraph

Callgraph Widget

This panel shows up when selecting the Callgraph tab.

Introduction

A main purpose of the Callgraph widget is to provide an easy way to navigate through a large callgraph. To achieve this goal, only a few nodes are displayed at first and the graph is discovered interactively. A contextual menu is accessible by double-clicking on a node. This menu gives access to navigation features: folding and unfolding the children/parents of the selected node. A Show source code button is also available in the menu for opening the source code of the function in the usual source code panel.

Starting to use the Callgraph widget is as simple as choosing a root (function name) and pressing the Compute & Show button. You can then start to navigate through the graph using the contextual menu. The default callgraph mode is Value Analysis. In addition, several other modes exist and are detailed below.

The right-side panel

The Properties widget gives information about the selected node, and each property can be used as a selector to color the graph. Clicking on the arrow icon on the right side of a property colors the currently displayed nodes, which makes it easy to find all the nodes sharing the same value for this property. For instance, in the screenshot shown above, the selection has been made on the reached by value property.

Below node properties, you will find some useful actions:

  • Find a node:

    Fill the find a node field with the function name you are looking for and press Enter. This is particularly useful when the displayed graph contains more nodes than what you can deal with manually.

  • Reset zoom:

    Clicking this button resizes the graph to make it fit in the display area.

  • Freeze nodes:

    Usually the nodes keep moving until the whole graph comes to a stable configuration. In some cases, it may be useful to freeze it beforehand. To do this, you can toggle the Freeze nodes button.

  • Hierarchical layout

    Toggling this button allows you to switch between the physics-law layout and the hierarchical one. If the Vert. checkbox is checked when pressing the Hierarchical layout button, the graph is displayed in a top-to-bottom direction. Otherwise, a left-to-right direction is used.

Available callgraph modes
  • Value analysis (default mode)

    Choose a root node by typing a function name in the root1, root2, ..., rootn field.

    Then press the Compute & Show button to display the root node(s) of the graph.

  • All paths from A to B

    This mode provides a view of all possible paths from a given function to another.

    Choose the start and destination nodes of the paths by filling the corresponding fields.

    The field root1, root2, ..., rootn must contain at least one ancestor of the start and destination nodes.

    Then press the Compute & Show button to produce the graph together with a list of paths.

    Select a path in the list to highlight it in the graph.

  • All paths from A to B in interactive mode

    In cases where the list of paths is huge, you can use the Interactive mode by selecting the so-named checkbox before pressing the Compute & Show button. You will end up with only one path displayed on a comb-shaped graph. Each possible branching to another path is represented by an edge to a null-outdegree node. Clicking on such edges gives you easy navigation through all possible paths.

  • Source files

    In this mode, functions defined in the same file are grouped in a single node whose name is the file name.

    The checkbox Only directories allows you to group by directories instead of files.

    The checkbox Download indicates that you want to download the list of files (resp. directories) as a text file.

Callstack

Callstack Widget

This widget shows information about the available callstacks of the current function shown in the Interactive Code Panel.

This widget also introduces the concept of focused callstack: by clicking on a callstack in the grid, the selected callstack will become the focused callstack.

All widgets that support the View per Callstack feature (including the Interactive Code Panel) are called callstack-wise widgets. These callstack-wise widgets will filter their data according to the focused callstack. See the View per Callstack section for more details.

A focused callstack can be unfocused with the button Unfocus callstack.

Moreover, since the list of callstacks changes depending on the current function, a callstack can be pinned by clicking on the grey star icon in order to keep it in the list.

The widget also shows two more pieces of information:

  • the number of callstacks of the current function (not including the pinned callstacks in the grid)
  • whether the data of callstack-wise widgets is filtered by the focused callstack or not

Tip

If the field Data filtered is Yes, it is highly recommended to be careful about the interpretation of the results.

Export Analysis Results

Export Analysis Results Widget

This widget allows exporting the analysis results of the Analysis review widget (including the user comments and manual review statuses) in two different ways:

  • into a text report with the button Text report
  • into a configuration file option with the button Configuration file. The output file can be given to another instance of TrustInSoft Analyzer by using the command line option -tis-review. All user comments and manual review statuses will be imported in the new analysis if the object they are attached to (statement, function, property, …) still exists in the new analysis. Imported data that cannot be attached to an object in the new analysis is displayed in the Review Invalid data widget.

Files

Files Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers information about the source files of the analyzed program.

The information shown in the columns is provided by the -info-files option of the Tis-info plug-in, so you can refer to the About Files section of its manual to get more information about them.

Find Occurrences

Find Occurrences Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget is quite similar to the Occurrence action in the Right-click menu, except that only global variables are accepted and a program point is not required.

The grid gathers all the occurrences, in reachable statements, of the global variable given in the toolbar’s Find global variable field.

Function Context

Function Context Widget

This widget shows information about the current function seen in the Interactive Code Panel.

The information above the grid is the data of the Functions widget for the current function.

This widget also allows updating some of the Value Analysis options for the currently selected function without impacting the other functions.

The grid contains the list of callees and callers of the current function.

For the status column:

  • the Syntactic status is for entries found syntactically in the source code. Function pointers cannot be found syntactically.
  • the Semantic status is for entries found with the Value Analysis. Calls located in dead code are not found.
  • the Both status selects both the Syntactic and Semantic statuses

Note

The status can change according to the focused callstack (see View per Callstack)

When a function calls another one several times, the results show all the different callsites. The callsites can be hidden with the Hide callsites filter button, which will show only one line per function.

The list of callers/callees of the function context can be pinned by clicking on the Pin icon; a copy of this list will be kept even if the current function changes.

A pinned function context can be unpinned by clicking on the Unpin icon.

Functions

Functions Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers information about all the functions in the analyzed program, including the standard library and the analyzer built-ins.

The information shown in the first columns is provided by the -info-functions option of the Tis-info plug-in, so you can refer to the About Functions section of its manual to get more information about them.

The last columns show statistics about the analysis of these functions.

When sorting by Self time, you can find out in which functions the analysis spent its time. There can be at least two reasons:

  • either the function is called a huge number of times and memexec barely hits,
  • or some of its statements take a huge amount of time.

If the reason is the second one, go to the Statements widget.

Globals

Globals Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers all the program’s Globals.

A global is either a global variable, a type declaration or definition (typedef, struct, union, enum), a function declaration or definition, a global annotation, or a piece of assembly code.

Grep

Grep Widget

This widget allows running a grep command on the files of the analyzer’s File System Panel.

The widget supports 4 different grep modes:

  • the Basic Regexp mode
  • the Extended Regexp mode
  • the Perl Regexp mode
  • the Fixed strings mode

The search is case sensitive by default. The box Case insensitive can be checked to change this behavior.

The grep command searches by default in source files only. This behavior can be changed by unchecking the box Only in source files.

Import/Export Sources

Import/Export Sources

This widget allows both uploading and downloading source files to (resp. from) the file system displayed in the File System Panel.

Uploading sources

The button Upload sources asks the user to select files from their computer; the files are placed directly at the root of the analyzer file system.

Archives, such as .tar, .tar.gz and .zip files, will be automatically extracted at the root of the analyzer file system.

Also, all uploaded files can be read and modified in the analyzer, but they will lose their execute permission flag if it was present on the original file.

Downloading sources

The button Download sources allows downloading the entire content of the analyzer file system, including all hidden files.

Linking Result

Linking Result Widget

This widget displays the detailed error message of the last attempt to link all parsed files.

Note

This widget is not shown if there is no error to display.

Overview

Overview Widget

This widget gathers the current analysis state and results, and gives quick access to the two main actions: parsing files and running the Value Analysis.

When the Parsing (resp. the Value Analysis) button is highlighted (underlined in orange), it means the parsing (resp. the Value Analysis) is out-of-date: a file or a setting has been changed since the last run. The parsing (resp. the Value Analysis) should be performed again; otherwise, data displayed in some widgets (such as the Interactive Code Panel or Values) may not be correctly synchronized with the current settings of the analyzer.

The Cog icon button allows setting optional parameters when parsing or running the Value Analysis.

This widget also allows quickly setting (or unsetting) a configuration file to load before running the parsing operation.

Then, the global status of the analysis and its detailed results (when available) are displayed.

Global analysis status

The possible analysis statuses are:

  • No files to analyze: the analyzer’s File System Panel does not contain any source file to parse. Files can be added either by directly modifying the current working directory of the analyzer, if you can access the computer on which it runs, or with the Import/Export Sources widget otherwise.
  • No files parsed: the analyzer’s File System Panel contains source files to be parsed but none of them have been selected to be parsed. Files can be selected to be parsed either with check-boxes in the File System Panel or by using a configuration file that defines a files field.
  • Preparation failed: this status is used for two kinds of errors:
    • the loaded configuration file contains errors and cannot be loaded.
    • all files have been parsed but cannot be linked together. In that case, the Linking Result widget will display the error message.
  • Compilation failed: a selected file cannot be parsed by the analyzer. In that case, the Parsing Result widget will display the error message.
  • Compilation succeeded: all files have been successfully parsed and linked. However, in this state, the Value Analysis has not been performed. It can be run by clicking on the Value Analysis button above the status.
  • Analysis failed: all files have been successfully parsed and linked, and the Value Analysis has been run but could not be completed. The reason for the failure is displayed in the detailed results under this status. Note that an aborted Value Analysis is considered as (prematurely) finished and not as failed.
  • Analysis finished: all files have been successfully parsed and linked, and the Value Analysis has been successfully performed. The analysis results are detailed under this status.

Detailed results

The detailed results can contain the following items:

  • the warnings in the configuration file: this item notifies about warnings encountered when the current configuration file was loaded. By clicking on this item, the list of warnings will be displayed in a new window. It is recommended to take care of these warnings first because they can lead to an unexpected analysis setup.

    This item is only shown when the loaded configuration file has warnings.

  • the errors in the configuration file: this item notifies about errors encountered when the current configuration file has been loaded. By clicking on this item, the list of errors will be displayed in a new window. Files cannot be parsed until all errors of the configuration to load are resolved.

    This item is only shown when the selected configuration file cannot be loaded.

  • the parsing results:

    • when the global status is Compilation failed, this item displays the reason of the error and, when clicked, the Parsing Result widget will be shown to display the detailed error message.
    • otherwise, this item displays the number of successfully parsed files.
  • the link results: this item displays whether the files have been successfully linked. In the failure case, the item can be clicked to show the Linking Result widget to display the detailed error message.

    This item is only shown when the link fails or when the global status is Compilation succeeded: if the Value Analysis has been run, it implies the link has been successful.

  • the entry point information: this item is displayed if:

    • the Value Analysis cannot be run because the given entry point does not exist
    • the Value Analysis has been successfully computed. In that case, it also displays the exit code of the application or the returned value of the entry point, if any.
  • the potential undefined behaviors found: this item displays the number of potential undefined behaviors found during the Value Analysis. If this number is not zero, clicking on this item jumps in the Interactive Code Panel to the first potential undefined behavior found.

    This item is shown only if the Value Analysis is successfully computed.

  • the reason for the premature stop of the analysis: this item displays why the Value Analysis was stopped prematurely. For instance, the Value Analysis can be stopped prematurely:

    • by manually aborting the analysis from the Abort button in the overview (shown only when the Value Analysis is running).
    • by manually aborting the analysis from the command line (with the USR1 signal).
    • by the Value Analysis itself on encountering unsupported behavior, or because of options such as -val-stop-at-nth-alarm, …
    • by a lack of memory on the analyzer’s host computer.

    This item is shown only if the Value Analysis is successfully computed.

  • the list of unknown called functions: this item displays the number of unknown called functions encountered during the Value Analysis. Clicking on this item jumps in the Interactive Code Panel to one of these unknown called functions. Calling an unknown function can lead to severe imprecision in the analysis.

    This item is shown only if the Value Analysis is successfully computed and if the number is not zero.

  • the list of non-terminating statements: this item displays the list of non-terminating statements encountered during the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    This item is shown only if the Value Analysis is successfully computed, without any potential undefined behavior found, and if the number is not zero.

  • the list of unproven properties: this item displays the list of unproven properties after computing the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    This item is shown only if the Value Analysis is successfully computed, without any potential undefined behavior found.

  • the list of undefined functions with a specification: this item displays the list of undefined functions with a specification used by the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    The specification of such functions should be carefully written.

    This item is shown only if the Value Analysis is successfully computed and if the number is not zero.

  • the list of defined functions whose specification is used: this item displays the list of defined functions whose specification is used by the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    The body of the function should be used instead of the specification, unless the specification is used for performance reasons. In that case, the specification of such functions should be carefully written.

    This item is shown only if the Value Analysis is successfully computed and if the number is not zero.

  • the list of used stdlib functions: this item displays the list of stdlib functions used during the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed and if the number is not zero.

  • the list of unused parsed files: this item displays the list of parsed files that have not been used by the Value Analysis. Clicking on this item displays the associated panel in the Analysis review widget with more detailed results.

    Most of the time, these files can be removed from the list of files to parse without modifying the behavior or results of the Value Analysis.

    This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed and if the number is not zero.

  • the coverage: this item displays the function and statement coverage of the Value Analysis.

    This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed.

  • the target architecture used: this item displays the Target Architecture used for this analysis.

    This item’s purpose is only informative. This item is shown only if the Value Analysis is successfully computed.

Parsing Result

Parsing Result Widget

This widget displays the detailed parsing error message of the last parsing attempt.

It also displays the command line used to parse the file.

Note

This widget is not shown if there is no error to display.

Parsing Settings

Parsing Settings Widget

This widget allows adding, changing or removing compilation options used to parse every selected file.

Only the following relevant compilation options are kept: -I, -D and -U. All other options are discarded.
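
For instance, a compilation command such as gcc -c -I include -D DEBUG=1 -O2 file.c (illustrative values) would be reduced to the options -I include -D DEBUG=1, the -O2 option being discarded.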

This widget is similar to the compilation_cmd field of the Configuration files.

Note

Currently, it is not possible to give a set of compilation options for a single file.

Paths inspector

Warning

This widget belongs to the Experimental features. The documentation about this widget may be incomplete or out-of-date.

The described feature below may still contain bugs. We do not recommend the usage of the Experimental mode for operational usage.

This widget is related to the Whole Program Graph computed during the Value Analysis to keep the finest possible information about all possible execution paths in the program.

This feature can be activated only in Experimental mode, with the options -whole-program-graph and -graphdeps.

The Show Defs action in the Interactive code panel’s Right-click menu will use this Whole Program Graph feature when possible (only in Experimental mode), which is more precise than the previous computation method.

Some button actions under alarms (such as the Uninitialized alarms) in the Interactive Code Panel can also use the Whole Program Graph when possible.

Projects

Projects Widget

This widget shows the current TrustInSoft Analyzer internal project and also allows switching to another existing project.

Some TrustInSoft Analyzer features can create projects, such as exporting a slicing request in Slicing or using the Mthread plug-in.

Properties

Properties Widget

This widget is a grid: see Grids and filters for general information about how to use it.

The information shown in the columns is provided by the -info-properties option of the Tis-info plug-in, so you can refer to the About Properties section of its manual to get more information about them. The status column shows a colored bullet instead of the status name. The correspondence is:

  • never_tried status bullet: never_tried: no status is available for the property
  • considered_valid status bullet: considered_valid: there is no possible way to prove the property (e.g., the post-condition of an external function). We assume this property will be validated by external means.
  • surely_valid status bullet: surely_valid: the status is True and the dependencies have the consolidated status surely_valid or considered_valid.
  • unknown status bullet: unknown: the status is Maybe.
  • surely_invalid status bullet: surely_invalid: the status is False, and the dependencies have the consolidated status surely_valid.
  • valid_under_hyp status bullet: valid_under_hyp: the local status is True but at least one of the dependencies has a consolidated status unknown. This is typical of proofs in progress.
  • invalid_under_hyp status bullet: invalid_under_hyp: when the local status is False, but at least one of the dependencies has status unknown. This is a sign of a dead code property, or of an erroneous annotation.
  • valid_but_dead status bullet: valid_but_dead: the status is True, but the property is in a dead or incoherent branch.
  • invalid_but_dead status bullet: invalid_but_dead: the local status is False, but the property is in a dead or incoherent branch.
  • unknown_but_dead status bullet: unknown_but_dead: the local status is Maybe, but the property is in a dead or incoherent branch.
  • inconsistent status bullet: inconsistent: there exist two conflicting consolidated statuses for the same property, for instance with values surely_valid and surely_invalid. This case may also arise when an invalid cyclic proof is detected. This is symptomatic of an incoherent axiomatization.

Notice the Emission rank column, which is very useful when studying the alarms. First of all, its filter button enables showing Only alarms. Moreover, one can sort the alarms in their order of detection during the analysis and, as mentioned in the Study the Alarms section of the user guide, it is a good idea to explore them in this order. The meaning of the generated alarms is detailed in Value Analysis Alarms.

Review Invalid data

Review Invalid data Widget

This widget gathers all the user comments and manual review statuses set in the Analysis review widget that can no longer be attached to an object (function, statement, property, …) in the current analysis.

Most of the time, this happens because the source code of the program has been slightly changed and the analyzer could not find the new object associated with, for example, a user comment. In that case, the user comment is kept in this widget so that the user can copy/paste it to the new associated line in the Analysis review widget.

Shared Variables

Shared Variables Widget

This feature is available only when the Mthread Plug-in is active with the option -mthread.

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers all read and write accesses to shared variables in the program.

An access is said to be safe if at least one mutex is known for sure to protect the access. A potentially unsafe access is flagged with the Warning icon in the Is Safe column.

In the Protection column, the list of mutexes that protect the access is given. A mutex name is prefixed by (?) if the mutex may or may not be held.

In the following example:

#include <pthread.h>

/*@ assigns \result \from \nothing; */
extern int flip(void);

static pthread_mutex_t m1, m2;
static int x;

static void *slave(void *arg)
{
    x++;
    return arg;
}

int main(void)
{
    pthread_t th;

    pthread_mutex_init(&m1, NULL);
    pthread_mutex_init(&m2, NULL);

    pthread_create(&th, NULL, slave, NULL);

    if (flip())
        pthread_mutex_lock(&m1);

    pthread_mutex_lock(&m2);
    x++;

    return x;
}

The mutex m1 can be locked or not, depending on an unknown function. On the other hand, the mutex m2 is always locked. So the accesses to x in the main function by the main thread are safe.

In the slave function, no mutex is held, so the accesses to x by the thread th are not safe.
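
To make those accesses safe, the same mutex must protect the shared variable in every thread. A minimal sketch (ours, not part of the widget's output) locks m2 around the increment in slave; for the program to terminate, main would then also need to release m2 with pthread_mutex_unlock after its own access:

static void *slave(void *arg)
{
    pthread_mutex_lock(&m2);    /* same mutex as the one held in main */
    x++;                        /* the access is now protected for sure */
    pthread_mutex_unlock(&m2);
    return arg;
}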

Slicing

Slicing Widget

This widget is shown automatically when a slicing action has been done from the Right-click menu in the Interactive Code Panel.

This widget allows switching between different slicing requests, exporting them into a project (available from the Menu), and downloading the sliced program.

Statement Context

Statement Context Widget

This widget gathers information about the statement of the current program point.

All the data above the grid is information extracted from the Statements widget for the current statement.

The grid shows up previous and next statements of the current statement selected in the Interactive Code Panel.

A statement can be added to the tracked statements with the green plus button. A tracked statement is kept in another grid and is not removed when another statement is selected. A tracked statement can be removed from the tracked statements grid with the red minus button.

The Kind column can be:

  • Prev for the previous statement of the current selected statement in the Interactive Code Panel
  • Next for the next statement (like Prev)
  • Occurrence for the result of the Occurrence action in the Right-click menu
  • Defs for the result of the Show Defs action in the Right-click menu
  • Impactif for the result of the Show impacting conditionals action in the Right-click menu

Note

Each list of statements resulting from actions like Occurrence is kept in another grid, but the tracking feature behaves in the same way.

The Info column can take several values depending on the Kind column:

  • For the Occurrence kind:
    • Read when the targeted left-value is read in this statement
    • Written when the targeted left-value is written in this statement
  • For the Defs kind:
    • Direct when the statement modifies the targeted left-value
    • Indirect when the statement is a call to a function where the targeted left-value is modified

Clicking on a statement in this grid will send you to the selected statement in the Interactive Code Panel.

Statements

Statements Widget

This widget is a grid: see Grids and filters for general information about how to use it.

The information shown in the first columns is provided by the -info-statements option of the Tis-info plug-in, so you can refer to the About Statements section of its manual to get more information about them.

To investigate where the analysis spent time, begin with the Functions widget. If more information is needed, filter the problematic function using the Function column and sort by Self time (this column may be hidden by default). Again, there are at least two reasons for a statement to take time:

  • either the Counter is high,
  • or something else in the statement is high (like the number of possible values).

Stdout

Stdout Widget

This widget displays all messages that should have been printed in stdout (with functions such as printf or fprintf) if the analyzed program was executed.

Note

This widget is not shown if there are no messages to display.

Stderr

Stderr Widget

This widget displays all messages that should have been printed in stderr (with functions such as fprintf) if the analyzed program was executed.

Note

This widget is not shown if there are no messages to display.

Target Architecture

Target Architecture Widget

This widget allows changing the value of the Target Architecture, also known as the -machdep option. Once changed, the files should be parsed again to take this change into account.

Types

Types Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers information about all the types in the analyzed program.

The information shown in the first columns is provided by the -info-types option of the Tis-info plug-in.

Unresolved array length

This widget gathers the list of unresolved array lengths detected in the analyzed program.

Using such patterns is not recommended since related undefined behaviors are not detected by TrustInSoft Analyzer.
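
As a hypothetical illustration (our sketch; assuming the widget refers to arrays declared without an explicit length), such a pattern could look like:

extern int table[];            /* length not resolved at analysis time */

int read_table(unsigned int i)
{
    return table[i];           /* a related out-of-bounds access may go undetected */
}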

Value Messages

Value Messages Widget

This widget is available only if the -value-messages option is given to the analyzer.

This widget gathers all messages emitted by the Value Analysis. Some messages are emitted several times (for example: an alarm in a loop). Duplicated messages can be shown by clicking on the filter button of the column Emission Rank.

The Severity gives an idea of the message’s importance:

  • 0 for an alarm with a status surely_valid status bullet Valid in this state
  • 1 for an alarm with a status unknown status bullet Unknown and for any other messages
  • 2 for an alarm with a status surely_invalid status bullet Invalid in this state

Note

The same alarm can have several statuses in different states. The status seen in the Interactive Code Panel and Properties is the consolidated status of all the states where the alarm is raised.

For example, in a loop, an alarm can be Unknown in the first step and then always Valid. Its consolidated status is Unknown.
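
A minimal sketch of this situation (hypothetical code, assuming the argument n may be 0):

int f(int n)
{
    int r = 0, d = n;          /* d may be 0 on the first iteration */
    for (int i = 0; i < 4; i++) {
        r += 100 / d;          /* division alarm: Unknown first, then Valid */
        d = i + 1;             /* d >= 1 on every later iteration */
    }
    return r;
}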

The Detailed value message panel

The Detailed value message panel can be shown by clicking on the More actions button in the last, unnamed column.

Detailed value message Panel

This panel is quite similar to the Values widget and allows evaluating several terms in the associated state of the selected message.

Clicking on another More actions button changes the targeted message of the Detailed value message panel. The associated state where the terms are evaluated also changes, while the list of terms remains intact.

Values

Values Widget

When a value analysis has been done, this panel is very useful to explore the values that have been computed.

By default, it displays the currently selected expression of the Interactive Code Panel with its computed values before and after the selected statement. Moreover, more expressions can be added to this grid to help compare the values of several expressions.

Expressions can be tracked by:

  • Entering a new expression in the text field at the top of the widget.
  • Clicking on the green plus button icon next to the currently selected expression.
  • Right-clicking on an expression in the Interactive Code Panel and then selecting the Track this Term menu item.

Columns

Some columns of this grid are automatically hidden/shown according to the selected Precision in the toolbar.

Tip

If a grid cell is too small to fully display its text, you can click on that cell (anywhere but on a link) to expand/collapse the full text.

The columns are:

  • The green plus button (resp. red minus button) icon: when clicked, the associated expression is tracked (resp. untracked).

  • The magnify button icon: when clicked, changes the focused callstack to the one of the associated line (see also View per Callstack).

  • Flags to display icons to highlight distinctive characteristics of the expression, such as the volatile characteristic for a variable.

  • Expression.

  • Emission rank: the emission rank of an expression gives the order in which its values were evaluated during the Value analysis, regardless of the callstack.

  • Callstack to display the callstack of the associated expression.

    Note the gray color for a callstack indicates it is the same one as the callstack of the line above.

  • Call rank: a call rank of N indicates that this evaluation is the Nth call of the same callstack. This can happen when a function call is inside a loop.

  • Slevel rank: a slevel rank of N indicates that the Value analysis has visited the statement of the selected expression for the Nth time for the same call rank and the same callstack.

  • Is widening indicator to know if the associated evaluated values may have been over-approximated due to a lack of slevel for the selected statement or function.

  • Before value: the evaluation of the expression before the selected statement.

    Note: The red background indicates the evaluation may have failed.

  • Is equal indicator to easily check whether the before and after values are equal.

  • After value: the evaluation of the expression after the selected statement, if available.

    Note: The red background indicates the evaluation may have failed.

Precision

The precision allows displaying the evaluated values of expressions in different ways. The precision can be changed in the widget toolbar with the buttons next to the Values label:

  • All displays the consolidated value of expressions (i.e. the grid contains one line per expression). This precision can also introduce over-approximations since all states of the Value analysis are joined.

  • Per Callstacks displays values, one line per available callstack for each expression. It is more precise than the All precision.

  • Per Call displays values, one line per function call of the current selected statement, for each expression. It is more precise than the Per Callstacks precision if functions of the callstack are called inside a loop.

    This precision is available only if results by slevel are saved during the Value analysis with the option -val-slevel-results.

  • Per Path displays values, one line per path (i.e. per available state in the Value analysis) for each expression. It is the most precise way to display values.

    This precision is available only if results by slevel are saved during the Value analysis with the option -val-slevel-results.

Grouping grid lines

For precisions other than All, when several expressions are displayed, comparing values on different lines can be tiresome.

There are two different ways to group lines of the grid, available next to the Group by label in the toolbar:

  • by Term: all the lines of the same expression are grouped together, which is useful to compare differences in the values of the same expression between different callstacks or paths.
  • by Emission rank: all the lines of the same emission rank (for every available expression) are grouped together, which is useful to compare the values of different expressions for the same callstack or path.

Then, lines can be sorted by clicking on the Emission rank column. Note that the current expression is always displayed first; only the tracked expressions are sorted.

Variables

Variables Widget

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers information about all the variables in the analyzed program.

The information shown in the first columns is provided by the -info-variables option of the Tis-info plug-in, so you can refer to the About Variables section of its manual to get more information about them.

Virtual File System

Virtual File System Widget

This widget allows the usage of the tis-mkfs tool from the Graphical User Interface.

The Virtual File System allows simulating a file system for the analyzed program during the Value Analysis. The architecture of this virtual file system is displayed in the right panel of this widget.

However, the Virtual File System needs to be generated into a C file in order to be used by the Value Analysis. Every time the Virtual File System is changed, the Generate file system button should be used to regenerate the associated C file.

Note

Once the C file for the Virtual File System is generated, the parsing operation should be run again.

Enabling the Virtual File System options in the analyzer

In order to use the Virtual File System, the generation of the associated C file is not the only requirement: the File system: Enabled button, located at the top-right corner of the widget, should be toggled to allow the analyzer to correctly use the generated C file. This is equivalent to giving the option -fs on the command line.

Also, by default, the No Error Mode is disabled. When this mode is changed, files should be reparsed to correctly use this mode.

Modifying the Virtual File System

All quoted options in this section refer to the tis-mkfs ones explained in this section.

Files or directories can be added to the file system by:

  • right-clicking on a file or directory from the File System Panel and selecting the menu Add this file to the virtual file system. It is the same behavior as the option -local.
  • using the button New directory. It is the same behavior as the option -add-dir.
  • using the button New abstract file. It is the same behavior as the option -add-file.

Tip

The option -generic is not yet supported by this widget.

Note

Adding a directory from the File System Panel will prevent modifying (moving/renaming/deleting) its content.

Then, every added file or directory can be either renamed or deleted by right-clicking on it in the Virtual File System.

Files and directories can also be moved to another directory by drag and drop.

Exporting the current Virtual File System

The Save configuration button allows exporting the currently built Virtual File System so that it can be built again automatically.

The two suggested solutions are:

  • to use the filesystem field of Configuration files with the generated value.
  • to use directly the tis-mkfs tool with the generated command line.

WP Goals

WP Goals

This widget is a grid: see Grids and filters for general information about how to use it.

This widget gathers all goals computed by the WP Plugin with their status.

For more information about the WP Plugin, see the Prove Annotations with WP section.

Qed and Alt-Ergo are two provers. The WP Plugin first tries with Qed and, if it does not succeed, tries with Alt-Ergo.

The WP Plugin can be configured with options on the command line (see tis-analyzer -wp-help). Three of these options can be modified directly in the GUI in the widget’s toolbar:

  • the timeout limit (-wp-timeout)
  • the split of conjunctions (-wp-split)
  • the model (-wp-model)

The Go to icon in the last column allows displaying the whole goal and proof obligation of the selected goal.

Note

All goals persist in the grid even after computing another goal. Click on the Clear all goals button in the toolbar in order to clear the grid.

Grids and filters

Many widgets of the GUI are composed of configurable columns. This section gives some help about how to use them.

Hide/Show Columns

First of all, when many columns are available, one can choose which ones are shown. The menu to select them is visible at the right of each column name.

Sorting

In the same menu as the column selection, one can choose to sort the list according to the content of the column. When the list is sorted, a small arrow appears near the column title.

Filters

Filters can be used to select a subset of the available lines. An active filter is highlighted with a red border as a reminder that some lines are hidden according to that criterion. Moreover, the number of shown lines over the total number is also printed.

Filters in several columns can be used at the same time. All the filters can be reset with the Reset filters button. A combination of filters can be saved with the Save this filter button and recalled with the Load a filter button.

Filters with list

Some of the columns have predefined possible values. This is the case for the Kind columns, for instance. The filter then proposes the list of the possible values, which can be selected or deselected.

Filters with a regular expression

For the columns showing names (file, functions, etc), the filter is a Search field. The input string is interpreted as a regular expression.

The following constructs are recognized:

  • . Matches any character except newline
  • * (postfix) Matches the preceding expression zero, one or several times
  • + (postfix) Matches the preceding expression one or several times
  • ? (postfix) Matches the preceding expression once or not at all
  • [..] Character set. Ranges are denoted with -, as in [a-z]. An initial ^, as in [^0-9], complements the set. To include a ] character in a set, make it the first character of the set. To include a - character in a set, make it the first or the last character of the set.
  • ^ Matches at beginning of line (either at the beginning of the matched string, or just after a newline character).
  • $ Matches at end of line (either at the end of the matched string, or just before a newline character).
  • | (infix) Alternative between two expressions.
  • \b Matches word boundaries.
  • \ Quotes special characters. The special characters are $^\.*+?[].
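
For instance, in a Function column filter, one could type patterns such as (the names are illustrative):

^main$      matches only the name main
^conv       matches names beginning with conv, such as convert or convert_atoi
_atoi$      matches names ending with _atoi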

The button at the left side of the input string allows choosing whether the filter has to match or mismatch the given regular expression.

Downloading

The information shown in the grid can be saved in a CSV file thanks to the Download button. Only the filtered data is downloaded, so in order to download all the data in the grid, the Reset filters button should be used before pressing the Download button.

Note

CSV separators can be customized with the Options panel in the Menu.

View per Callstack

By default, once the Value Analysis is computed, the GUI offers a view of the consolidated state of all execution paths in the program. Some widgets, such as Values, can show more precise data by using the per callstack states of the Value Analysis.

The list of available callstacks of the current function is shown with the Callstack widget.

It is also possible to focus on one of these callstacks in order to have a view only on data relative to this callstack. Focusing on a callstack can be convenient for an investigation when an alarm occurs only in one callstack in a function called several times in different execution paths.

Note

Some widgets, such as the Functions widget, are not impacted by the focused callstack. The list below indicates which widgets support the view per callstack feature.

Most of the time, the widget’s documentation indicates when the widget supports the feature, and/or a graphical notification (icon, text, …) indicates whether the view per callstack filters data in the widget.

You can focus on a callstack:

  • by clicking on one of them in the Callstack widget
  • with the proper button in widgets which support the view per callstack

Caution

Once a callstack is focused, some data in the current function of the Interactive Code Panel can be hidden by this focused callstack (as well as for all widgets supporting the view per callstack feature).

For instance, going to a function that is not related at all to the focused callstack will display the function body as dead code.

Once this feature is used, it is highly recommended to keep the Callstack widget in sight: it shows whether some data is filtered by the focused callstack.

A callstack can be unfocused in the Callstack widget.

In order to navigate more easily in the code with a focused callstack, some shortcuts have been added:

  • Clicking on a callsite in the Focused callstack (in the Callstack widget) will change the current program point to the selected callsite and also reduce the focused callstack.

    For example, with g@24 <- f@12 <- main@5 as the focused callstack, clicking on f@12 will change the program point to the statement of the function f where g is called for this callstack. And this will also reduce the callstack to main@5.

  • In the Interactive Code Panel, when the current function is the function called by the left-most callsite of the focused callstack, two new items appear in the Right-click menu:

    • Go to definition of <function> & expand focused callstack: add the selected callsite in the focused callstack and jump to the definition of the selected function
    • Go to caller <function> & reduce focused callstack: pop out the left-most callsite from the focused callstack and jump to the selected caller

    For example, let’s have f@12 <- main@5 as the focused callstack, g the function called at f@12, and h a function called by g.

    These two menu items appear only in the g function.

    • Clicking on Go to caller f & reduce focused callstack will reduce the focused callstack to main@5 and jump to the function f where g is called.
    • Clicking on Go to definition of h & expand focused callstack will expand the callstack to g@24 <- f@12 <- main@5 and jump to the definition of h.

Any action that focuses or changes the focused callstack is registered in the history (see Backward and Forward buttons) and can easily be undone or redone.

Callstack-wise Widgets

Here is the list of widgets that support the view per callstack feature:

Impacting conditionals

The impacting conditionals feature syntactically determines an over-approximation of the conditional guards that impact a left-value among all possible paths at a given program point.

This feature helps the user find all the conditionals that can bring additional information about the selected left-value, in order to determine whether it has been over-approximated by the analyzer.

In the following example, an alarm is raised on the access msg[s1 + s2] in spite of the conditional if (s1 + s2 >= LEN) above, because the analyzer over-approximates the possible values of s1 + s2 and loses the relation established by the guard.

#define LEN 42

unsigned int total_len = LEN;
unsigned char msg[LEN];

int main(unsigned int s1, unsigned int s2)
{
    if (s1 + s2 >= LEN)
        return -1;

    return msg[s1 + s2];
}

By selecting the left-value s1 where the problem occurs and computing the impacting conditionals, the conditional if (s1 + s2 >= LEN) is highlighted.

Impacting conditional in GUI

GUI Glossary

Callstack-wise widget
A callstack-wise widget is a widget that supports the View per Callstack feature.
Current expression

The current expression is the element currently selected in the Interactive Code Panel.

There are 6 kinds of clickable elements in the interactive code:

  • an expression (x, y << 1 + 6, counter == 0, z++, …)
  • a statement (x = y << 1 + 6;, if (counter == 0) { ... }, z++;, …)
  • a declaration (int x;, void foo(int y) { ... }, …)
  • a function call (foo in the expression foo(5), …)
  • a type with a definition (mystruct in any expression or declaration, …)
  • an ACSL annotation (assert x == 0;, ensures \initialized(\result);, …)
Current program point
The current program point is the wrapping statement of the current expression. If the current expression has no wrapping statement, the program point will be chosen according to these rules.
Current term
Current term is an alias for Current expression in some widgets.
Focused callstack
The focused callstack is the reference callstack used by callstack-wise widgets to filter data related to callstacks: data from all other callstacks is filtered out.
Global

A global is either a global variable, a type declaration or definition (typedef, struct, union, enum), a function declaration or definition, a global annotation, or a piece of assembly code in the program.

The list of globals is available in the Globals widget.

Note that each tab in the Interactive Code Panel is a global.

Mode
The GUI separates the analysis workflow into several modes. The available widgets in the GUI depend on the current mode.
Widget
A widget is a panel that provides a feature in the GUI for the analysis.

tis-aggregate Manual

Introduction

The purpose of the tis-aggregate tool is to collect and aggregate several results generated by TrustInSoft Analyzer.

To know how to use TrustInSoft Analyzer, refer to TrustInSoft Analyzer Manual. The Basic usage section below assumes that you already know how to build tis-analyzer command lines to run analyses and get Value analysis results as explained in Prepare the Analysis.

The next Advanced usage section presents more tis-aggregate commands and options.

Basic usage

Let’s first give an example of using tis-aggregate to get the coverage from several value analysis results. As explained in the Check the Coverage section of the TrustInSoft Analyzer Manual, the coverage measures the percentage of the code that has been reached by the value analyses, and it helps to detect dead code.

Let’s consider a small application composed of two source files and a header file:

File convert.c holding some library functions:
#include "convert.h"
int convert_atoi (char * arg, uint * p) {
  unsigned int x = 0;
  for (char* p = arg; *p != '\0'; p++) {
    char c = *p;
    if ('0' <= c && c <= '9')
      x = 10 * x + c - '0';
    else
      return 1;
  }
  *p = x;
  return 0;
}
static int nb_digit (uint x, uint base) {
  if (x == 0)
    return 1;
  else {
    int nb;
    for (nb = 0; x; x /= base) nb ++;
    return nb;
  }
}
char digit_name (int x) {
  return x < 10 ? '0' + x : 'a' + x - 10;
}
char * cv_to_base (uint x, uint base, char * buf) {
  if (2 <= base && base <= 16) {
    int nb = nb_digit (x, base);
    buf[nb--] = 0;
    while (nb >= 0) {
      int d = x%base;
      char c = d <= 9 ? '0' + d : digit_name (d);
      buf[nb--] = c;
      x /= base;
    }
  }
  else
    return NULL;
  return buf;
}
char * convert (char * str_x, char * str_base, char * buf) {
  unsigned int x, base;
  int err;
  err = convert_atoi (str_x, &x);
  if (err) return NULL;
  err = convert_atoi (str_base, &base);
  if (err) return NULL;
  return cv_to_base (x, base, buf);
}
File convert.h that declares the library functions:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int convert_atoi (char * arg, uint * p);
char * cv_to_base (uint x, uint base, char * buf);
char * convert (char * str_x, char * str_base, char * buf);
File main.c holding the test entry point:

#include "convert.h"

int main (int argc, char * argv[]) {
  if (argc != 3) {
    printf ("Error: 2 arguments expects, got %d\n", argc);
    exit (EXIT_FAILURE);
  }

  char buf[100];
  char * x = argv[1];
  char * base = argv[2];
  char * p = convert (x, base, buf);

  if (p) {
    printf ("Result: convert %s in base %s = %s\n", x, base, p);
    return EXIT_SUCCESS;
  }
  else {
    printf ("Error: unable to convert %s in base %s\n", x, base);
    return EXIT_FAILURE;
  }
}

This application can for instance be analyzed with the following command line:

$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10'

It gives results for the analysis of the application as if it were run with the two arguments 1234 and 10. Of course, it has to be run for other values to get more analysis results.

The Information about the Coverage section explains that the coverage is computed from csv files that are generated by tis-analyzer when using -info-csv-xxx options. The simplest way to generate those files is to just add the -info-csv-all option, and a name for the results, to the command line of each analysis.

For instance, to generate all the csv files in an existing results/ directory, the command line above becomes:

$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10' \
  -info-csv-all results/test-1234-10

The argument of the option must be unique for each analysis since it is used as a base name to store the analysis result files. Those names are provided to tis-aggregate through what is called the INDEX file. Each line of this file holds the base name of the results of one analysis. The goal of the tis-aggregate tool is to collect the information from the analyses listed in this index file.

Before creating the index, let’s create another analysis with a different input:

$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 2' \
   -info-csv-all results/test-1234-2

So, the index file to create for the two analyses above would look like:

Index file convert.aggreg.index
results/test-1234-10
results/test-1234-2

Now the coverage can be computed with:

$ tis-aggregate coverage convert.aggreg.index

It gives the following result:

Result of: tis-aggregate coverage convert.aggreg.index
File                                  Function        #reached  #statements  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   total
convert.c                             convert         7         11           64%    partial
convert.c                             convert_atoi    15        19           79%    partial
convert.c                             cv_to_base      22        27           81%    partial
convert.c                             digit_name      0         5            0%     not reached
main.c                                main            10        15           67%    partial
convert.c                             nb_digit        12        14           86%    partial
TOTAL (functions)                                     6         7            86%
TOTAL (all statements)                                68        93           73%
TOTAL (stmts in reachable functions)                  68        88           77%

It shows the list of the functions with the number of reachable statements and the total number of statements.

The last three lines, labelled TOTAL, summarize the coverage:

  • functions: number of analyzed functions over the number of functions.
  • statements: number of reachable statements over the number of statements.
  • stmts in reachable functions: number of reachable statements over the number of statements in the reachable functions.

We can for instance see that the digit_name function is not reached. This is because it is only used for a base greater than 10. So let’s run other analyses (with other inputs) and add them into the index file to enhance the coverage:

Index file convert2.aggreg.index
results/test-1234-10
results/test-1234-2
results/test-1234-16
results/test-0-10
results/test-a2b-10

The coverage becomes:

Result of: tis-aggregate coverage convert2.aggreg.index
File                                  Function        #reached  #statements  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   total
convert.c                             convert         9         11           82%    partial
convert.c                             convert_atoi    19        19           100%   total
convert.c                             cv_to_base      23        27           85%    partial
convert.c                             digit_name      4         5            80%    partial
main.c                                main            13        15           87%    partial
convert.c                             nb_digit        14        14           100%   total
TOTAL (functions)                                     7         7            100%
TOTAL (all statements)                                84        93           90%
TOTAL (stmts in reachable functions)                  84        93           90%

Notice that, since the function coverage is now 100%, the statement coverage and the in-perimeter one (stmts in reachable functions) are the same.

Tis-aggregate coverage formats

To change the output format to csv (below) or html, the --format option can be used:

Result of: tis-aggregate coverage convert2.aggreg.index --format=csv
File,Function,#reached,#statements,Ratio,Comment
main.c,__tis_globfini,2,2,100%,total
convert.c,convert,9,11,82%,partial
convert.c,convert_atoi,19,19,100%,total
convert.c,cv_to_base,23,27,85%,partial
convert.c,digit_name,4,5,80%,partial
main.c,main,13,15,87%,partial
convert.c,nb_digit,14,14,100%,total
TOTAL (functions),,7,7,100%,
TOTAL (all statements),,84,93,90%,
TOTAL (stmts in reachable functions),,84,93,90%,

The csv format may be useful for opening the results in a spreadsheet tool to easily select elements.

When using --format=html, the coverage is represented with a decomposition based on the file paths, which makes it easier to browse the results. The presentation is interactive, making it possible to fold/unfold the information. The default is to show only the summary of the coverage of the files and functions. It is also possible to see the detailed coverage of each statement by using the --html-stmts dir option, provided the analysis was run before with:

$ tis-analyzer -aggreg-html-stmts dir [options]

It generates an HTML file for each function in the specified dir directory. So for instance, with these commands, the HTML detailed coverage is available from the results/html-cov/index.html file:

$ tis-analyzer -val -slevel 100 convert.c main.c -val-args ' 1234 10' \
   -info-csv-all results/test-1234-10 -aggreg-html-stmts results/html-cov
$ tis-aggregate coverage convert.aggreg.index \
                --format=html --html-stmts results/html-cov

The -aggreg-html-stmts option may be used several times: previously existing files are overwritten to make sure the most recent version is available. Notice that it is not mandatory to use -aggreg-html-stmts on each analysis, but the detailed coverage is then available only for the functions that have been translated, i.e. those seen in the analyses where -aggreg-html-stmts has been used.

Moreover, whatever the format is, the output file may be specified with the --output-file option (except when using --html-stmts dir, where the output file is automatically dir/index.html).

To know more about the other features of the coverage command, read the Tis-aggregate coverage section below, or use:

$ tis-aggregate coverage --help

The tis-aggregate tool also provides other commands. The help command shows the available commands and the list of the options that are common to all of them:

$ tis-aggregate help

The following Advanced usage section presents some of them.

Advanced usage

Tis-aggregate coverage

The Basic usage section above already explained how to use the coverage command, but some more details are given below.

Dead statements

To better understand the coverage, the dead code (i.e. the unreachable statements) needs to be analyzed. The easiest way to do that is to use the GUI, where the dead statements are highlighted in red, but it can be difficult to summarize the dead code across several analyses. So tis-aggregate can be used to generate the list of the statements that are not reached by any analysis.

With the --statements option, the list of the statements is filtered to keep only the unreachable statements. A path/base_statements.dead.xxxxxx.csv file (where xxxxxx is a random string) is generated. It holds information about these statements that are never reached by any analysis.

For the example above, it gives for instance:

Result of: tis-aggregate coverage convert2.aggreg.index --statements
File                                  Function        #reached  #statements  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   total
convert.c                             convert         9         11           82%    partial
convert.c                             convert_atoi    19        19           100%   total
convert.c                             cv_to_base      23        27           85%    partial
convert.c                             digit_name      4         5            80%    partial
main.c                                main            13        15           87%    partial
convert.c                             nb_digit        14        14           100%   total
TOTAL (functions)                                     7         7            100%
TOTAL (all statements)                                84        93           90%
TOTAL (stmts in reachable functions)                  84        93           90%
File, Line, Function, Number
convert.c, 28, digit_name, 2
convert.c, 31, cv_to_base, 22
convert.c, 42, cv_to_base, 23
convert.c, 42, cv_to_base, 24
convert.c, 42, cv_to_base, 25
convert.c, 51, convert, 7
convert.c, 51, convert, 8
main.c, 27, main, 2
main.c, 28, main, 3

Filters

Some options are provided to decide which functions should appear in the coverage results.

For instance, the coverage of the libc functions is usually not of interest, so the default is to ignore them, but they can be included when the --libc option is set.

The default behavior is to show the unreachable functions, but they can be skipped by setting the --skip-unreachable-fun option. It is especially useful when using the --libc option.

This is for instance the coverage computed on the first convert.aggreg.index index file shown above, without the unreachable functions:

Result of: tis-aggregate coverage convert.aggreg.index --skip-unreachable-fun
File                                  Function        #reached  #statements  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   total
convert.c                             convert         7         11           64%    partial
convert.c                             convert_atoi    15        19           79%    partial
convert.c                             cv_to_base      22        27           81%    partial
main.c                                main            10        15           67%    partial
convert.c                             nb_digit        12        14           86%    partial
TOTAL (functions)                                     6         6            100%
TOTAL (all statements)                                68        88           77%
TOTAL (stmts in reachable functions)                  68        88           77%

Notice that the function coverage is now 100% since the unreachable functions are filtered out, but this might not always be the case, when using the -val-use-spec option for instance.

The --skip-file and --skip-function options take a regular expression and specify the files or functions that should be skipped in the results. For example:

Result of: tis-aggregate coverage convert.aggreg.index --skip-function main
File                                  Function        #reached  #statements  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   total
convert.c                             convert         7         11           64%    partial
convert.c                             convert_atoi    15        19           79%    partial
convert.c                             cv_to_base      22        27           81%    partial
convert.c                             digit_name      0         5            0%     not reached
convert.c                             nb_digit        12        14           86%    partial
TOTAL (functions)                                     5         6            83%
TOTAL (all statements)                                58        78           74%
TOTAL (stmts in reachable functions)                  58        73           79%

The --skip-file option may be used to exclude all the functions that come from some files. For instance:

--skip-file '^lib'

Advice about the source code

Warning

Because the coverage relies on the statement numbers inside each function, it is relevant only for the functions that have exactly the same statements in all the analyses. This means that, if different preprocessing options are used, they must not change the source code of those functions.

However, some analyses may hold only part of the source code: the statements of the functions that are absent from some analysis results are considered as if they were present but dead.

Despite the warning above, if some functions really need different code for some analyses, one should follow the advice presented below to nonetheless get usable coverage results.

Conditional statements

When adding a conditional statement, an #else part with the same number of statements should be added so that the other statements of the function keep the same numbers across the analyses. For instance:

#ifdef TEST_1
  tis_variable_split (&x, sizeof(x), 10);
# else
  ;
#endif
  <statement>

Two versions of the same function

It is sometimes the case that two versions of the same function are needed for some analyses. In this case, instead of redefining the function like this:

#ifdef CONFIG_1
int f (...) {
  <code 1>
}
#else
int f (...) {
  <code 2>
}
#endif

It is better to define two functions so that both are visible in all the analyses. The conditional code is then only used to call one or the other:

int f_config_1 (...) {
  <code 1>
}
int f_other_config (...) {
  <code 2>
}
int f (...) {
#ifdef CONFIG_1
  return f_config_1 (...);
#else
  return f_other_config (...);
#endif
}
Adding Justifications

Optionally, the --justif option can be used to provide a file with some justifications for partially covered functions.

Each line of this file must be formatted as:

function_name, number of justified dead statements, justification

The free text in justification can be anything, but it cannot contain a newline or a comma (in order to produce a well-formed CSV file).

This file is useful when doing the verification several times, to check whether the number of unreachable statements has changed since the last time the results were reviewed.

For instance, for the example presented above, the justification file may look like:

File convert.justif holding some dead code justification:
main, 2, no test with wrong number of arguments.
convert, 2, no test with invalid char in second input.
digit_name, 1, always called with `x>=10`.

And then the coverage result becomes:

Result of: tis-aggregate coverage convert.aggreg.index --justif convert.justif
File                                  Function        #reached  #statements  Ratio  #justified  Ratio  Comment
main.c                                __tis_globfini  2         2            100%   0           100%   total
convert.c                             convert         9         11           82%    2           100%   no test with invalid char in second input.
convert.c                             convert_atoi    19        19           100%   0           100%   total
convert.c                             cv_to_base      23        27           85%    0           85%    partial
convert.c                             digit_name      4         5            80%    1           100%   always called with `x>=10`.
main.c                                main            13        15           87%    2           100%   no test with wrong number of arguments.
convert.c                             nb_digit        14        14           100%   0           100%   total
TOTAL (functions)                                     7         7            100%   0           100%
TOTAL (all statements)                                84        93           90%    5           96%
TOTAL (stmts in reachable functions)                  84        93           90%    5           96%

Tis-aggregate property-status

Besides the Prepare the Analysis section, which shows how to run tis-analyzer commands to get Value analysis results, the ACSL Properties section shows how to add some properties in the code to solve some problems, and how some of them can be proven using the WP plugin, which involves theorem proving techniques (see Prove Annotations with WP).

As pointed out by the Conclusion of the TrustInSoft Analyzer Manual, all the results obtained by the analyses rely on hypotheses, and the code annotations are a large part of them. The property-status command helps to detect the annotations that are not verified by computing the status of each property across several analyses. To get an ok status, a property has:

  • either to be valid (or dead) in every value analysis results,
  • or to be proved by WP,
  • or to be marked as manually reviewed (see About property reviews).

Index

We have seen in Basic usage that the index file holds the list of the base names of the analysis results. The property-status command also needs to know where the results come from, so an optional second column can be added to specify the kind of analysis that gave the results. It can be value, wp or other. A missing kind is equivalent to other. The property-status command only considers the value and wp results. So the index file may look like:

path_1/proj_1
path_2/proj_2, value
path_2/proj_3, wp
...
path_n/proj_n

With the enhanced index file, the command may be run with:

$ tis-aggregate property-status appli.aggreg.index

To know more about the available property-status options, see the sections below or run:

$ tis-aggregate property-status --help

About property reviews

Some properties may be difficult to prove with the tools, so the user may prefer to justify them manually. The format of the file holding the intellectual review is not constrained, but it has to be a text file where the string [#justif-yyy:xxx] specifies that the property named xxx from the yyy function has been justified.

The file is provided to the tool with the --reviews option.
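
For instance, a line of a review file could look like (the function and property names are hypothetical):

The index cannot exceed LEN here because of the guard above.
[#justif-main:index_in_bounds_rv]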

About property suffixes

Optionally, if the user follows the recommendations given in Naming the Annotations, i.e. takes care of adding a suffix to the property names, the --check-suffix option may be used to make the tool check whether the computed status matches the given suffix.

The default suffixes are:

  • _wp for properties that should be proved with WP (use the --wp-suffix to override it),
  • _val for properties that should be checked by the Value analysis (use the --val-suffix to override it),
  • _rv for properties that should be in the intellectual reviews (use the --review-suffix to override it),
  • the --ignore-suffix option can be used to specify a suffix to ignore, like _todo for instance.
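
For instance, with the default suffixes, ACSL assertions could be named as follows (a hypothetical sketch):

/*@ assert index_in_bounds_val: i < LEN; */         // to be checked by the Value analysis
/*@ assert no_overflow_wp: x <= INT_MAX - y; */     // to be proved by WP
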
Output

The default behavior of this command is to print the list of the properties that are not verified (according to the criteria listed above). However, with the --all option, all the properties are listed with information about their verification (and optionally information about the suffixes).

At the end of a study, it might be the case that some useless elements remain. For instance, if a property is valid in every analysis, it is useless to take some time proving it with WP. And if a property is valid in every analysis, or if it has been proved by WP, it is useless to keep a manual justification for it. When the --show-useless option is set, some warnings are generated for such useless elements and may help to clean the final results.

Tis-aggregate sql-export

As seen in the tis-info Manual, the .csv files generated when using the -info-csv-all option can be used to explore the results of an analysis. But if you are familiar with SQL, it might be easier to use it to make more elaborate requests over several .csv files.

The sql-export command generates a sqlite database:

$ tis-aggregate sql-export appli.aggreg.index name

The appli.aggreg.index file is the index file as usual, and name is one of the analyses of the index. This second argument is needed because, at the moment, the database is generated for only one analysis.

Use the --help option to get more information about this command, and to see examples.

tis-info Manual

Introduction

This plug-in extracts a lot of useful information from the current project.

The information about each element is printed on a single line of comma-separated fields, which means that the result can be saved as a .csv file.

These files can be opened in a spreadsheet tool for easy selection. Moreover, it is easy to select some of them with grep. And if you are familiar with SQL, you can even build a sqlite database to make more elaborate requests over several .csv files (see Tis-aggregate sql-export).
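
For instance, assuming the functions information was exported to a hypothetical proj_functions.csv file, the unreachable functions can be listed with:

$ grep ',unreachable,' proj_functions.csv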

To generate all the .csv files from a project, use the -info-csv-all option.

Some pieces of information are syntactic, and some others are semantic and are only available when the project holds value analysis results.

About Functions

When the -info-functions option is used, some information is printed for each function of the current project.
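
For instance (with an illustrative file name), this information can be printed after a value analysis with a command such as:

$ tis-analyzer -val file.c -info-functions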

The following list explains which information can be found in each column:

  • Function: function name.

  • Is defined: tells whether the function has a body or if it is an external function. The possible values are:

    • defined: if the function has a defined body
    • undefined: if the function is external
  • Is static: tells whether the function is static or has a global scope. The possible values are:

    • static: if the function is static
    • global: if the function has a global scope
  • Addr used: tells whether the address of the function is used. The possible values are:

    • yes
    • no
  • Is called: tells if the function is syntactically called somewhere in the application. Notice that the entry point (main function) is considered as called. The possible values are:

    • called: if the function is syntactically called
    • not called: otherwise
  • Is reachable: tells if the function is semantically reachable from the entry point. Notice that this information is only available if the project contains value analysis results. The possible values are:

    • NA: if no value analysis results are available
    • reachable: if the function has been reached during analysis
    • unreachable: otherwise
  • Aliases: tells whether the function is aliased to other function declarations from the original source that may have disappeared from the merged file.

  • Clone origin: name of the source function if the function is a clone, ‘NA’ otherwise.

  • Spec: tells if the function has an ACSL specification. The possible values are:

    • user: if the user gave a specification
    • generated: if there is no user specification, but a specification has been generated by the tool
    • none: otherwise

    This information is mainly interesting for undefined functions.

  • val-use: tells how the function is considered by the analyses. The possible values are:

    • built-in: if a built-in is used for this function, or if it is a built-in function
    • specification: if the function is not defined or if the -val-use-spec option is used to cut the analysis at this function
    • body: otherwise
  • File: gives the file in which the function is defined (or declared).

  • Starting line: line of the file where the function begins. The values are integers.

  • Ending line: line of the file where the function ends. The values are integers.

  • #reached: tells how many statements of the function are semantically reachable. Notice that this information is only available if the project contains value analysis results to provide reachable information. The values are integers, where -1 means not available.

  • #total: gives the number of statements of the function. The values are integers.

  • coverage: gives the coverage percentage if the reachable information is available. The value is NA otherwise.

  • #called: represents how many times this function has been called during the analysis. The values are integers, where -1 means not available.

  • Overhead time: represents the time analyzing a function minus the time analyzing its statements or the associated built-in. The values are floating point numbers or 0.0 when not available.

  • Self time: represents the time analyzing a function minus the time analyzing its call statements. The values are floating point numbers or 0.0 when not available.

  • Total time: represents the time analyzing a function. The values are floating point numbers or 0.0 when not available.

  • Memexec time: time taken by memexec to check if we can reuse a previous call and to store a new call. The values are floating point numbers or 0.0 when not available.

  • Memexec hits: represents how many times previous analysis results of this function have been reused by memexec. The values are integers, where -1 means not available.

  • Is libc: tells whether the function is provided by the standard C library. The possible values are:

    • libc:no
    • libc:yes

The -info-csv-functions filename option exports the same information into the given filename.
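
For instance, a possible way to list the reachable functions that are still undefined (the filename functions.csv is only an example; the patterns rely on the column values described above, and -w prevents matching the unreachable value):

$ tis-analyzer ... -info-csv-functions functions.csv
$ grep undefined functions.csv | grep -w reachable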

About Statements

When the -info-statements option is used, some information is printed for each statement of the current project.

The following list explains which information can be found in each column:

  • Sid: internal unique statement identifier. The values are integers.

  • File: file where the statement is.

  • Line: line of the file where the statement is. The values are integers.

  • Function: name of the enclosing function.

  • Number: statement number inside the function. This is the number used when printing REACHED information. The values are integers.

  • Reachable: tells if the statement is reachable. Notice that this information is only available if the project contains value analysis results. The possible values are:

    • NA: if no value analysis results are available
    • reachable: if the statement has been reached during analysis
    • unreachable: otherwise
    • never-terminating: if the value analysis of this statement never terminated
    • non-terminating: if the value analysis of this statement did not terminate for some paths
    • unpropagated: if the results for that statement are incomplete due to a value analysis degeneration
    • degenerated: if the value analysis has degenerated here
  • Dead code entry: flags the statements that are dead code entry points. A dead code entry point is a dead statement that either has no predecessor (it is the first statement of an unreachable function, or it is a syntactically unreachable statement) or has at least one reachable predecessor. The possible values are:

    • yes
    • no

    Notice that this information is only available if the project contains value analysis results, otherwise no statement is flagged.

  • Dead code exit: flags the statements that are dead code exit points. A dead code exit point is a dead statement that either has no successor (it is the last statement of an unreachable function, for instance) or has at least one reachable successor. The possible values are:

    • yes
    • no

    Notice that this information is only available if the project contains value analysis results, otherwise no statement is flagged.

  • Dead Id: gives a unique id per function for each contiguous statement/block not reachable by the Value Analysis (with an exception for the values 0 and -1). If the Value Analysis results are not available, all dead ids are set to -1. This makes it possible to refer unambiguously to pieces of dead code in functions. If the function is not reachable at all by the Value Analysis, then the dead ids of all its statements are 0. If a statement is reachable by the Value Analysis, its dead id is -1.

  • Kind: kind of the statement. The possible values are:

    • test: a test (if). See next column for more information
    • function call: a function call. See next column for more information
    • instruction: a simple instruction, such as an assignment for instance
    • inline assembly: for assembly statements
    • loop
    • return
    • jump: for statements like goto, break, continue
    • switch
    • block
    • unspecified sequence: for a sequence of instructions that comes from the decomposition of an instruction of the original program, and for which the order of evaluation is unspecified.
    • other statement
  • More: gives more information on the statement depending on its kind. The possible values are:

    • (else branch dead | then branch dead | no reduction): for a test, when the reachability has been computed, indicates whether one of the two branches is alive and the other one is dead. Moreover, if the -info-stmt-no-reduction option is used, both branches are alive, and the results are available, indicates whether the states are the same in both branches, which means that the effect of the conditional has definitely been over-approximated. Nothing is printed otherwise.
    • direct call: for a direct call
    • indirect call: for an indirect call, and gives moreover:
      • the list of the possibly indirectly called functions if available,
      • NA otherwise (that is: indirect but not analyzed).
    • loop n°%d in the function: loop number in the function. This information is useful, for instance, when importing ACSL loop properties.
  • Join reason: represents why the analysis joined several propagated states. The possible values are:

    • <none>: when no joins occurred on this statement
    • merged loop: when a loop iteration analysis starts or ends with more than one propagated state (to disable this behavior, use -val-slevel-merge-after-loop=-@all)
    • reached slevel: when the slevel for this statement has been reached
    • widened: when the analyzer started to over-approximate a loop (should only happen when the slevel is reached for the loop)

    Notice that several values may be reported for one statement.

  • SlevelCounter: is the maximum slevel reached for this statement over all calls. In other words, it represents how many separated states went through this statement. This is an over-approximation, and the counter is cut once it reaches the slevel. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.

  • SlevelLimit: is the statement slevel limit. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.

  • Slevel ratio: is (counter * 100 / slevel). The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.

  • Widening: represents how many widening states went through this statement. There may be more than one widening state if we are in a loop and we do not find the fixpoint with the first widening iteration. The values are integers, where -1 means not available. Notice that this information is only available if the project contains value analysis results.

  • Self time: is the time spent analyzing this statement excluding potential calls. The values are floating point numbers or 0.0 when not available.

  • Total time: is the time spent analyzing this statement including potential calls. The values are floating point numbers or 0.0 when not available.

  • Is libc: tells whether the statement is provided by the standard C library. The possible values are:

    • libc:no
    • libc:yes

The -info-csv-statements filename option exports the same information into the given filename.
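
For instance, a sketch counting the unreachable statements from the exported table (the filename statements.csv is only an example; the pattern relies on the Reachable column values described above):

$ tis-analyzer ... -info-csv-statements statements.csv
$ grep -c unreachable statements.csv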

About Files

When the -info-files option is used, some information is printed for each file of the current project.

The following list explains which information can be found in each column:

  • File: file name.

  • #functions: number of functions defined in this file. The values are integers.

  • Toplevel: tells whether the file is a toplevel one or if it has been included in another one. The possible values are:

    • yes: if the file is a toplevel one
    • no: if the file is not a toplevel one, i.e. it has been included in another one
    • maybe: if a preprocessed file has been provided to the analyzer and this file has the same basename, it is probably a toplevel file.
  • Is libc: tells whether the file is from the tool libc or not. The possible values are:

    • no
    • yes

The -info-csv-files filename option exports the same information into the given filename.

About Types

When the -info-types option is used, some information is printed for each type of the current project.

Notice that the unused types are not listed since they are automatically removed from the internal representation.

The following list explains which information can be found in each column:

  • File: gives the file of the type definition/declaration.

  • Line: gives the line of the type definition/declaration.

  • Kind: gives the type kind. The possible values are:

    • typedef
    • enum
    • struct
    • union
  • Source name: gives the name found in the source file of the type for typedef, or of the tag otherwise. Prints <empty> when the tag is not specified.

  • Internal name: gives the internal name if different from the source name. The value is <same> otherwise.

  • Defined: tells whether the enum/struct/union is defined, or only declared. For typedef, tells whether the underlying type is abstract (top only). The possible values are:

    • defined
    • declared
  • Sub-type: meaningful for ‘typedef’ only: shows the underlying type. The value is N/A otherwise.

  • Size (#bytes): gives the size of the type, in bytes, when available. The values are integers, where -1 means not available.

  • Is libc: tells whether the type is provided by the standard C library. The possible values are:

    • libc:no
    • libc:yes

The -info-csv-types filename option exports the same information into the given filename.

About Variables

When the -info-variables option is used, some information is printed for each variable of the current project.

Notice that function variables are not listed here.

The following list explains which information can be found in each column:

  • Name(s): gives the name of the variable. Also gives the internal name if the variable has been renamed to avoid ambiguities.

  • File: gives the file of the variable definition/declaration. The value is no valid location when this information is not available.

  • Line: gives the line of the variable definition/declaration.

  • Type: gives the type kind of the variable (sub-types are not developed).

  • Function: gives the function of the variable if any. The value is NA otherwise.

  • Kind: gives the kind of the variable. The possible values are:

    • global variable
    • local variable
    • formal variable: for function formal parameters

    Warning: static local variables are marked as ‘global’ here.

  • Storage: tells if the variable is defined or just declared, and also if it is a static variable. The possible values are:

    • static
    • declared: only global variables might be only declared
    • defined
  • Initialized: tells whether the variable has an initializer or not. The possible values are:

    • no
    • yes

    Notice that this is only meaningful for global variables since local initializations are transformed into statements.

  • Volatile: tells whether the variable is volatile or not. The possible values are:

    • no
    • yes

    Notice that the qualification is detected anywhere in the type.

  • Const: tells whether the variable is const or not. The possible values are:

    • no
    • yes
  • Temporary: tells whether the variable is a temporary variable generated by the tool or a variable coming directly from the user input. The possible values are:

    • no
    • yes
  • Is libc: tells whether the variable is provided by the standard C library. The possible values are:

    • libc:no
    • libc:yes

The -info-csv-variables filename option exports the same information into the given filename.
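
For instance, a sketch keeping only the global variables that do not come from the tool libc (the filename variables.csv is only an example; the patterns rely on the Kind and Is libc column values described above):

$ tis-analyzer ... -info-csv-variables variables.csv
$ grep 'global variable' variables.csv | grep 'libc:no'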

About Shared Variables

When the -info-shared-variables option is used, some information is printed for each shared variable of the current project.

This table shows the accesses to the shared variables for each of the threads. It is only available after a Mthread analysis.

The following list explains which information can be found in each column:

  • Variable name: name of the shared variable.

  • File: file where the access takes place.

  • Line: line of the file where the access takes place.

  • Access kind: access kind. The possible values are:

    • write: when the variable may be written by this statement.
    • read: when the variable is only read.
  • Thread: name of the thread that does the access.

  • Function: function where the access takes place.

  • Statement number: statement number inside the function. The values are integers.

  • Protecting Mutexes: list of the mutexes and their locking status, where true means the mutex is locked and false means it may or may not be locked.

The -info-csv-shared-variables filename option exports the same information into the given filename.

About Properties

When the -info-properties option is used, some information is printed for each property of the current project.

The following list explains which information can be found in each column:

  • File: tells the location file of:

    • the statement of the property for a local annotation,
    • the statement of the call for an instance of a precondition,
    • the function for a function contract property,
    • the property itself otherwise.
  • Line: gives the line in the file.

  • Function: gives the name of the function in which the property is if any. The value is NA otherwise.

  • StmtId: for local properties, gives the number of the statement in the function. The values are integers, where -1 means not available.

  • Kind: gives the kind of property. The possible values are:

    • differing blocks
    • division by zero
    • float to int
    • index out of bound
    • invalid shift
    • is NaN or infinite
    • logic memory access
    • memory access
    • not separated
    • overflow
    • overlap
    • pointer comparison
    • uninitialized
    • valid string
    • valid wstring
    • comparable char blocks
    • comparable wchar blocks
    • dangling contents
    • division overflow
    • function pointer
    • index in address
    • pointer arithmetic
    • unclassified
    • bad libc call
    • UB assert
    • UB precondition
    • assert
    • loop invariant
    • invariant
    • requires
    • assumes
    • ensures
    • exits
    • breaks
    • continues
    • returns
    • terminates
    • allocates
    • assigns
    • behavior: not really a property but rather a collection of properties (see ACSL manual)
    • complete behaviors
    • decrease
    • disjoint behaviors
    • from
    • axiom
    • axiomatic: not really a property but rather a collection of properties (see ACSL manual)
    • lemma
    • instance of requires: for instances of requires properties at the call statements
    • reachability: for properties about the reachability of functions and statements
    • other: for other properties
  • Name: gives the name of the property if any. The value is <empty string> otherwise. When adding properties, it is strongly recommended to give them names so they can be identified easily.

  • Status: gives the verification status of the property. The possible values are:

    • surely valid: the status is True and the dependencies have the consolidated status surely_valid or considered_valid.
    • considered valid: there is no possible way to prove the property (e.g., the post-condition of an external function). We assume this property will be validated by external means.
    • valid under hypothesis: the local status is True but at least one of the dependencies has a consolidated status unknown. This is typical of proofs in progress.
    • valid but dead: the status is True, but the property is in a dead or incoherent branch.
    • unknown: the status is Maybe.
    • unknown but dead: the local status is Maybe, but the property is in a dead or incoherent branch.
    • surely invalid: the status is False, and the dependencies have the consolidated status surely_valid.
    • invalid under hypothesis: the local status is False, but at least one of the dependencies has status unknown. This is a sign of a dead code property, or of an erroneous annotation.
    • invalid but dead: the local status is False, but the property is in a dead or incoherent branch.
    • inconsistent: there exist two conflicting consolidated statuses for the same property, for instance with values surely_valid and surely_invalid. This case may also arise when an invalid cyclic proof is detected. This is symptomatic of an incoherent axiomatization.
    • never tried: no status is available for the property.

    See the user manual for more information about the status.

  • Alarm: tells whether the property is a generated alarm or not. The possible values are:

    • yes
    • no
  • Alarm rank: gives the rank of the emitted alarm if the property is a Value analysis alarm. The values are integers, where -1 means not available. It is advisable to study the alarms in the order in which they have been emitted since an alarm may alter the states after it.

  • MoreInfo: gives some more information about the property. Deprecated: use the other tables instead to get more information. The possible values are:

    • global
    • in unreachable function: similar to _but_dead status suffix.
    • inside -val-use-spec function: annotations are not reached in use-spec functions.
    • ensures of -val-use-spec function: the post-conditions of use-spec functions cannot be checked; they are used as hypotheses.
    • at unreachable statement: local annotation attached to an unreachable statement.
    • instance of builtin requires: an instance of a built-in pre-condition can be safely ignored since the built-in raises alarms in case of problems.
    • <empty string>: No additional information
  • Is libc: tells whether the property is provided by the standard C library. The possible values are:

    • libc:no
    • libc:yes

The -info-csv-properties filename option exports the same information into the given filename.

Examples

Let’s examine project_properties.csv:

  • On a project with value analysis results, it could be interesting to find the unreachable properties that are not part of the library with:

    $ grep never_tried project_properties.csv | grep -v TIS_KERNEL_SHARE
    
  • To find invalid properties:

    $ grep invalid project_properties.csv
    
  • To find all the alarms generated by Value, sorted by their kind:

    $ grep -vE 'alarm:(no|NA)' project_properties.csv | sort -k5
    
  • To find the unverified user properties, sorted by function name:

    $ grep -E 'alarm:(no|NA)' project_properties.csv | grep -v ', valid,' \
           | grep -v 'TIS_KERNEL_SHARE.*never_tried' | sort -k2
    

About Value analysis results

The -info-results option may be used to print some information about the result of a value analysis. The default is to print all the sections, but a selection can be made by using the -info-msg-key results:xxx option. To know all the possible values for xxx, run:

$ tis-analyzer -info-msg-key help

There is no option at the moment to print all the sections except some of them, but because each line is prefixed with its category, it is quite easy to filter out some of them. For instance:

$ tis-analyzer ... -info-results \
  | grep "^\[info:results" | grep -v "info:results:coverage"

Some -info-results-skip-xxx options are provided as well to specify elements that may be skipped in the report. For instance -info-results-skip-abort e1,..en specifies the functions that are known to exit (they do not terminate) and that do not need to be reported as elements to check. The specified expressions e may be simple function names, but also regular expressions.

The other -info-results-skip-xxx options work similarly. All of them may take a list or may be used several times on the command line. See the All Options section for more information.
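
For instance, a sketch where fatal_error and the handlers matching panic_.* are hypothetical application functions known to never return:

$ tis-analyzer ... -info-results -info-results-skip-abort "fatal_error,panic_.*"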

The -info-json-results option takes an output file name to which the results are exported using the JSON format presented below.
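
For instance, the exported file can then be queried with a standard JSON tool such as jq (the filename results.json is only an example; this sketch assumes the sections described in the JSON format below are exported as top-level fields):

$ tis-analyzer ... -info-results -info-json-results results.json
$ jq '.alarms.status' results.json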

Errors

The result sections presented below that are marked with a VERIF_ERROR tag in the results are the most important ones. All the problems listed in these sections must be solved before going further.

Section results:undef

This section reports the existence of some reachable functions with neither code nor specification.

This MUST be fixed either by providing source files with the body of these functions, or by adding specifications for them (with at least ‘assigns’ properties). When functions are listed in this section, the obtained results are probably meaningless.

Section results:degeneration

The degeneration points are the points where the value analysis stopped. Results are not correct for the lines of code that can be reached from a degeneration point, so this MUST be fixed.

Section results:non-terminating

This section reports the other non-terminating statements.

Some abort-like functions may appear here: this is normal. The application functions which are known to never terminate may be excluded using the -info-results-skip-abort option. The other cases must be examined carefully.

Section results:messages

This section reports some messages from the tool.

Some of them are purely informative, but all of them have to be carefully reviewed manually.

Section results:alarms

This section reports the generated alarms.

The alarms should be reviewed carefully, and removed if possible. Starting from the first one is a good idea, since the next ones may be due to related reasons.

More things to examine

The following sections do not report errors per se, but pieces of information that should be checked to see if everything seems normal about them.

Section results:hypotheses

This section lists the hypotheses to verify.

Besides the alarms, these properties must be proved or justified. Some of them may be proved with WP, or manually justified. If named carefully (as explained in Naming the Annotations), for instance with wp and rv, you can exclude them from this list using -info-results-skip-props _wp,_rv.
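
For instance, a sketch of a command line using this naming convention:

$ tis-analyzer ... -info-results -info-results-skip-props _wp,_rv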

Section results:no-body-but-spec

This section lists the functions with no code, but specifications.

These should be library functions. Some of them may be ignored by using the -info-results-skip-lib option. The specified expression may also match the filename. For instance, with -info-results-skip-lib external_headers, all the functions from files in the external_headers directory are ignored.

Section results:body-but-use-spec

This section lists the functions with code, but which are nevertheless analyzed using their specification only.

These functions have a body, but the value analysis has been asked to ignore them and to use their specifications instead (with the -val-use-spec option). They are listed here as a reminder that the analysis has been done under these hypotheses, and that in the end these specifications MUST be checked by analyzing these functions alone.

Section results:builtins

This section lists the functions from the tis-analyzer libc and the built-ins that have been used. These functions come from the tool and are only listed here to show which ones are used.

Section results:useless-files

This section lists the source files that contain no reachable functions.

In most cases these files can be safely removed from the analysis, unless they hold the definition (and the initialization) of a used global variable. It is harmless to keep them though, and it can even be useful for measuring the coverage over the whole application.

Some more information

The following sections provide more information that can be interesting to look at.

Section results:coverage

This section gives an overview of the analysis coverage.

This information may be interesting to analyze at the end of the study, but can be skipped during the elaboration of the analysis when these numbers can change a lot, which can be annoying.

To examine more precisely the coverage (by function), rather use the tis-aggregate coverage command (see Tis-aggregate coverage).

Section results:precision

The -info-precision option is used to specify the level of information printed about the loss of precision during the analysis. This option takes an integer which may be:

  • 0: no information (default level),
  • 1: a summary gives the number of statements and loops where precision has been lost,
  • 2: the information is given on each statement where precision has been lost,
  • 3: 2 + the statement itself is also printed.

For a positive level, this option also sets -info-msg-key results:precision. Conversely, if -info-msg-key results:precision is set and the precision level is not specified, the level is set to 1.

Precision may be lost when:

  • the specified slevel is reached (see -slevel option documentation),
  • states are merged in a loop (see -val-slevel-merge-after-loop option documentation),
  • a widening has been done in a loop (see -wlevel option documentation).
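
For instance, to print the information for each statement where precision has been lost:

$ tis-analyzer ... -info-precision 2
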
Section results:exits

This section shows information about the exit status of the analyzed application. It provides:

  • the reachability of the entry point return statement,
  • the value returned by the entry point (if any),
  • the number of exit calls reached during the analysis,
  • the value of the status used by exit calls (if any).

You could also be interested in the Termination information on statements to know whether some more non-terminating statements can be reached.

JSON format

The file generated when using the -info-json-results option holds the same information as presented above and respects the following format.

Warning

This is an experimental feature, and the format might change a little in the next release.

  • computation (progress of preparing and computing the value analysis)

    • type: string = error (must be solved before going further)
    • status: string, one of {OK, NOT OK} (NOT OK if a problem blocking the analysis or making its results inaccessible has been encountered, OK otherwise)
      • if NOT OK then the error field must be present
    • error: optional string, one of:
      • no input file
      • config error
      • pre-action failed (a required action before the parsing has failed)
      • parsing error
      • missing file
      • link error
      • no value analysis computation (the value analysis was not demanded, thus it was not computed at all)
      • no entry point
      • incorrect number of arguments
      • invalid user input
      • unexpected error
    • message: optional string (more information about the encountered problem, if available)
      • when error is config error or missing file, message always holds the name of the concerned file
      • when error is parsing error, message holds information about the parsing error that occurred with its location
      • when error is no entry point, message always holds the name of the expected entry point
      • when error is no input file, incorrect number of arguments, no value analysis computation, or no value analysis results, there is no message field
      • when error is link error or unexpected error, message is optional; if present, it holds some additional information about the error
  • undefined_functions (reachable functions with neither code nor specification)

    • type: string = error (must be solved before going further)
    • status: string, one of {OK, NOT OK} (NOT OK if any reachable function with neither code nor specification has been encountered, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of objects (locations of the declarations of each reachable undefined function):
      • file: string (path to the file)
      • line: integer (line number in the file)
      • function: string (name of the function)
  • degeneration (point in the program where the value analysis stopped)

    • type: string = error (must be solved before going further)
    • status: string, one of {OK, NOT OK} (NOT OK if a degeneration has occurred, OK otherwise)
      • if NOT OK then the reason field must be present
    • reason: optional string (degeneration reason), one of:
      • user interruption
      • timeout
      • out of memory
      • stop at nth alarm
      • stop at nth garbled
      • stop at first tis check included
      • entry point with unexpected args
      • pointer library function
      • throwing library function
      • recursive call
      • recursive clone depth has reached the limit
      • writing at unknown address
      • assign with extern function
      • fail to call builtin
      • invalid call to builtin
      • cannot resolve format string
      • format string not supported
      • watchpoint
      • interpreter not a singleton
      • interpreter invalid property
      • interpreter format string percent p
      • interpreter assign in extern function
      • interpreter encounters a limitation
    • additional fields depending on reason
      • when reason is invalid call to builtin, entry point with unexpected args, or interpreter invalid property:
        • message: string (more information about the problem)
      • when reason is format string not supported:
        • may_be_invalid: boolean
      • when reason is interpreter not a singleton:
        • expression: string (the expression that cannot be evaluated to a single value)
        • imprecision: array of one object among:
          • variable: object:
            • name: string (name of the variable involved in the expression)
            • type: string (type of the variable)
            • volatile: boolean
            • undefined: boolean
          • volatile_lvalue: string (volatile sub-expression)
          • imprecise_lvalue: string (imprecise sub-expression)
          • imprecise_pointed_value: object:
            • pointer: string (pointer expression)
            • pointer_type: string (type of the pointer)
            • pointer_value: string (pointed expression)
          • imprecise_operation: object:
            • expression: string (imprecise expression)
            • type: string (type of the expression)
            • value: string (value of the expression)
            • sub-expressions: array of objects:
              • expression: string (sub-expression)
              • type: string (type of the sub-expression)
              • value: string (value of the sub-expression)
    • program_point: optional object (point in the program where the degeneration occurred)
      • file: string (path to the file)
      • line: integer (line number in the file)
      • function: string (name of the function)
      • function_start_line: integer (the line number where function begins in file)
      • function_end_line: integer (the line number where function ends in file)
      • statement_kind: string, one of:
        • test
        • function call
        • instruction
        • inline assembly
        • loop
        • return
        • jump
        • switch
        • block
        • unspecified sequence
        • other statement
      • additional fields depending on statement_kind (similar to the More column of info-statements):
        • when statement_kind is loop:
          • loop_number: integer (number of the loop in the function)
        • when statement_kind is function call:
          • direct_call: boolean (is the call direct or indirect)
          • when direct_call is true:
            • called_function: string (name of the function which is called directly)
          • when direct_call is false:
            • called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
  • non_terminating (other non-terminating statements)

    • type: string = to verify (not an error, but needs to be checked)

    • status: string, one of {OK, NOT OK} (NOT OK if any non-terminating statement has been encountered, OK otherwise)

      • if NOT OK then the list field must be present and not empty
    • skipped: optional array of strings (names of functions specified using the -info-results-skip-abort option, known to not terminate and thus ignored here; these functions have been excluded from the list array below)

    • list: optional array of objects (details of each non-terminating statement):

      • never_terminating: boolean (true if the statement never terminates, false if it may terminate for some callstacks)

      • program_point object (point in the program where the non-terminating statement is located)

        • file: string (path to the file)

        • line: integer (line number in the file)

        • function: string (name of the function)

        • function_start_line: integer (the line number where function begins in file)

        • function_end_line: integer (the line number where function ends in file)

        • statement_kind: string, one of:

          • test
          • function call
          • instruction
          • inline assembly
          • loop
          • return
          • jump
          • switch
          • block
          • unspecified sequence
          • other statement
        • additional fields depending on statement_kind (similar to the More column of info-statements):

          • when statement_kind is loop:
            • loop_number: integer (number of the loop in the function)
          • when statement_kind is function call:
            • direct_call: boolean (is the call direct or indirect)
            • when direct_call is true:
              • called_function: string (name of the function which is called directly)
            • when direct_call is false:
              • called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
        • call_stacks: optional array of arrays of objects (list of call stacks; each call stack is described by a list of objects representing the call sites). A call site has the following fields:

          • function: string (function of the call site)
          • file: string (file of the call site)
          • line: integer (line in file of the call site)

          Note: an empty array means the program point is located in the entry point of the program.

  • messages (selected log file messages)

    • type: string = to verify (not an error, but needs to be checked)
    • status: string, one of {OK, NOT OK} (NOT OK if some important messages have been emitted such as warnings or errors, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • skipped: optional array of strings (messages specified using the -info-results-skip-log option, known to be normal and thus ignored here; messages matching these regular expressions and messages concerning plugins with names matching these regular expressions have been excluded from the list array below)
    • list: optional array of objects (details of each message):
      • message: string (text of the message)
      • plugin: string (name of the plugin that emitted the message)
      • kind: string, one of:
        • warning
        • user error
        • failure
      • location: optional object (location of the problem in the source file if any)
        • file: string (path to the file)
        • line: integer (line number in the file)
  • alarms (generated alarms)

    • type: string = error (must be solved before going further)

    • status: string, one of {OK, NOT OK} (NOT OK if any alarm has been generated, OK otherwise)

      • if NOT OK then the list field must be present and not empty
    • list: optional array of objects (details of each generated alarm):

      • kind: string, one of:

        • differing blocks
        • division by zero
        • float to int
        • index out of bound
        • invalid shift
        • is NaN or infinite
        • logic memory access
        • memory access
        • not separated
        • overflow
        • overlap
        • pointer comparison
        • uninitialized
        • valid string
        • valid wstring
        • comparable char blocks
        • comparable wchar blocks
        • dangling contents
        • division overflow
        • function pointer
        • index in address
        • pointer arithmetic
        • unclassified
        • bad libc call
        • UB assert
        • UB precondition
      • additional fields depending on kind

        • when kind is unclassified:
          • description: string (description of the alarm)
        • when kind is not unclassified:
          • predicate: string (alarm ACSL predicate)
        • when kind is unclassified or function pointer:
          • details: optional string (details about the alarm when available)
      • rank: optional integer (emission rank of the alarm when it is emitted by the value analysis)

      • status: string (status of the property), one of:

        • surely valid
        • considered valid
        • valid under hypothesis
        • valid but dead
        • unknown
        • unknown but dead
        • surely invalid
        • invalid under hypothesis
        • invalid but dead
        • inconsistent
        • never tried
      • program_point object (point in the program where the alarm has been generated)

        • file: string (path to the file)

        • line: integer (line number in the file)

        • function: string (name of the function)

        • function_start_line: integer (the line number where function begins in file)

        • function_end_line: integer (the line number where function ends in file)

        • statement_kind: string, one of:

          • test
          • function call
          • instruction
          • inline assembly
          • loop
          • return
          • jump
          • switch
          • block
          • unspecified sequence
          • other statement
        • additional fields depending on statement_kind (similar to the More column of info-statements):

          • when statement_kind is loop:
            • loop_number: integer (number of the loop in the function)
          • when statement_kind is function call:
            • direct_call: boolean (is the call direct or indirect)
            • when direct_call is true:
              • called_function: string (name of the function which is called directly)
            • when direct_call is false:
              • called_functions: array of strings (names of functions which may be called indirectly), may be empty if the call is not reachable or if the value of the pointer is not available
        • call_stacks: optional array of arrays of objects (list of call stacks; each call stack is described by a list of objects representing the call sites). A call site has the following fields:

          • function: string (function of the call site)
          • file: string (file of the call site)
          • line: integer (line in file of the call site)

          Note: an empty array means the program point is located in the entry point of the program.

      • values: optional array of objects:

        • lvalue: lvalue that appears in the predicate
        • values: possible values of the lvalue
  • hypotheses (hypotheses to verify)

    • type: string = to verify (not an error, but needs to be checked)
    • status: string, one of {OK, NOT OK} (NOT OK if there are some hypotheses to verify, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • skipped: optional array of strings (names of properties specified using the -info-results-skip-props option, known to be verified elsewhere and thus ignored here; properties with matching substrings have been excluded from the list array below)
    • list: optional array of objects (details of each property which has been used as a hypothesis):
      • kind: string, one of:
        • assert
        • loop invariant
        • invariant
        • ensures
        • exits
        • breaks
        • continues
        • returns
        • terminates
        • allocates
        • assigns
        • complete behaviors
        • decrease
        • disjoint behaviors
        • from
        • axiom
        • axiomatic
        • lemma
        • instance of requires
        • other
      • name: optional string (name of the property, if any)
      • status: string (status of the property), one of:
        • surely valid
        • considered valid
        • valid under hypothesis
        • valid but dead
        • unknown
        • unknown but dead
        • surely invalid
        • invalid under hypothesis
        • invalid but dead
        • inconsistent
        • never tried
      • more_information: optional string, one of:
        • global
        • in unreachable function
        • inside -val-use-spec function
        • ensures of -val-use-spec function
        • at unreachable statement
        • instance of builtin requires
        • empty string
      • file: string (path to the file)
      • line: integer (line number in the file)
      • function: optional string (name of the function)
  • no_body_but_spec (functions with no code, but with specifications)

    • type: string = to verify (not an error, but needs to be checked)
    • status: string, one of {OK, NOT OK} (NOT OK if any function with no body but with specifications has been encountered, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • skipped: optional array of strings (names of functions specified using the -info-results-skip-lib option, known to be library functions and thus ignored here; these functions have been excluded from the list array below)
    • list: optional array of strings (names of functions)
  • body_but_use_spec (functions with bodies, but which are analyzed using their specifications only)

    • type: string = informative (information that may be interesting to look at)
    • status: string, one of {OK, NOT OK} (NOT OK if any function which has a body has been analyzed using its specifications only, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of strings (names of functions)
  • external_variables (used external variables)

    • type: string = informative (information that may be interesting to look at)
    • status: string, one of {OK, NOT OK} (NOT OK if the value of any undefined (i.e. declared external global) variable has been used, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of objects (used external variables with their declaration location):
      • name: string (name of the variable)
      • type: string (type of the variable)
      • file: string (path to the file)
      • line: integer (line number in the file)
  • assembly (reachable assembly statements)

    • type: string = informative (information that may be interesting to look at)
    • status: string, one of {OK, NOT OK} (NOT OK if any reachable assembly statement has been encountered, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of objects (locations of the assembly statements):
      • file: string (path to the file)
      • line: integer (line number in the file)
      • function: string (name of the function)
  • builtins (used functions from the tis-analyzer libc and built-ins)

    • type: string = informative (information that may be interesting to look at)
    • status: string, one of {OK, NOT OK} (NOT OK if any function from the tis-analyzer libc or a built-in has been used, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of strings (names of functions or built-ins)
  • useless_files (source files that contain no reachable functions)

    • type: string = to verify (not an error, but needs to be checked)
    • status: string, one of {OK, NOT OK} (NOT OK if any source file containing no reachable functions has been encountered, OK otherwise)
      • if NOT OK then the list field must be present and not empty
    • list: optional array of strings (names of useless files)
  • coverage (overview of the analysis coverage)

    • type: string = informative (information that may be interesting to look at)
    • function_coverage: string, format X% (percentage of reachable functions)
    • total_statements_coverage: string, format X% (percentage of reachable statements)
    • semantic_statements_coverage: string, format X% (percentage of reachable statements in the reachable functions)
  • precision (information about the loss of precision during the analysis and its reasons, and also about the value of some level counters)

    • type: string = informative (information that may be interesting to look at)
    • status: string, one of {OK, NOT OK} (NOT OK if any precision has been lost, OK otherwise)
      • if NOT OK then at least one of the reaching_slevel_stmts, reaching_malloc_plevel_stmts, widened_loops, or merged_loops fields must be present
    • max_slevel: non-negative integer (maximum slevel that has been reached)
    • max_malloc_plevel: optional positive integer (maximum malloc-plevel that has been reached if any)
    • max_clone_depth: optional positive integer (maximum depth of the cloned recursive functions if any)
    • reaching_slevel_stmts: optional positive integer (number of statements where the slevel has been reached)
    • reaching_malloc_plevel_stmts: optional positive integer (number of statements where the malloc-plevel has been reached)
    • widened_loops: optional positive integer (number of loops where a widening has been performed)
    • merged_loops: optional positive integer (number of loops where states were merged)
  • time (optional information about the time taken by the analyzer. Available only if the -time option is set.)

    • type: string = informative (information that may be interesting to look at)
    • parsing_time: string representation of a positive float (time taken by the parsing in seconds)
    • cxx_parsing_time: optional string representation of a positive float (time taken by the C++ parsing in seconds)
    • java_parsing_time: optional string representation of a positive float (time taken by the Java parsing in seconds)
    • value_analysis_time: string representation of a positive float (time taken by the value analysis in seconds)
    • total_time: string representation of a positive float (total time in seconds)
  • target (environment of the analysis: target machine, analysis entry point, program arguments)

    • type: string = informative (information that may be interesting to look at)
    • entry_point: string (name of the entry point function)
    • arguments: optional string (arguments given to the entry point with the -val-args option, if any)
    • architecture: string (target architecture, corresponding to machdep)
    • endianness: string, either little endian or big endian
  • entry_point_return (information about the return value of the entry point function)

    • type: string = informative (information that may be interesting to look at)
    • reachable: boolean (reachability of the entry point function’s return statement)
    • value: optional string (the value returned by the entry point function, if any; present only if the return statement is reachable, the entry point function does not return void, and the -no-results option has not been used for that function)
  • exit_function (information about calls to the _exit function)

    • type: string = informative (information that may be interesting to look at)
    • reachable: boolean (whether any call to the _exit function is reachable)
    • exit_return_value: optional string (possible values returned through the _exit function, only available when any call to _exit is reachable)
  • value_results (indicates whether the Value Analysis results were available to compute all other information. true if the -no-results option was not given, false otherwise.)

  • memory_usage (the peak virtual memory usage of the analyzer)

    • words: integer (size in words)
    • bytes: integer (size in bytes)
    • kbytes: integer (size in kilobytes)
    • Mbytes: integer (size in megabytes)
    • Gbytes: integer (size in gigabytes)

About the Dead code

The statement information already holds information about the dead blocks (see About Statements), but some additional information can be computed by setting some -dead-xxx options (see the DEAD CODE section in All Options). Any of these options triggers the computation, and the generated and printed information depends on the combination of their values, as explained below.

Computing dead code information

The main information computed about the dead code is to determine for each function whether:

  • it is reachable (and may have dead blocks),

  • or it is unreachable, and:

    • it cannot be called at all because there are no syntactic calls to it, and its address is not taken,
    • it may be called (and more details are given about how).

The default is to examine all the functions of the analyzed application (excluding the libc functions), but the -dead-functions option may be used to compute the information for a subset of functions.

More details on the dead blocks of the reachable functions are given depending on the settings of the -dead-xxx options.

Note

What is called a dead block here is not necessarily a C block. It is a set of contiguous statements that are not reachable by the Value Analysis. See the documentation of the Dead Id, Dead code entry and Dead code exit columns in About Statements.

Some of these options are presented below.

About the tests

Most of the time, the entry points of a dead block are tests that are always evaluated to the same value (either always true or always false). With the -dead-bk-with-tests option, the expression and evaluation of these tests are printed. Moreover, with the -dead-bk-with-subexp option, the possible values of each sub-expression are also printed, and if the -dead-bk-with-defs option is set, the origin of the definitions of the involved variables are also computed (this last option requires the -whole-program-graph option to be set). All these elements may help to understand why this dead block is unreachable.
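
For instance, a sketch printing the dead-block tests together with the values of their sub-expressions:

$ tis-analyzer ... -dead-compute -dead-bk-with-tests -dead-bk-with-subexp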

About the __tis_dead__ attribute

The __tis_dead__ block attribute may be used to mark some blocks. The -dead-bk-with-tis-dead and -dead-bk-skip-tis-dead options may then be used to report about those blocks, or to ignore them if the attribute is used to mark expected dead blocks.

For instance, in this example, the attribute may mean that we do not care if this block is dead, since it normally should be:

if (p == NULL) { __blockattribute__(__tis_dead__);
    abort();
}
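
Once the expected dead blocks are marked this way, a sketch of a run that ignores them in the report:

$ tis-analyzer ... -dead-compute -dead-bk-skip-tis-dead
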
Printing dead code information
Default output

With the -dead-compute option, the results are printed to stdout.

Output to JSON

With the -dead-json option, the same results are exported in JSON format to the specified file.

Summary of the dead blocks

With the -dead-bk-summary option, only the list of the dead blocks (in reachable functions) is printed, with some details about each of them.

Generate justification file

When some functions cannot be called at all, a justification file can be generated by setting the -dead-justification option. This file can then be provided to tis-aggregate to justify that those functions cannot be reached.

About the Show Queries

The -info-show-xxx options provide ways to make queries in order to print specific pieces of information. The following list gives the provided options (an example follows the list):

  • -info-show-val: takes a list of comma-separated queries, and prints the value computed for each of them. A query is composed of two parts separated by a white space:

    • the first one specifies the program point. There are three ways to specify it:

      • by its internal number sid,
      • or by f@n where f is a function name, and n the number of the statement inside it,
      • or by filename:line_number.
    • the second part specifies the term to evaluate. The term is specified using the ACSL syntax, but is usually just a variable name.

    Example: -info-show-val "f@1 x","g@3 y" prints the value of x before the first statement of f, and the value of y before the third statement of g.

    Notice that values are available only if the value analysis computed them.

  • -info-show-state: prints the value of the computed state at the specified program point. The syntax to specify the program point is the same as for -info-show-val.

  • -info-show-statements: prints the selected statement for the queries that involve program points.

  • -info-show-with-callstack: shows the detailed values for each call stack.
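
For instance, a sketch of a query combining these options (the program point main@2 and the variable x are only examples):

$ tis-analyzer ... -info-show-val "main@2 x" -info-show-statements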

All Options

This is the result of the -info-help option that gives the list of all the available options:


Plug-in name: info
Plug-in shortname: info
Description: extract many information from the current project.

Most options of the form '-info-option-name' and without any parameter
have an opposite with the name '-info-no-option-name'.

Most options of the form '-option-name' and without any parameter
have an opposite with the name '-no-option-name'.

Options taking a string as argument should preferably be written
-option-name="argument".

***** LIST OF AVAILABLE OPTIONS:

-info-exit-status   print information about the application exit status, and
                    exit points. Needs value analysis results. (opposite
                    option is -info-no-exit-status)
-info-files         print information about files from the current project.
                    (opposite option is -info-no-files)
-info-functions     print information about functions from the current
                    project. (opposite option is -info-no-functions)
-info-precision <n>  give information about loss of precision: 0 for none, 1
                    or more for more (defaults to 0)
-info-properties    print information about properties from the current
                    project. (opposite option is -info-no-properties)
-info-reachable     deprecated. Print messages about semantically reached
                    statements.This is useful to collect information to
                    compute the coverage on several analyses of the same
                    functions. (opposite option is -info-no-reachable)
-info-shared-variables  print information about the shared variables after a
                    Mthread analysis (opposite option is
                    -info-no-shared-variables)
-info-statements    print information about statements (opposite option is
                    -info-no-statements)
-info-types         print information about types (opposite option is
                    -info-no-types)
-info-variables     print information about global variables (opposite option
                    is -info-no-variables)

*** CSV GENERATION

-info-csv-all <basename>  export all the computed information in csv files
                    with filenames based on the specified basename
-info-csv-files <filename>  export information about files in the specified
                    csv file
-info-csv-functions <filename>  export information about functions in the
                    specified csv file
-info-csv-properties <filename>  export information about properties in the
                    specified csv file
-info-csv-shared-variables <filename>  export information about shared
                    variables in the specified csv file
-info-csv-statements <filename>  export information about statements in the
                    specified csv file
-info-csv-types <filename>  export information about types in the specified
                    csv file
-info-csv-variables <filename>  export information about global variables in
                    the specified csv file

*** DEAD CODE

-dead-bk-skip-tis-dead  If all the statements in a dead block are in blocks
                    marked with `__blockattribute__(__tis_dead__)`, do not
                    report this dead block in the results. (opposite option
                    is -no-dead-bk-skip-tis-dead)
-dead-bk-summary    Print a summary about dead code blocks in reachable
                    functions in CSV format. (opposite option is
                    -no-dead-bk-summary)
-dead-bk-with-defs  Print the origin of the sub-expressions of the tests
                    (only if -dead-bk-with-tests and also needs the
                    -whole-program-graph option). (opposite option is
                    -no-dead-bk-with-defs)
-dead-bk-with-subexp  Print the values of the sub-expressions of the tests
                    (only if -dead-bk-with-tests). (opposite option is
                    -no-dead-bk-with-subexp)
-dead-bk-with-tests  Print the test expressions of the dead blocks. (opposite
                    option is -no-dead-bk-with-tests)
-dead-bk-with-tis-dead  Print information about the blocks marked with
                    `__blockattribute__(__tis_dead__)`. (opposite option is
                    -no-dead-bk-with-tis-dead)
-dead-compute       Print information about dead code. See the other
                    '-dead-xxx' options with -info-help to tune the details
                    of the computation. (opposite option is -no-dead-compute)
-dead-functions <f1,...,fn>  Name of the functions to examine. Examine all of
                    them by default.
-dead-json <filename>  Compute the same results as the -dead-compute
                    option, and print them to the given file using the JSON
                    format.
-dead-justification <filename>  generate a justification file to be used with
                    tis-aggregate. Not available when using the
                    -dead-functions option.

*** GETTING INFORMATION

-info-help          help of plug-in info
-info-h             alias for option -info-help
-info-help-files    print help about files information. (opposite option is
                    -info-no-help-files)
-info-help-functions  print help about functions information. (opposite
                    option is -info-no-help-functions)
-info-help-json-results  print help about the JSON format used in the file
                    generated when using the '-info-json-results' option.
                    (opposite option is -info-no-help-json-results)
-info-help-properties  print help about properties information. (opposite
                    option is -info-no-help-properties)
-info-help-reachable  print help about getting information about reachable
                    statements. (opposite option is -info-no-help-reachable)
-info-help-shared-variables  print help about getting information about
                    shared variables. (opposite option is
                    -info-no-help-shared-variables)
-info-help-show     print more help about the 'show' queries. (opposite
                    option is -info-no-help-show)
-info-help-statements  print help about getting information about statements.
                    (opposite option is -info-no-help-statements)
-info-help-types    print help about getting information about types.
                    (opposite option is -info-no-help-types)
-info-help-variables  print help about getting information about global
                    variables. (opposite option is -info-no-help-variables)

*** OUTPUT MESSAGES

-info-debug <n>     level of debug for plug-in info (defaults to 0)
-info-msg-key <k1[,...,kn]>  enables message display for categories
                    <k1>,...,<kn>. Use -info-msg-key help to get a list of
                    available categories, and * to enable all categories
-info-verbose <n>   level of verbosity for plug-in info (defaults to 1)

*** RESULTS

-info-json-results <filename>  export the -info-results results to the given
                    file using the JSON format.
-info-results       print an overview of the most important results of an
                    analysis. (opposite option is -info-no-results)
-info-results-skip-abort <f1,..,fn>  specify a list of functions that are
                    known to not terminate (such as 'abort' functions). They
                    will be ignored in the `-info-msg-key
                    results:non-terminating` output.
-info-results-skip-lib <f1,..,fn>  specify a list of functions that are known
                    to be library functions (i.e. with no body but a
                    specification). They will be ignored in the
                    `-info-msg-key results:no-body-but-spec` output.
-info-results-skip-log <re_1,..,re_n>  specify a list of regular expressions
                    that match messages to be considered as normal. The
                    regular expressions are also compared to the plugin name
                    to be able to ignore all the messages from a given
                    plugin. They will be ignored in the `-info-msg-key
                    results:messages-terminating` output.
-info-results-skip-props <p1,..,pn>  specify a list of properties that can be
                    ignored in the `-info-msg-key results:hypotheses` output
                    because they are verified elsewhere. For instance, use
                    '-info-results-skip-props _wp,_rv' if the properties
                    verified by WP have the '_wp' suffix, and the properties
                    manually reviewed  have the '_rv' suffix.

*** SHOW QUERIES

-info-show-state <program_point>  print the value of the computed state at
                    the specified program point. The syntax to specify the
                    program point is the same as for -info-show-val.
-info-show-statements  print the selected statement for the queries that
                    involve program points. (opposite option is
                    -info-no-show-statements)
-info-show-val <"program_point term">  print the value of the given term at
                    the specified program point. More details about the
                    syntax with the -info-help-show option.
-info-show-with-callstack  show the detailed values for each call stack.
                    (opposite option is -info-no-show-with-callstack)

*** TUNING

-info-compute       do not output anything, but compute all the internal
                    tables. This is useful for a saved project to save time
                    later on when loading it. (opposite option is
                    -info-no-compute)
-info-stmt-no-reduction  in the 'more' column of the 'test' statements
                    information, add 'no reduction' if the states are the
                    same in both branches. Not enabled by default because its
                    computation is expensive. (opposite option is
                    -info-no-stmt-no-reduction)
-info-with-stat     show the columns that hold the value analysis timing
                    statistics in the results (default). It is useful to
                    unset this option when the goal is to compare the results
                    between several analyses because the timing may vary even
                    when the other results remain the same. (set by default,
                    opposite option is -info-without-stat)
-info-without-stat  opposite of option "-info-with-stat"
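
For example, a plausible invocation that prints the function information and also exports it to a CSV file (the name functions.csv is only illustrative) is:

$ tis-analyzer <options and c files> -val -info-functions -info-csv-functions functions.csv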

tis-mk-main Manual

Description

The tis-mk-main tool generates a C file containing a new entry point for a program expecting command line arguments. More precisely, the generated file contains a function tis_main calling the main function of the original program with the specified arguments.

How to use it

The command line to call the tool is composed of two parts separated by a double dash -- and so it looks like:

$ tis-mk-main <options> -- <arguments>

The options to change the default behavior are:

-o <file>
name of output file (default is tis-main-dummy.c);
-prog-name <name>
name of the original program used as argv[0] (default is UNKNOWN);
-main <function-name>
name of the main function to generate (default is tis_main);
-f
forces overriding if the output file already exists;
-with-env
generates a third argument for the new main function: char** env;
-help
prints a help message.

The arguments can be listed as they would be in the command line to call the original program. Arguments beginning with a - have to be preceded by a standalone - (see the example below). Two more options are provided to specify some indeterminate arguments:

-tis-var <size>
generates an indeterminate string argument of length between 0 and <size>;
-tis-cst <size>
generates an indeterminate string argument of length exactly <size>.

Example

For instance, the following command generates a function my_main in a file new.c that calls the main function of the application to analyze with five arguments (plus the program name, so argc is 6):

$ tis-mk-main -f -main my_main -o new.c -prog-name PROG
            -- arg1 -tis-var 2 - -opt1 -tis-cst 2 - --opt2

The generated file is:

#include "tis_builtin.h"

char* argv[6];

char argv_0[] = "PROG";
char argv_1[] = "arg1";
char argv_2[3];
char argv_3[] = "-opt1";
char argv_4[3];
char argv_5[] = "--opt2";

extern int main(int argc, char* argv[]);

void my_main() {
  tis_make_unknown(argv_2, 2);
  argv_2[2] = 0;
  if (tis_interval(0,1)) {;
    argv_4[0] = tis_char_interval(-0x7F,-1);
  } else {
    argv_4[0] = tis_char_interval(1,0x7F);
  }
  argv_4[2] = 0;
  if (tis_interval(0,1)) {;
    argv_4[1] = tis_char_interval(-0x7F,-1);
  } else {
    argv_4[1] = tis_char_interval(1,0x7F);
  }
  argv_4[2] = 0;
  argv[0] = argv_0;
  argv[1] = argv_1;
  argv[2] = argv_2;
  argv[3] = argv_3;
  argv[4] = argv_4;
  argv[5] = argv_5;
  main(6, argv);
}
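
The generated file can then be added to the analysis together with the application sources. Assuming the analyzer's -main option is used to select the generated entry point, the command might look like:

$ tis-analyzer <application files> new.c -val -main my_main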

Generate an Entry Point with CFP

Instead of using tis-mk-main, the Context From Precondition plug-in (CFP for short) can be used to generate an entry point from the preconditions of the function to analyze. This is useful when one wants to be sure that the analysis context is not smaller than the context specified by those preconditions (see Pre-conditions for more information).

To generate a context for a function f, the command line looks like this:

tis-analyzer <options and c files> -cfp -cfp-input f

For instance, the preconditions for f could be:

/*@ requires 0 <= size <= 100;
    requires \valid_read (in + (0..(size - 1)));
    requires \initialized (in + (0..(size - 1)));
    requires out == \null || \valid (out + (0..(size - 1)));
 */
int f (int * out, int const * in, int size);

In most cases, the files __fc_builtin.h and stdlib.h from the tool standard library are needed in order to make the CFP plug-in work correctly, so they should be included in the project.

Since CFP generates the code in a new project, the -then-last option must be used to switch to this project and be able to, for instance, run the value analysis on it:

tis-analyzer <options and c files> -cfp -cfp-input f -then-last -val

More CFP options are available, see the plug-in’s help (using the -cfp-help option) for more information.

tis-mkfs Manual

Presentation

The aim of the tis-mkfs tool is to generate C source code that represents a virtual file system. It is useful when analyzing an application that has some interactions with the file system, such as reading and writing files (including special character devices), pipes and sockets. The generated code, added to the analysis, makes it possible to analyze the behavior of an application accessing files, or using other operations related to the file system. Without tis-mkfs, the default behavior of the analyzer is to consider that the filesystem only contains /dev/stdin, /dev/stdout and /dev/stderr.

This manual explains how to use the tis-mkfs tool. It also gives the tool limitations and provides workaround solutions. Some internal details are provided to help customize the virtual file system content and/or behavior if needed.

Quick start

Starting from a file hello.txt that holds the string Hello!, the following command generates the mkfs_filesystem.c file that implements a virtual file system containing one input.txt file with the same content as hello.txt:

$ tis-mkfs -local "input.txt:hello.txt"

This is a micro application that opens and reads the input file:

#include <stdio.h>
int main(void)
{
  FILE *f = fopen("input.txt", "r");
  if (f) {
    int c = fgetc(f);
    if (c != EOF)
      printf("First char = '%c'\n", (char)c);
    fclose(f);
  }
}

The command to run the analyzer on this function using the generated file system is:

$ tis-analyzer -fs mkfs_filesystem.c test.c -val

As expected, the following line is part of the output:

First char = 'H'

Building the virtual file system

This section shows all the possible options that can be used to build a virtual file system with tis-mkfs.

Specification

In order to build the virtual file system, the structure and the content of the file system, as they need to be seen from the analyzed application, first have to be determined.

This is an example of files and directories that may be expected from the application:

 /
 ├── etc/
 │   └── profile                     # known data
 ├── home/
 │   ├── .bashrc                     # known data
 │   ├── alice/
 │   │   └── data.txt                # specified content
 │   ├── bob/
 │   │   └── .bashrc                 # unknown content
 │   └── Documents/                  # empty directory
 └── usr/
     └── share/
         └── input_file.txt          # known data

For the analysis purposes, the files may have a fully defined content, a totally unknown content, or may be partially specified (number of elements and their range).

Because this model is built to represent the entire file system expected by the application, trying to read a file that does not exist in the virtual file system is analyzed as an error, as it would be in a real file system.

Preparation

The files with a known content are created from files in the real file system. Parts of the directory organization may also be created from there.

For instance, the model above may be built from the following directories and files in the real file system:

 .
 ├── filesystem
 │   ├── etc
 │   │   └── profile                 # content for /etc/profile
 │   └── home
 │       └── .bashrc                 # content for /home/.bashrc
 ├── main_input_file.txt             # content for /usr/share/input_file.txt

In this example, the content of the virtual file /usr/share/input_file.txt is generated from the real ./main_input_file.txt file. It could also have been generated from ./filesystem/usr/share/input_file.txt.

File Generation

This section shows how to choose the options to build the tis-mkfs command to generate a C file holding the virtual file system presented above.

The first option specifies that the local directory filesystem shown above has to be duplicated in the virtual file system, in the path /:

-local "/:filesystem"

Then the file main_input_file.txt is used to give the content of /usr/share/input_file.txt in the virtual file system:

-local "/usr/share/input_file.txt:main_input_file.txt"

More generally, the -local internal_name:local_name option is used to specify that the element local_name must be seen as internal_name in the analyzed program. The internal_name name does not have to specify absolute paths, but it must match exactly the ones used in the analyzed application since there is no path translation (see Perimeter). The local_name object must exist in the real file system:

  • if it is a file, its internal representation has the same content,
  • if it is a directory, the internal_name directory is added to the internal file system together with all its hierarchy. In this case, the files have the same names in the analyzed program as on the machine, but the root is renamed internal_name.

The empty directory /home/Documents is created with:

-add-dir "/home/Documents"

The /home/bob/.bashrc file is created with an unknown content, and the /home/bob directory is automatically generated:

-add-file "/home/bob/.bashrc"

The /home/alice/data.txt file is created with 100 elements whose values are between 5 and 50:

-generic "/home/alice/data.txt:100:5:50"

The special filename tis-mkfs-stdin may be used with the -local, -generic and -add-file options to specify the content of the stdin stream.
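
For instance, assuming the standard input should be seen as 20 indeterminate bytes with values in [0;127], a command like the following could be used:

$ tis-mkfs -generic "tis-mkfs-stdin:20:0:127" -o stdin_filesystem.c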

The virtual file system is generated in a main_filesystem.c output file (and a main_filesystem.h header file is also created):

-o main_filesystem.c

So the full command to generate the virtual file system of the above example in a main_filesystem.c output file is:

$ tis-mkfs -local "/:filesystem" \
        -local "/usr/share/input_file.txt:main_input_file.txt" \
        -add-dir "/home/Documents" \
        -add-file "/home/bob/.bashrc" \
        -generic "/home/alice/data.txt:100:5:50" \
        -o main_filesystem.c \
        -print

The generated file system, shown thanks to the -print option, is the one specified above.

For more options, refer to the complete list in the help message below. All the options to add elements to the file system can be used together, and several times in the same command.

Help

This is the list of all the currently available options:

$ tis-mkfs -help

Generate a virtual file system for TrustInSoft analyzer.
More information in the User Manual.

Options are:
  -local <c_name:local_name> add an element <c_name> in the virtual file system
    that is similar to the <local_name> element.
    - if <local_name> is a file, <c_name> has the same content;
    - if <local_name> is a directory, <c_name> has recursively
      the same tree structure with the same files.
  -content-as-array when some content is specified by a local file,
    generate an array in the file system model
    instead of using the tis_inject_file builtin.
  -add-file <c_name> add a file named <c_name> in the file system.
    The name may include a path to specify where to put it,
    and in that case, the intermediate directories are created.
    The content of the file is unknown.
  -add-fifo similar to -add-file, but for a fifo (named pipe).
  -add-char-device similar to -add-file, but for a character special file (device).
   The device_id is not handled at the moment. It is set to 0,
   but can be changed from the user code if needed.
  -add-dir <c_name> add a directory named <c_name> in the file system.
    The name may include a path to specify where to put it.
    It is empty unless the -add-file option is also used.
  -nb-max-created-files <n> if the analyzed program creates new files, in addition to
   the ones given to tis-mkfs, the file system array must be bigger
   than the initial number of files.
   This option sets the maximum number of new files that can be created.
   (default: 100)
  -nb-max-created-dirs <n> is similar to the -nb-max-created-files option, but for directories.
   (default: 10)
  -sep <separator> to change the separator in options (default: ':')
  -o <output_filename> (default: mkfs_filesystem.c)
  -generic <c_name:size:min:max> generates a virtual file of name <c_name>
    of size <size> and containing any values in the interval [<min>;<max>].
    Both bounds must be between 0 and 255.
  -print print the generated file system to stdout
  -quiet do not print helper message to stdout
  -help  Display this list of options
  --help  Display this list of options

Analysis

This section presents how to use the generated file system when analyzing an application. It also shows several customizations that may be needed for special purposes.

Notice that the default behavior of the analyzer, when tis-mkfs is not used, is to consider that the filesystem only contains /dev/stdin, /dev/stdout and /dev/stderr. The -no-fs option can be used to change this behavior and to provide no filesystem model at all to the analyzer, but beware that this may lead to imprecise results when analyzing functions related to filesystem operations.

Simple Analysis

To use tis-analyzer to analyze the studied application with the virtual file system, the generated file (named main_filesystem.c in the above example) has to be added with the -fs option. This adds the generated file to the source files to analyze, and also adds the __tis_mkfs.c file that comes with the tis-mkfs tool and that holds specific implementations for some standard library functions that are able to use the generated information. This file is located in the $INSTALL_PATH/share/tis directory (also called $TIS_SHARE).

So the command line to run the analysis has to be modified like this:

$ tis-analyzer ...  -fs main_filesystem.c

Modifying the Behavior

No Error Mode

Trying to open a file for reading when it does not exist in the file system always returns an error, which is the normal behavior of the standard library functions. However, these functions may also return errors that come from the system, for example when the maximum number of open files is reached.

These errors may be ignored during the first steps of the analysis in order to first study the application in its nominal behavior. For that purpose, tis-mkfs provides the __TIS_MKFS_NO_ERR macro and the -no-fs-error option. When using them, the command line is modified with:

$ tis-analyzer ...  -D __TIS_MKFS_NO_ERR ...

or:

$ tis-analyzer ... -no-fs-error ...

The final analysis must be performed without defining the macro in order to check that all the possible errors are correctly handled.

Pre-Allocate Mode

tis-mkfs stores the file data in memory, and may be instructed to use either static or dynamic allocation.

By default, tis-mkfs uses the __TIS_MKFS_STATIC_ALLOCATE mode, which stores the file contents in one large statically allocated array. This means that the maximum memory size cannot be changed. Since it does not involve dynamic allocations, it is the easiest mode to use to get precise results.

But two other modes are provided in cases where this array is too small. They may be selected by defining one of the following macros: __TIS_MKFS_PREALLOCATE or __TIS_MKFS_DYNAMIC_ALLOCATE.

When __TIS_MKFS_PREALLOCATE is defined, the maximum size of the largest files has to be given by __mkfs_preallocate_size. This global variable of type size_t must be defined and initialized by the user in one of the source files. For instance:

#include <stdlib.h>
size_t __mkfs_preallocate_size = 100000;

Then the memory that holds the data is allocated only once for each opened file, so it may be adapted according to the needs. The analysis checks that no file grows beyond its pre-allocated size.

When __TIS_MKFS_DYNAMIC_ALLOCATE is defined, tis-mkfs uses realloc calls to expand the file contents. In case of heavy use of dynamic allocation, this may lead to accuracy problems in the analyzer.
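
As with the __TIS_MKFS_NO_ERR macro shown above, selecting a mode is presumably done by defining the corresponding macro on the command line. For instance, with a hypothetical prealloc_size.c file containing the __mkfs_preallocate_size definition shown above:

$ tis-analyzer ... -fs main_filesystem.c -D __TIS_MKFS_PREALLOCATE prealloc_size.c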

Precise File Descriptor

The opened files are identified by their file descriptor. It is important to make the analyzer compute a precise value for the file descriptors to keep the exact relation between them and the file information. It means that each call to the internal function __mkfs_get_next_file_desc should return a single value.

Because the standard specifies that, when a new file descriptor is needed, it must always be the smallest available, this function has a loop to find it. It might be needed to unroll it, or to give enough precision (slevel) to this function.
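
For instance, assuming the value analysis -slevel-function option is used to give more precision to that single function (the bound 100 is arbitrary and should cover the number of open calls), the command might include:

$ tis-analyzer ... -fs main_filesystem.c -slevel-function __mkfs_get_next_file_desc:100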

But in some cases, even if the analysis is precise, it might not be possible to have a precise value for a file descriptor when there are several paths opening and closing files. For instance, in the following example, fd2 might be 3 or 4:

// no files opened yet.
int fd1 = open("input1.txt", O_RDONLY);
//@ assert fd1 == 3;
if (c)
  close(fd1);
int fd2 = open("input2.txt", O_RDONLY);

In such cases, the __TIS_MKFS_NO_CLOSE macro can be used to tell the tool to never re-use an already used file descriptor. So whether fd1 is closed or not, it does not get re-used, and the analyzed value for fd2 is always 4.
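
As for the other behavior macros, __TIS_MKFS_NO_CLOSE can be defined from the command line:

$ tis-analyzer ... -fs main_filesystem.c -D __TIS_MKFS_NO_CLOSE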

Warning

When using the __TIS_MKFS_NO_CLOSE macro, it might be necessary to increase the value of __TIS_FOPEN_MAX - refer to Overloading some libc Macros; also, the following code pattern, commonly used to force a specific file descriptor value:

close(fd);
dup(otherfd); /* expects dup to return fd, the lowest free descriptor */

will not work, as dup will never return fd, since fd is not reusable when __TIS_MKFS_NO_CLOSE is defined.

Modifying the Generated Information

In some cases, it might be useful to customize the file system content and/or the function behavior more deeply. Some examples are given below.

The header file $TIS_SHARE/__tis_mkfs.h may be included when needed for customization as it declares internal data structures and functions. For more details, refer to the Internal Details section.

Modifying the Library

As mentioned before, the __tis_mkfs.c library file that comes with the tis-mkfs tool holds a specific implementation of some standard library functions.

Because there is always a trade-off between the precision and the generality of the analysis, some choices had to be made. The section Perimeter gives more information about the limitations of the implemented functions.

In some cases, a custom implementation of one or more of these functions might be preferred. Some macros are available to exclude a predefined function, which is just a call to an internal version of the function.

For instance, the read function is defined by:

#ifndef __TIS_USER_READ
ssize_t read(int fd, void * buf, size_t count)
{ return __mkfs_read(fd, buf, count); }
#endif // __TIS_USER_READ

To use a specific read function, one has to:

  • define the __TIS_USER_READ macro;
  • write a custom read function (for instance in a read.c file);
  • include that new file in the analysis.

So the modified command looks like:

$ tis-analyzer ... -fs -D__TIS_USER_READ read.c ...

The custom file probably needs to include __tis_mkfs.h, and the custom function may still use the standard __mkfs_read function if needed. Some other functions, such as __check_fd_file_ok_for_reading, might also help when writing specific implementations (see Internal Details for more information).
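
For illustration, a minimal custom read.c could look like the following sketch, which simply wraps the internal function (any custom checks or logging would go around the call):

#include <__tis_mkfs.h>

// Minimal sketch of a user-provided read function; the analysis must be
// run with -D__TIS_USER_READ so that the predefined read is excluded.
ssize_t read(int fd, void * buf, size_t count)
{
  // Custom pre- or post-processing may be added here.
  return __mkfs_read(fd, buf, count);
}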

The functions which may be provided by the user are: access, bind, close, closedir, connect, creat, dirfd, dup, dup2, fcntl, fdopendir, fstat, ftruncate, getcwd, getegid, geteuid, getgid, getsockname, getuid, link, lseek, lstat, mmap, msync, munmap, open, opendir, pipe, pipe2, pread, pwrite, read, readdir, readlink, recv, recvfrom, rewinddir, rmdir, seekdir, socket, stat, telldir, truncate, unlink, write.

Modifying the Generated Header File

Some of the fields in the generated structures are initialized either with information coming from the local file system, or with default values, according to the generated configuration file. This header file is named after the generated C file. For instance, if the requested name is main_filesystem.c, then the configuration file is named main_filesystem.h.

Once generated, this file can be modified, for instance to change the default permissions of the virtual files. Have a look in it to see what can be customized there. It can be safely modified since it is not overwritten if it already exists, so the modifications are not lost in case of a regeneration.

Overloading the Generated Data

Another way to modify the generated values is to modify them from the entry point function that defines the context. This makes it possible to modify the information for one file only, or to give a range of values, or even a totally undetermined value.

For instance, to analyze the application with the "input_file.txt" file having either 0644 or 0600 permissions, the entry point can be modified to first retrieve the file information with the __mkfs_get_file function, and to modify the corresponding field in the file array (see the Internal Details section for more information about the data structures):

#include <__tis_mkfs.h>

void main (void) {
  ...
  struct __fc_fs_file * fs = __mkfs_get_file("input_file.txt");
  if (fs != NULL) {
    t_mode mode1 = S_IFREG | S_IRUSR | S_IWUSR;
    t_mode mode2 = mode1   | S_IRGRP | S_IROTH;
    fs->__fc_stat->st_mode = tis_nondet(mode1, mode2);
  }
  ...
}

Overloading some libc Macros

Because tis-mkfs relies on some libc definitions (mainly from stdio.h), it can be useful to redefine some macros, such as __TIS_FOPEN_MAX which gives the value of FOPEN_MAX. According to the ISO/IEC 9899:2011 standard, this constant is:

the minimum number of files that the implementation guarantees can be open simultaneously

Because its default value may be too low for some applications (especially when using __TIS_MKFS_NO_CLOSE as explained in Precise File Descriptor), this constant can be overridden by the user from the command line:

$ tis-analyzer ...  -D __TIS_FOPEN_MAX=100

Perimeter

As mentioned before, there is a trade-off between the precision and the generality of the analysis, and moreover, the support is only partial at the moment. In order to warn the user when some unsupported features might be needed to analyze the application, or when a choice has been made that breaks the over-approximation of the results, annotations have been added to the library. These annotations have a _mkfs_niy suffix, for tis-mkfs Not Implemented Yet features. If all these annotations are valid, it means that the library has been used within the tis-mkfs perimeter.

Limitations

Testing Permissions

When a file is known in the virtual file system, the access function is implemented to test the stat information according to the uid and gid. In the __TIS_MKFS_NO_ERR mode, it returns 0 on success, or -1 in case of error with errno set to EACCES. Otherwise, the behavior is over-approximated and it returns either 0 or -1, with the errno variable set to an undetermined value.

At the moment, only the permission of the file is tested, not the one of the intermediate directories of the path.

Existing files

As said before, only the files and directories that appear in the virtual file system are supposed to exist. It is not possible at the moment