Tip

Use your browser's search function (for example, Ctrl+F) to search the entire documentation.

Caution

The documentation describes all the features and tools available in the commercial version of TrustInSoft Analyzer, some of which are not available through online tools.

The sections of the documentation describing features and tools that are only available in the commercial version are explicitly marked.

TrustInSoft Analyzer Documentation

TrustInSoft Analyzer is an award-winning static C and C++ source code analyzer that ensures the safety and security of source code by providing mathematical/formal guarantees over program properties.

TrustInSoft Analyzer takes advantage of state-of-the-art technology to provide different sets of tools to facilitate the analysis of C and C++ programs.

Structure of the Documentation

  • The Tutorials section helps the user get started with TrustInSoft Analyzer
  • The User Manual section provides methodologies for using TrustInSoft Analyzer
  • The Reference Manual section exhaustively details the features of TrustInSoft Analyzer
  • The Admin Manual section provides information on administrating TrustInSoft Analyzer
  • The Coding Standards section details the usage of TrustInSoft Analyzer with standard requirements
  • The Glossary section defines the terms used in this documentation
  • The Useful Links section references external links related to TrustInSoft Analyzer

Get Started

Start with the Tutorials to learn how to use TrustInSoft Analyzer and how to browse the source code and the alarms in the GUI, eventually obtaining the guarantee of the absence of undefined behavior.

TrustInSoft Analyzer: non-technical overview

Tutorials

This section provides step-by-step tutorials to help you get started with TrustInSoft Analyzer.

The goal of TrustInSoft Analyzer is to prevent runtime errors by analyzing all the possible values that the variables can take at any given point in the program, in order to prove that none of the execution paths leads to a problem (such as undefined behavior or a forbidden operation).

This verification is called the value analysis.
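As a rough illustration, consider the toy function below (our own example, not part of the tutorial files); the comments show the kind of value sets a value analysis tracks at each program point:

```c
/* Toy illustration of value analysis: the comments describe the set of
   values the analyzer would consider possible for i at each point. */
int clamp_index(int i)
{
    /* here, i may be any int: nothing is known about it yet */
    if (i < 0)
        i = 0;      /* on this branch, i is exactly 0 */
    if (i > 9)
        i = 9;      /* on this branch, i is exactly 9 */
    /* here, i is in [0..9]: indexing a 10-element array with i is safe */
    return i;
}
```

Because the analysis keeps a superset of the feasible values, it can prove that using the returned value as an index into a 10-element array is always in bounds.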

Note

Unlike testing or binary analysis, the value analysis provided by TrustInSoft Analyzer is exhaustive: the guarantees provided apply to all the concrete executions of the program. Even tests with the best coverage exercise only a few of the execution paths in a program, whereas binary analysis is strongly dependent on the compilation and execution environments. This is why static value analysis gives stronger guarantees than both testing and binary analysis.

The value analysis will try to prove all the properties needed for the code to be correct. If a property cannot be automatically proved, the analyzer will emit an alarm, such as:

/*@ assert Value: division_by_zero: b ≢ 0; */

This means that, in order for the program to be correct, the value of b must be non-zero at the program point indicated by the analyzer.
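The situation behind such an alarm can be sketched as follows (toy code of ours, not from the tutorial; the function names are hypothetical):

```c
/* The analyzer cannot prove b != 0 here, so it would emit the alarm
   "assert Value: division_by_zero: b ≢ 0;" at the division. */
int unchecked_div(int a, int b)
{
    return a / b;   /* alarm point: b may be 0 on some execution path */
}

/* Guarding the division discharges the property: on the only path
   where the division executes, b is provably non-zero. */
int guarded_div(int a, int b, int fallback)
{
    if (b != 0)
        return a / b;
    return fallback;
}
```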

At this point there are two possibilities:

  1. There is an execution path for which b = 0 that will lead to an error or an undefined behavior.

    This means there is a bug in the program that needs to be corrected in order to ensure that the property will hold.

  2. There is no execution path for which b = 0, but the analyzer was not able to prove the validity of the property.

    This means that the analyzer over-approximates the possible values of b, and in order to make the alarm disappear, it will be necessary to guide the analyzer to be more precise on the possible values of b, and then run the analysis again.

Tip

An alarm is a property, expressed in a logic language, that needs to hold at a given point in the program in order for the program to be correct.

Getting Started

The following examples show how to use TrustInSoft Analyzer on test snippets to eventually guarantee the absence of undefined behavior for the input values of the test.

Tip

When used to verify tests, TrustInSoft Analyzer can be run with the --interpreter option, so that no other special tuning is required.

This keeps the analyzer precise; as a result, each alarm is a true bug that needs to be fixed.

Example 1: Out of Bounds Read

The example array.c is located in the directory /home/tis/1.45.1/C_Examples/value_tutorials/getting_started.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i <= 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Open the GUI after starting the analysis with the following command:

$ tis-analyzer-gui --interpreter /home/tis/1.45.1/C_Examples/value_tutorials/getting_started/array.c
_images/tuto-c-example-array-1.png

TrustInSoft Analyzer emits the following alarm:

/*@ assert Value: index_bound: i < 5; */

Tip

Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].

To see the values of both i and array, right-click on i in the statement printf("array[%d] = %d\n", i, array[i]); (highlighted in orange) and select Track this term, then click on array in that same statement. To see the values at each iteration of the loop, click on Per Path and scroll down to the last row.

_images/tuto-c-example-array-2.png

The value of i is successively equal to:

  • 0 at the first iteration of the loop,
  • 1 at the second iteration of the loop,
  • 2 at the third iteration of the loop,
  • 3 at the fourth iteration of the loop,
  • 4 at the fifth iteration of the loop,
  • 5 at the sixth iteration of the loop,
  • and array is an array local var. (int[5])

Accessing array[i] when i is equal to 5 is an out-of-bounds access.

In order to continue the analysis, the program must be fixed and the analysis run again.

Note

Not all statements are analyzed: see the statements highlighted in red.

The statements after the loop are highlighted in red because they are unreachable with respect to the input values of the program. This means that none of the computed values allows the execution to continue past the loop in a well-defined way.

Understanding the Root Cause to Fix the Program

Tip

Now that we know that when i is equal to 5 there is an out-of-bounds access to array, we will ask the analyzer where this value comes from.

Click on the Inspect i button to see the last write to the l-value i relative to the current statement, and the value of i at this statement.

Tip

The button Inspect i is equivalent to the following actions:

  • Right-click on i and then Show Defs
  • Click on i to see the values
_images/tuto-c-example-array-3.png _images/tuto-c-example-array-4.png

In the Interactive Code, the two statements highlighted in green show the last writes to i; moreover, the Show Defs panel shows that these two write locations are the only ones. The location i = 0 corresponds to the declaration and initialization, while the location i++ is where the value of i eventually reaches 5.

We fix the program by changing the loop condition so that the loop exits before i reaches 5.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i < 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Kill the current analysis, fix the source code, and then run the analysis again.

Tip

When analyzing a real-life project whose source code must be fixed, it is recommended to follow the process below:

  1. Kill the current analysis
  2. Edit the source code to fix the issue
  3. Run the analysis again

However, for a simple use case such as this one, everything can be done from the GUI, following this process:

  1. Edit the source code in the Source Code panel located in the right area
  2. Click on the Save File button at the top of the same panel
  3. Click on the Parsing button in the Overview panel located in the top left area
  4. Click on the Value Analysis button in the same panel
_images/tuto-c-example-array-5.png

TrustInSoft Analyzer emits the same alarm, but at a different location:

/*@ assert Value: index_bound: i < 5; */

Tip

Meaning of the property: for the program to be valid, the value of i must be strictly less than 5 when used to access array[i].

To see the values of both i and array, right-click on i and select Track this term, then click on array.

_images/tuto-c-example-array-6.png

The value of i is equal to 5, and array is an array local var. (int[5]). Accessing array[i] when i is equal to 5 is an out-of-bounds access.

Once again, we fix the program to print the last element of the array, kill the current analysis, and run the analysis again.

// tis-analyzer-gui --interpreter array.c

#include <stdio.h>

int main(void)
{
    int array[5] = {0, 1, 2, 3, 4};
    int i;

    for (i = 0; i < 5; i++)
        printf("array[%d] = %d\n", i, array[i]);

    i--;
    printf("array[%d] = %d\n", i, array[i]);

    return 0;
}

Tip

TrustInSoft Analyzer guarantees the absence of undefined behavior for the input values of this test.

_images/tuto-c-example-array-7.png

Example 2: division by zero

In this example we will study the program example1.c, located in the /home/tis/1.45.1/C_Examples/TutorialExamples directory. This program performs some operations on unsigned integers.

//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
    int sum, prod, quot, diff;

    sum = a + b;
    diff = a - b;
    prod = a * b;
    quot = a / b;
}

Start the value analysis (option -val) with the following command:

$ tis-analyzer-gui -val /home/tis/1.45.1/C_Examples/TutorialExamples/example1.c

and launch the GUI (as explained in the Getting Started section).

Your browser window should display a division by zero alarm:


_images/tis-qs_examples_example1_1.png

Selecting the alarm in the bottom panel will highlight the program point at which the alarm was raised, both in the Source Code Window (right panel) and in the Interactive Code Window, where you can also see the ACSL assertion generated by the analyzer:

/*@ assert Value: division_by_zero: b ≢ 0; */

The program takes two unsigned integer arguments a and b that are unknown at the time of the analysis. The value analysis must ensure that the program is correct for all possible values of a and b. The alarm was raised because if b is equal to zero, there will be a division by zero, which leads to undefined behavior. This means that anything can happen at this point, making the program incorrect (and dangerous to use, as this will lead to a runtime error).

Let’s modify the program by adding an if (b != 0) statement in order to perform the division only if b is non-zero.

//Some integer operations on a and b
void main(unsigned int a, unsigned int b)
{
    int sum, prod, quot, diff;

    sum = a + b;
    diff = a - b;
    prod = a * b;
    if (b != 0)
        quot = a / b;
}

and launch the analysis on the modified version example1_ok.c:

$ tis-analyzer-gui -val /home/tis/1.45.1/C_Examples/TutorialExamples/example1_ok.c
_images/tis-qs_examples_example1_2.png

Congratulations! All the alarms have disappeared, so the program is guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer. This means that it will run safely, whatever the arguments provided.

You can now move to the next series of examples.

Example 3: dangling pointers

Compiling well defined, standard-compliant C or C++ source code should ensure that your program will behave as intended when executed, whatever the platform or compilation chain used.

TrustInSoft Analyzer’s static code analysis helps you produce high quality, standard-compliant code that is guaranteed to be free from a wide range of weaknesses and vulnerabilities.

The example we will examine shows how TrustInSoft Analyzer can detect dangling pointers (also known as CWE-562:Return of Stack Variable Address or CWE-416: Use After Free in the Common Weaknesses Enumeration).

This example originally appeared in a blog post by Julien Cretin and Pascal Cuoq on the TrustInSoft website. We encourage you to read the whole post for more technical details and some examples of production code carrying this kind of bug.

Analyzing the source code

Let’s have a look at the program example3.c, located in the /home/tis/1.45.1/C_Examples/TutorialExamples directory:

 1  /* TrustInSoft Analyzer Tutorial - Example 3 */
 2  #include <stdio.h>
 3
 4  #include <stdlib.h>
 5  #include <stdio.h>
 6  #include <stdint.h>
 7  #include <inttypes.h>
 8
 9  int main(void)
10  {
11      char *p, *q;
12      uintptr_t pv, qv;
13      {
14          char a = 3;
15          p = &a;
16          pv = (uintptr_t)p;
17      }
18      {
19          char b = 4;
20          q = &b;
21          qv = (uintptr_t)q;
22      }
23      printf("Roses are red,\nViolets are blue,\n");
24      if (p == q)
25          printf("This poem is lame,\nIt doesn't even rhyme.\n");
26      else {
27          printf("%p is different from %p\n", (void *)p, (void *)q);
28          printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
29      }
30
31      return 0;
32  }

This program prints out some text. The result should differ depending on the value of the expression p == q in line 24.

Start the value analysis with the following command:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c

and launch the GUI (as explained in the Getting Started section).

After selecting the Properties widget and the Only alarms button, you will notice that there is one alarm and some lines of dead code right after the alarm:

_images/tis-qs_examples_example3_1.png

The assertion generated shows that the analyzer has found a dangling pointer (a pointer not pointing to a valid object):

/*@ assert Value: dangling_pointer: ¬\dangling(&p); */

Let’s use the analyzer to follow the variable p through the execution, and try to find the source of the problem:

_images/tis-qs_examples_example3_2.png

The values of the variable p before and after the selected statement will be listed in the Values widget in the bottom panel. We can see that, after the initialization p = &a, the variable p holds the address of a as expected.

Selecting the expression p == q inside the if will show the value of p at this point in the program. Before the evaluation of the p == q expression, p is shown as ESCAPINGADDR, meaning that it no longer holds the address of a valid object.

_images/tis-qs_examples_example3_3.png

The reason is quite simple: p holds the address of the local variable a, whose lifetime is limited to the block in which the variable is defined:

{
    char a = 3;
    p = &a;
    pv = (uintptr_t) p;
}

Note

As stated by clause 6.2.4:2 of the C11 standard,

If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

The pointer p refers to the variable a outside of its lifetime, resulting in undefined behavior. As every execution path results in undefined behavior, the analysis stops.

The problem: undefined behavior

When the behavior of a program is undefined, it means that anything can happen at execution time. Some programmers seem to think that, at least in some cases, this is not a big issue: they are certainly wrong. A program invoking undefined behavior is a time bomb.

A dangling pointer is an example of undefined behavior. Let’s illustrate its consequences on our example.

First we compile the program with gcc:

$ gcc -Wall -Wextra -Wpedantic /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c -o example3

and notice that, despite all of our efforts, gcc does not issue any warnings.

Running the code will display the following text:

$ ./example3
Roses are red,
Violets are blue,
This poem is lame,
It doesn't even rhyme.

meaning that the condition p == q evaluated to true. This can happen if the variables a and b were allocated at the same address by the compiler (which is possible since they are never in scope at the same time).

Using different compilation options should not affect the behavior of the program, but compiling with a -O2 switch and running the program results in a different output:

$ gcc -Wall -Wextra -Wpedantic -O2 /home/tis/1.45.1/C_Examples/TutorialExamples/example3.c -o example3_opt
$ ./example3_opt
Roses are red,
Violets are blue,
0x7ffc9224f27e is different from 0x7ffc9224f27f
7ffc9224f27e is not the same as 7ffc9224f27f

This time the expression p == q evaluated to false because the variables a and b were allocated to different addresses. So changing the optimization level changed the behavior of our program.

In the aforementioned post you can see evidence of a third, very weird behavior, in which p == q evaluates to false, even though a and b are allocated to the same address.

The conclusion is clear, and we will state it as a general warning:

Warning

If the behavior of your program is undefined, executing the compiled code will have unpredictable results and will very likely cause runtime errors. You should always ensure that your code is well-defined, using source code analysis or other techniques.

The solution: TrustInSoft Analyzer

We have already seen how TrustInSoft Analyzer is able to find dangling pointers and other weaknesses. We will continue the analysis and correct the code in order to guarantee that there are no problems left.

To avoid the dangling pointer problem, we will define a outside the block, so that its storage and address are guaranteed by the standard throughout the whole main function:

//TrustInSoft Analyzer Tutorial - Example 3_1
#include <stdio.h>

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    char *p, *q;
    uintptr_t pv, qv;
    char a = 3;
    {
        p = &a;
        pv = (uintptr_t)p;
    }
    {
        char b = 4;
        q = &b;
        qv = (uintptr_t)q;
    }
    printf("Roses are red,\nViolets are blue,\n");
    if (p == q)
        printf("This poem is lame,\nIt doesn't even rhyme.\n");
    else {
        printf("%p is different from %p\n", (void *)p, (void *)q);
        printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
    }
}

and launch the value analysis again:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3_1.c

We notice that the dangling pointer alarm regarding p has been replaced by the same alarm about q. When evaluating the expression p == q the analyzer noticed a problem with the term p and stopped the analysis, so it did not get a chance to issue the alarm about q. Now that we corrected the first problem, the analyzer gets to the term q and raises the same kind of alarm.

Selecting the variables p and q in the Values widget tab and clicking on the p == q expression will show that, before evaluation, p holds the address of a whereas q has the value ESCAPINGADDR. This means that q is a dangling pointer, as it references the address of the out-of-scope variable b:

_images/tis-qs_examples_example3_4.png

To correct the problem we will simply define b outside the block, as we did before with a:

//TrustInSoft Analyzer Tutorial - Example 3_ok
#include <stdio.h>

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    char *p, *q;
    uintptr_t pv, qv;
    char a = 3;
    char b = 4;
    {
        p = &a;
        pv = (uintptr_t)p;
    }
    {
        q = &b;
        qv = (uintptr_t)q;
    }
    printf("Roses are red,\nViolets are blue,\n");
    if (p == q)
        printf("This poem is lame,\nIt doesn't even rhyme.\n");
    else {
        printf("%p is different from %p\n", (void *)p, (void *)q);
        printf("%" PRIxPTR " is not the same as %" PRIxPTR "\n", pv, qv);
    }
}

and launch the analyzer one more time to verify that there are no other alarms:

$ tis-analyzer-gui -val -slevel 100 /home/tis/1.45.1/C_Examples/TutorialExamples/example3_ok.c
_images/tis-qs_examples_example3_5.png

Notice that the

printf("This poem is lame,\nIt doesn't even rhyme.\n");

code is marked as dead. Indeed, the variables a and b must be allocated at different addresses, as they are in scope at the same time. As a consequence, the condition p == q always evaluates to false, so the first branch of the if statement is never executed.

Congratulations! You have successfully corrected all the bugs in example3.c. The program is now guaranteed not to be subject to any of the weaknesses covered by TrustInSoft Analyzer.

Skein Tutorial

In this example we will analyze an implementation of Skein, a cryptographic hash algorithm that was a finalist in the NIST SHA-3 competition.

We will show how to use TrustInSoft Analyzer to:

  • Get familiar with the code.
  • Search for bugs in the implementation.
  • Prove that the analyzed implementation does not contain any bugs that are in the scope of TrustInSoft Analyzer.

Tutorial Overview

Getting Familiar with Skein

In the first part of this tutorial we will use TrustInSoft Analyzer to explore the code and then launch a value analysis on the Skein implementation.

The estimated time for this lesson is less than 20 minutes.

Exploring the source code

Unlike our previous examples, the Skein implementation is composed of multiple files. All the files needed for this tutorial are located in the /home/tis/1.45.1/C_Examples/skein_verification directory.

Let's start by listing all the files in the directory:

$ ls -l /home/tis/1.45.1/C_Examples/skein_verification
total 124
-rw-rw-r-- 1 tis tis   204 Oct  5 18:54 README
-rwxrwxr-x 1 tis tis  4984 Oct  5 18:54 SHA3api_ref.c
-rwxrwxr-x 1 tis tis  2001 Oct  5 18:54 SHA3api_ref.h
-rwxrwxr-x 1 tis tis  6141 Oct  5 18:54 brg_endian.h
-rwxrwxr-x 1 tis tis  6921 Oct  5 18:54 brg_types.h
-rw-rw-r-- 1 tis tis   524 Oct  5 18:54 main.c
-rwxrwxr-x 1 tis tis 34990 Oct  5 18:54 skein.c
-rwxrwxr-x 1 tis tis 16290 Oct  5 18:54 skein.h
-rwxrwxr-x 1 tis tis 18548 Oct  5 18:54 skein_block.c
-rwxrwxr-x 1 tis tis  7807 Oct  5 18:54 skein_debug.c
-rwxrwxr-x 1 tis tis  2646 Oct  5 18:54 skein_debug.h
-rwxrwxr-x 1 tis tis  1688 Oct  5 18:54 skein_port.h

Note

Please note that the README and the main.c files are not part of the Skein implementation. They were added for use with this tutorial.

The skein.h file is a good starting point to explore the API. We will focus our attention on the following lines of code:

typedef struct
    {
    size_t  hashBitLen;                      /* size of hash result, in bits */
    size_t  bCnt;                            /* current byte count in buffer b[] */
    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
    } Skein_Ctxt_Hdr_t;

typedef struct                               /*  256-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein_256_Ctxt_t;

typedef struct                               /*  512-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein_512_Ctxt_t;

typedef struct                               /* 1024-bit Skein hash context structure */
    {
    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
    } Skein1024_Ctxt_t;

/*   Skein APIs for (incremental) "straight hashing" */
int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);

int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);

int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);

It seems that, in order to hash a message, we’ll need to:

  • declare a variable of type Skein_256_Ctxt_t;
  • initialize it using the function Skein_256_Init;
  • pass Skein_256_Update a representation of the string;
  • call Skein_256_Final with the address of a buffer in order to write the hash value.
Writing a test driver

When confronted with the analysis of an application, the usual entry point for the analysis is its main function. Nonetheless, in many contexts there is no obvious entry point, for example when dealing with a library. In those cases, you will need to write a driver to test the code or, better, leverage existing tests in order to exercise the code.

As the Skein implementation includes no tests, we provide the file main.c as a test driver. It implements the steps outlined above in order to hash the message "People of Earth, your attention, please":

 1  /* Test driver for Skein hash function */
 2  #include "skein.h"
 3  #include "stdio.h"
 4
 5  #define HASHLEN (8)
 6  u08b_t msg[80] = "People of Earth, your attention, please";
 7
 8  int main(void)
 9  {
10      u08b_t hash[HASHLEN];
11      int i;
12
13      Skein_256_Ctxt_t skein_context;
14
15      Skein_256_Init(&skein_context, HASHLEN);
16      Skein_256_Update(&skein_context, msg, 80);
17      Skein_256_Final(&skein_context, hash);
18
19      for (i = 0; i < HASHLEN; i++)
20          printf("%d\n", hash[i]);
21
22      return 0;
23  }

The driver also prints the contents of the computed hash in the for loop at the end of the file. Each printed value is the numerical value of the corresponding byte of the hash.

We compile and run the code by executing the command:

$ gcc /home/tis/1.45.1/C_Examples/skein_verification/*.c && ./a.out

and get the following output:

215
78
61
246
0
0
0
0

Note

You might get different output because, as we will see, there is a bug in the test file.

A first value analysis

We will start by launching a simple value analysis with the following command:

$ tis-analyzer-gui -64 -val /home/tis/1.45.1/C_Examples/skein_verification/*.c

Note

The -64 option specifies that we are targeting a 64-bit architecture. Other possible values are -16 and -32. If no value is specified, the default target architecture is 32-bit.

Warning

For the following analysis it is extremely important to provide the correct architecture. Please make sure you are providing the correct value.

Next, we will launch the GUI by opening the link or pointing the browser to http://localhost:8080 (the port number may change if you have other analyses running at the same time).

_images/tis-qs-skein_02.png

Once you have opened the GUI, go to the Properties widget and click on the only alarms button:

_images/tis-qs-skein_05.png

In the Properties widget we can see that there are about 20 alarms.

Note

If the analyzer finds that an error occurs for every possible execution path, then it will stop the analysis. If a problem occurs in one possible execution path, the analyzer will raise an alarm and continue the analysis on the other execution paths. This is why you can get multiple alarms.

As the analyzer is not precise enough, we will start by improving the precision of the analysis in order to reduce the number of alarms. We will show how to do this in the next part of this tutorial.


Searching for bugs

In this second part of the Skein tutorial, we will fine-tune the value analysis and investigate the alarms raised by the analyzer in order to identify and correct bugs.

The estimated time for this lesson is 20 minutes.

Increasing the precision

Note

As we have seen in previous examples, the number of different states that can be propagated at each program point is limited by the semantic unrolling level, which can be tuned with the -slevel option. The default configuration allows only a single state, so the analysis can quickly become imprecise. Setting the -slevel option allows the analyzer to unroll loops and to propagate the branches of conditional statements separately, improving the precision of the analysis in most cases.
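To see why the number of propagated states matters, here is a toy sketch of ours (not from the tutorial): with a single merged state, the analyzer keeps only an approximation of the array contents across iterations, while a sufficient slevel lets it propagate each iteration separately and retain the exact cell values:

```c
/* Each iteration of the first loop writes a different cell; propagating
   the iterations separately (enough slevel) preserves the exact cell
   contents, whereas merging them into one state loses that precision. */
int fill_and_sum(void)
{
    int t[10];
    int s = 0;
    for (int i = 0; i < 10; i++)
        t[i] = i * i;   /* t = {0, 1, 4, ..., 81} */
    for (int i = 0; i < 10; i++)
        s += t[i];
    return s;           /* 0 + 1 + 4 + ... + 81 = 285 */
}
```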

A first step to improve the precision of the analysis is to activate the -slevel option. We will try to launch the analysis with an slevel value of 100 and see if there are any improvements:

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c

Tip

100 is a good default value for the slevel because most functions terminate in fewer than 100 disjoint states. If this is not enough, you will be able to fine-tune the value later, but keep in mind that increasing the slevel can slow down the analysis, so you will need to find a good compromise between precision and speed.

The GUI will give you information about the slevel consumed at different points in the program, making it easier to find the right value to fit your needs.

We open the GUI again


_images/tis-qs-skein_07.png

and notice that there is only one alarm left, so all the other alarms were false alarms.

Note

Unlike many other static analysis tools, TrustInSoft Analyzer never remains silent when there is a potential risk for a runtime error. As we have seen before, when the analysis is not precise enough it can produce false alarms.

On the other hand, the absence of alarms means that you have mathematical guarantees over the absence of all the flaws covered by TrustInSoft Analyzer.

Investigating the alarms

With only one alarm left, we can investigate its cause to see if it is a true positive or not.

Notice that there is some dead code that cannot be reached by any execution. In particular, the main function never terminates normally, because its return statement cannot be reached.

This means that the execution never leaves the for loop in line 19:

    for (i = 0; i < HASHLEN; i++)
        printf("%d\n", hash[i]);
_images/tis-qs-skein_07b.png

Clicking on the alarm in the Properties widget, we see that the alarm occurs indeed inside the loop. The annotation generated by the analyzer is:

/*@ assert Value: initialization: \initialized(&hash[i]); */

meaning that we are trying to read a value that might not be properly initialized.

Note

The for loop in the main.c program has been transformed into a while loop by the analyzer. This is part of a normalization process that transforms the program into a semantically equivalent one in order to facilitate the analysis.
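The normalization can be pictured as follows (a sketch of ours; the analyzer's actual normalized output may differ in details such as temporary names):

```c
/* A counting for loop and its while-loop normalization: both
   compute the sum 0 + 1 + ... + (n - 1). */
int sum_for(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

int sum_while(int n)
{
    int s = 0;
    int i = 0;      /* the initialization is hoisted before the loop */
    while (i < n) { /* the for condition becomes the while guard */
        s += i;
        i++;        /* the increment moves to the end of the body */
    }
    return s;
}
```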

Let’s explore the values inside the hash array by going to the Values widget and clicking on the hash term in the Interactive Code Window.

_images/tis-qs-skein_08.png

By clicking on the values in the before column of the Values widget,

_images/tis-qs-skein_09.png

you will have a more readable output of the contents of the hash array:

[0] ∈ {215}
[1..7] ∈ UNINITIALIZED
This begins like: "�"

The analyzer is telling us that, before entering the loop, hash[0] contains the value 215 (which it tries to interpret as a character, hence the This begins like: "�" line) and that all the other items in the array are UNINITIALIZED.

Reading an uninitialized value is an undefined behavior, and that is exactly what the program is doing when trying to read past the first element of the array. The analysis stops here as all the execution paths lead to undefined behavior.

It seems that the alarm is indeed a true positive, so we cannot go further in the analysis without correcting the problem.

Finding and correcting the bug

As we have uninitialized values in the array, let’s have a look at the initialization function Skein_256_Init.

You can navigate to the definition of the function by right clicking on the function name:

_images/tis-qs-skein_10b.png

This will update the Interactive Code Window with the code of the function:

_images/tis-qs-skein_10c.png

After clicking on the hashBitLen variable, we can see in the Properties widget that its value is always 8.

This value corresponds to the value of the HASHLEN macro that is passed to the Skein_256_Init function at line 15 of the original code:

    Skein_256_Init(&skein_context, HASHLEN);

Note that in the Interactive Code Window this macro has been expanded to 8.

So the length of the hash is expressed in bits, and we were inadvertently asking for an 8-bit hash. This explains why only the first element of the array was initialized.

As we wanted an 8-byte hash, we can correct the problem by multiplying the value by 8 in the call to Skein_256_Init, modifying line 15 of the main.c file to look like this:

    Skein_256_Init(&skein_context, 8 * HASHLEN);

After saving the changes we run TrustInSoft Analyzer again:

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c

and open the GUI:


_images/tis-qs-skein_12.png

Congratulations! There are no more alarms, so our program is guaranteed to be free from all the kinds of bugs covered by TrustInSoft Analyzer.

The “real” hash value

We can see the value of the hash in the shell in which we executed TrustInSoft Analyzer:

_images/tis-qs-skein_13b.png

Let’s compile and run the program again to check that the results are the same:

$ gcc /home/tis/1.45.1/C_Examples/skein_verification/*.c && ./a.out

We get the following output, which is indeed the same as the one reported by the analyzer:

224
56
146
251
183
62
26
48

Note

The results we got when compiling the program in the first part of this tutorial were different because we were compiling an ill-defined program (as it was reading uninitialized variables, which is an undefined behavior). When the behavior of a program is undefined, you can get anything as a result (in our case it was most probably garbage read from the memory, as we were accessing uninitialized values).

As the program is now guaranteed to be correct, the output should be the same on any standard-compliant installation.

The next step

Our driver tested the Skein implementation on a single message of length 80. We could modify it to test a number of different messages, and thus gain better coverage, as normal unit tests do. But what if we could test it on all the messages of length 80?

This is certainly impossible to achieve by the execution of any compiled test, but it is in the scope of TrustInSoft Analyzer.

In the next part of this tutorial we will show you how to generalize the test to arbitrary messages of fixed length. This will give mathematical guarantees about the behavior of the implementation on any message of the fixed length.

Please click here to continue the tutorial.

Guaranteeing the absence of bugs

The previous analysis allowed us to find a bug on the test driver. After correcting the program, the analyzer did not raise any alarms, which guaranteed the absence of a large class of bugs when hashing the 80 char message “People of Earth, your attention, please”.

In this third part of the Skein tutorial, we will generalize the result obtained in the second part to all the messages of length 80.

The estimated time for this lesson is 20 minutes.

The limits of software testing

Testing a program on a given set of test cases can allow you to identify some bugs, but it cannot guarantee their absence. This is true even if you test your program on a huge number of cases, because it is very unlikely that you will be able to cover them all.

Even in a simple example like ours (hashing a message of length 80), there are way too many test cases to consider (see the note below to get some detailed calculations).

Furthermore, even if all your cases pass the test, this gives you no guarantee about future behavior: for example, if a test case results in an undefined behavior, this might pass unnoticed in the test environment but trigger a bug on your client’s machine.

Note

Although the number of distinct 80-char arrays is certainly finite, there are 2⁸ = 256 possible values for a char variable, so there are 256⁸⁰ ≈ 4.6 × 10¹⁹² different arrays of 80 chars. As the number of elementary particles in the visible universe is considered to be less than 10⁹⁷, we can conclude that the number of arrays of 80 chars is definitely out of scope for an exhaustive test.

The power of abstract interpretation

The value analysis performed by TrustInSoft Analyzer uses abstract interpretation techniques in order to represent the values of the terms in the program. This allows for the representation of very large or even infinite sets of values in a way that makes sense to the analyzer. That is why we are going to be able to consider all the possible values for a given variable.

If the value analysis raises no alarms, this means that you have mathematical guarantees about the correctness of the program. These guarantees are not limited to one particular execution, but extend to any execution in a standard-compliant environment (one which respects all the hypotheses made during the analysis).

Note

We will use some primitives that have a meaning in the abstract state of the analyzer, but that make no sense for a compiler. For this reason, we would not be able to compile or execute the driver anymore.

One such function is tis_interval. The assignment x = tis_interval(a, b) tells the analyzer that the variable x can hold any value between a and b (both included).

Generalizing the analysis to arbitrary messages of fixed length

Our goal is to obtain a result that is valid for any given array of length 80. Let’s start by modifying the main.c program in order to tell the analyzer that each of the elements in the msg array can take any value between 0 and 255.

To accomplish this, it suffices to add a couple of lines to the main.c program

    for(i = 0; i < 80; i++)
        msg[i] = tis_interval(0, 255);

to make it look like this:

/* Test driver for Skein hash function */
#include "skein.h"
#include "stdio.h"

#define HASHLEN (8)
u08b_t msg[80];

int main(void)
{
    u08b_t hash[HASHLEN];
    int i;

    for(i = 0; i < 80; i++)
        msg[i] = tis_interval(0, 255);

    Skein_256_Ctxt_t skein_context;

    Skein_256_Init(&skein_context, 8 * HASHLEN);
    Skein_256_Update(&skein_context, msg, 80);
    Skein_256_Final(&skein_context, hash);

    for (i = 0; i < HASHLEN; i++)
        printf("%d\n", hash[i]);

    return 0;
}

We then launch the value analysis again

$ tis-analyzer-gui -64 -val -slevel 100 /home/tis/1.45.1/C_Examples/skein_verification/*.c

and open the GUI:


_images/tis-qs-skein_13.png

There are no alarms in the GUI. This means that we have mathematical guarantees that the code is free from all the bugs covered by TrustInSoft Analyzer.

Congratulations! You have just proved that initializing, parsing, and hashing a message of length 80 by calling the three functions

    Skein_256_Init(&skein_context, 8 * HASHLEN);
    Skein_256_Update(&skein_context, msg, 80);
    Skein_256_Final(&skein_context, hash);

will cause no runtime errors, whatever the contents of the 80 char array msg.

Note

As we saw before, this kind of result would be impossible to obtain with case testing, as it would amount to writing a program testing each one of the 256⁸⁰ possible input messages.

A word on variation domains

We will end this part of the tutorial with a few words on the representation of values by the analyzer. At each program point, the analyzer determines a variation domain for each variable or expression. The value of the variable or expression is always guaranteed to be contained in the variation domain but, due to the possible loss of precision, these domains can be over-approximations.

Note

The analyzer can represent the variation domain of an integer variable in three different ways:

  • As an enumeration {v1; …; vn} (if the set is small enough).
  • As an interval [l..u] with lower bound l and upper bound u.
  • As an interval with periodicity information [l..u], r%m, that is the set of values between the lower bound l and the upper bound u, whose remainder in the Euclidean division by m is equal to r.

A -- represents the smallest value that fits within the type of a variable when it occurs in the lower bound, and the biggest value when it occurs in the upper bound. For instance, [--..--] means [0..255] for a variable of type unsigned char.

If you go to the Values widget in the GUI and click on the msg variable in the Interactive Code Window (as shown below), you will be able to see how the analyzer represents integer values contained in the array msg of the program main.c:

_images/tis-qs-skein_14.png

The analyzer shows that, after the initialization loop, msg[0..79] ∈ [--..--]. This means that each of the values msg[0] to msg[79] can take an arbitrary value in the interval [--..--]. As pointed out before, for an unsigned char, [--..--] stands for any value that fits within the type, that is, between 0 and 255.

This is exactly what was achieved by calling the function tis_interval(0, 255).

Analyzing C++ code

If the C++ front-end for TrustInSoft Analyzer is available, it can be used with the command tis-analyzer++. This command accepts the same arguments as tis-analyzer as well as specific options that can be listed using the command tis-analyzer++ -cxx-h. An interpreter version is also available using the command tis-analyzer++ --interpreter.

This tutorial will show you some C++ specific features of TrustInSoft Analyzer in order to better understand its output and help track the origin of alarms. It should be read after the Getting Started section. The examples used in this tutorial can be found in the /home/tis/1.45.1/Cxx_Examples directory.

Identifiers, constructors and calling conventions

We will start by analyzing the program matrix.cpp, which is a test over a matrix manipulation library:

int
main(void) {
    Matrix<2U, 2U> matrix_a {
        2., 1.,
        4., 2. };

    auto id = identity<2>();
    bool has_inverse = is_invertible(id);
    std::cout << "identity is inversible: " << (has_inverse ? "yes\n" : "no\n");

    Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);
    Matrix<2, 1> res = solve(matrix_b,  { 6., 10. });
    std::cout << "RESULT IS:\n" << res;

    return 0;
    (void) has_inverse;
}

Start the analysis with the following command:

$ tis-analyzer++ --interpreter -gui /home/tis/1.45.1/Cxx_Examples/matrix.cpp

and open the GUI. The Interactive Code Window should look like:

_images/tis-cxx-matrix-main.png

The first thing to notice is that some names contain characters that are forbidden in C, like Matrix<2, 2> or std::__1::ostream, and may be prefixed by .... The names of entities in tis-analyzer++ are actually mangled; the Interactive Code Window displays an unmangled version of them for clarity. The mangled version of names can be viewed by using the option -cxx-keep-mangling, and the mangling used is close enough to existing compiler practice to be unmangled by external tools like c++filt.

When a name is long, a shortened version of it is displayed in the Interactive Code Window with ... as prefix. Clicking on this prefix will display the full qualified name, or its mangled version if the option -cxx-keep-mangling is used.

The first statement that is not a declaration is a call to the function __tis_globinit(). This function represents the dynamic initialization phase of a C++ program [1]. It contains only calls to functions with names similar to X::Y::__tis_init_Z, which are used to initialize the non-local variables X::Y::Z. Looking at the definition of an X::Y::__tis_init_Z function will point the Source Code Window to the body of the generated function initializing the variable X::Y::Z.

The first statement of the main function in the original code is:

    Matrix<2U, 2U> matrix_a {
        2., 1.,
        4., 2. };

and corresponds in the normalized code to the line:

  Matrix<2, 2>::Ctor<double, double, double, double>(& matrix_a,2.,1.,4.,2.);

Ctor is the generic name that TrustInSoft Analyzer assigns to C++ constructors. You can see that:

  • the constructor templated arguments are made explicit.
  • all initializers of the source code are passed as arguments.
  • there is an additional argument & matrix_a.

All method calls are translated to regular C function calls, and as such they receive an additional argument which stands for the this pointer. In case of constructors, this is the address of the object being initialized.

When looking at the constructor definition, you can see that it is calling the inherited constructor Matrix_base<2, 2>::Ctor<double, double, double, double> with the same arguments, except that the this pointer is shifted to its __parent__Matrix_base<2, 2, Matrix>. The corresponding part of the original code is:

        const Matrix_base<I, J, Parent> &m1,
        const Matrix_base<J, K, Parent> &m2)
{
    auto cross_product =
        [&m1, &m2] (unsigned i, unsigned j) -> double
        {

Matrix<N, M> inherits from Matrix_base<N, M, Matrix>, and its constructor only transfers its arguments to the constructor of the parent class. In tis-analyzer++, a class A inheriting from a class B is represented by a struct A containing a field struct B __parent__B. The initialization of the base B of A is translated into a call to the function B::Ctor(&A.__parent__B). This structure layout can be observed in the example by looking at the definition of the type struct Matrix<2, 2>.

The next statement of the main function in the original code is:

    auto id = identity<2>();

and corresponds in the normalized code to the line

  identity<2>(& id);

The first thing to note here is that the id variable has an auto type in the original source but is declared in the normalized code as:

  struct Matrix<2, 2> id;

tis-analyzer++ makes auto types explicit, in the same way it instantiates template parameters.

Another difference is that in the normalized code the identity<2> function takes an additional argument despite being a usual function and not a method. This is a consequence of the fact that, in C++, a non-POD [2] value returned by a function may not live inside the function but inside its caller. To model this, a function returning a non-POD type receives an additional parameter which contains the address of the initialized object.

The next statement of the main function in the original code is:

    bool has_inverse = is_invertible(id);

which, when clicked on, corresponds in the normalized code to:

  {
    struct Matrix<2, 2> __tis_arg;
    _Bool tmp;
    {
      {
        Matrix<2, 2>::Ctor(& __tis_arg,(struct Matrix<2, 2> const *)(& id));
        tmp = is_invertible<2>(& __tis_arg);
      }
      has_inverse = tmp;
    }
  }

In this case, one C++ statement is translated into a block containing multiple declarations and statements. The function is_invertible<2> takes its argument by copy, as seen in its declaration:

template<unsigned N>
bool
is_invertible(Matrix <N, N> m)

and so its parameter has to be initialized with a new object. This is the purpose of the __tis_arg_* family of variables. In the current case, __tis_arg is initialized by calling the copy constructor of Matrix<2, 2> with the address of id as the source. Then, the address of the newly built __tis_arg variable is given to the function is_invertible<2>, and the block around it delimits the lifetime of __tis_arg. This is the semantics of passing arguments by copy [3].

This transformation does not happen when calling the copy constructor of Matrix<2, 2> because its argument is a reference. References are converted to pointers, so taking a reference to an object is taking its address, and accessing a reference simply dereferences the pointer.

The next interesting statement is the one at line 37 of the original source:

    Matrix<2U, 2U> matrix_b = matrix_a + (5 ^ id);

which is translated in the normalized code as:

  {
    struct Matrix<2, 2> __tis_tmp_61;
    {
      {
        operator^(& __tis_tmp_61,(double)5,
                  (struct Matrix<2, 2> const *)(& id));
      }
      ;
      ;
    }
    operator+(& matrix_b,
              (struct Matrix_base<2, 2, Matrix> const *)(& matrix_a.__parent__Matrix_base<2, 2, Matrix>),
              (struct Matrix_base<2, 2, Matrix> const *)(& __tis_tmp_61.__parent__Matrix_base<2, 2, Matrix>));
  }

Again, this statement is decomposed into a block containing multiple statements, this time declaring a variable called __tis_tmp_61. The __tis_tmp_* family of variables corresponds to temporary objects [4] that can be introduced by complex expressions. This temporary object is declared inside the block, as its lifetime is that of the full expression, and it has to be destroyed at the end of the block if needed.

[1]as stated in [basic.start.init].
[2]POD in the sense of [class]p10.
[3]See [expr.call]p4 and in particular: “The initialization and destruction of each parameter occurs within the context of the calling function.”
[4]as defined in [class.temporary].

User Manual

This section describes how to use TrustInSoft tools.

  • The TrustInSoft Analyzer Manual explains how to analyze an application.
  • The TrustInSoft Analyzer GUI Manual explains how to use the GUI to explore the results.
  • The tis-aggregate Manual explains how to extract information about multiple analyses (such as coverage).
  • The tis-info Manual explains how to extract a variety of information about the analyzed source code (functions, statements, types…).
  • The tis-mk-main Manual explains how to generate a main function to be used as an entry point of the analysis when the original main function uses command line arguments.
  • The tis-mkfs Manual explains how to deal with functions that interact with the file system.
  • The tis-modifications Manual presents a tool that helps to track the source code modifications made on an application under analysis.
  • The tis-report Manual explains how to create reports of aggregated UBs and code coverage.
  • The tis-prepare Manual explains how to automatically build the "files" list of a configuration file.
  • The flexnet-stats.py Manual explains how to inspect license usage statistics.

TrustInSoft Analyzer Manual

Introduction

The aim of this document is to help the user to study a C or C++ application with TrustInSoft Analyzer.

The main goal of this kind of study is to verify that none of the detected software weaknesses listed in the CWE-subset are present in the application. This document explains step by step how to achieve that goal.

The definitions of some of the words and expressions used in this document are given in the Glossary.

Analysis Overview

The main steps of an analysis are:

  • First, to prepare the source code in order to make it understandable by the tool. This mainly consists of finding out which options are used to compile it (see Prepare the Sources). At the end of this step, the tool should be able to run the analysis without errors (see Check Preparation).
  • Then, to define the perimeter to study (see Prepare the Analysis). This is a user choice depending on the part of the application that has to be verified. The goal of this step is to build the context in which the application will be analyzed.
  • To run value analysis in order to detect alarms related to the software weaknesses listed in the CWE-subset. Value analysis can be tuned to increase the precision, either globally or locally, which helps in reducing the number of alarms (see Study the Alarms).
  • Each remaining alarm must be examined to find out whether it is relevant or not (i.e. if it can actually happen or not). Some annotations can be added to help the analyzer in removing it if it appears to be a false alarm (see ACSL Properties).
  • Finally, it is important to check all the annotations as they are used as hypotheses at the previous step. Some of them can be formally proven using the WP tool that involves theorem prover techniques (see Prove Annotations with WP). The others have to be carefully reviewed since all the results of the analysis rely on these hypotheses.

Here and there, extracting information, either to understand the results or to produce a report, is also useful. This can be done by combining options and scripts. How to do it is also explained in Get the Information.

Although the tool can be used purely from the command line interface, it also provides a GUI (see the TIS GUI Manual) that is very convenient for exploring the computed results.

Prepare the Sources

The purpose of this chapter is to explain how to prepare the source code of an application before starting to analyze it. The main steps to perform are:

  • Finding out the preprocessing options.

    This step can either be manual (Manual Preparation) or automatic (Automatic Preparation).

    The manual preparation is the easiest way to start with if you already know the commands necessary to compile the source files. Otherwise, start instead with the automatic preparation.

  • Dealing with the external libraries.

Manual Preparation
Preprocessing
Usual pre-processing options

In the simplest cases, all the source files need the same preprocessing command. The default preprocessing command of tis-analyzer is:

clang -C -E -nostdinc -isystem TIS_KERNEL_SHARE/libc

Some more options can be added to this command with the -cpp-extra-args option. The whole command can also be specified directly with the -cpp-command option, for instance in order to use another preprocessor.

The -I and -isystem (to add include paths), -D (to add macro definitions), and -U (to remove macro definitions) options are provided as shortcuts to the -cpp-extra-args option.

For example, the following command can be used to run the analyzer on the f1.c, f2.c, and f3.c source files, taking the included files from the incl_dir directory:

$ tis-analyzer -I incl_dir f1.c f2.c f3.c

A specific preprocessing command can be given to a set of specific files with the option -cpp-command-file "f1.c:clang -C -E,f2.c:gcc -C -E" (or -cpp-command-file "f1.c:clang -C -E" -cpp-command-file "f2.c:gcc -C -E"). More options can be added to a preprocessing command for a set of files in the same way with the option -cpp-extra-args-file "f1.c:-Idir/".

Any file not listed in -cpp-command-file (resp. -cpp-extra-args-file) will use the global command (resp. additional options) of the -cpp-command option (resp. -cpp-extra-args option).

If most of the source files need to have a specific preprocessing command, it is recommended to use the Automatic Preparation.

The exact pre-processing command in use can be shown by adding the command line option -kernel-msg-key pp when running the analyzer.

Advanced pre-processing options

In some applications, the source code is split in modules that require different preprocessing commands.

Warning

First of all, an important recommendation is to tackle the software in as small-sized chunks as possible. This makes most of the pre-processing problems go away.

If a particular source file needs a different preprocessing command, it is better to preprocess it first. The resulting file has to be named with a .i or .ci extension so the analyzer knows that it does not need to preprocess it. The difference between the two extensions is that the .i files are not preprocessed at all by the tool, whereas the macro definitions are expanded in the annotations of the .ci files, which is most of the time the intended behavior. So, except in some special cases, the .ci extension is to be preferred.

Source files and preprocessed files can be mixed in the command line. For instance, if the f3.c file needs some special options, f3.ci can be generated beforehand, and then used in the command line:

$ tis-analyzer -I incl_dir f1.c f2.c f3.ci

This will give the same result as the previous command, provided that f3.c has already been preprocessed into f3.ci.

Here is a synthetic example with two files h1.c and h2.c that use the same macro M which needs to have a different definition in each file.

File h1.c:
 int x = M;
 extern int y;

 int main(void) {
   return x + y;
 }
File h2.c:
 int y = M;

If M is supposed to be 1 in h1.c and 2 in h2.c the recommended command lines for this example are:

$ clang -C -E -nostdinc -DM=1 -o h1.tis.ci h1.c
$ clang -C -E -nostdinc -DM=2 -o h2.tis.ci h2.c

Then, the generated files can be provided to the analyzer:

$ tis-analyzer -val h1.tis.ci h2.tis.ci

And the obtained result shows that M has been correctly expanded:

...
[value] Values at end of function main:
    __retres ∈ {3}
...

In more complex cases, it is better to use the Automatic Preparation.

About Libraries

Most applications use some libraries, at least the standard libc. The analyzer needs to have information about the functions that are used by the application, at least the ones that are called in the part of it which is being studied.

For the libc library, some header files come with the tool and provide specifications to many of its functions. These header files are included by default when preprocessing source files. However, if the preprocessing is done before, the following option has to be employed in order to find the instrumented files:

-I$(tis-analyzer -print-share-path)/libc

The tool also provides implementations to some libc functions that are automatically loaded. They are either C source code or internal built-in functions. But the -no-tis-libc option may be used to completely ignore the tool’s library functions and header files. It can be useful when analyzing code with custom libc functions for instance.

Another intermediate solution is to use the --custom-libc <file> option. In that case, the given source file is analyzed before the tool runtime files. It gives the opportunity to overload some of the provided C implementations. The built-in functions cannot be individually overloaded at the moment.

To overload some header files in case something is missing, the --custom-isystem <path> option can be used. Then the given include path is used before the tool ones. In that case, the custom headers xxx.h may include the tool headers with:

File <path>/xxx.h:
 #include <tis-kernel/libc/xxx.h>

 // some more declarations and/or specification for <xxx.h>

If other external functions are used, one has to provide some properties concerning each of them: at the minimum, specify which pieces of data can be modified by them. See Check the External Functions to know which functions have to be specified and Write a Specification to learn how to do it.

First Run

At this point, the source files and the preprocessing commands should have been retrieved. It is time to try the tool for the first time, for instance by running:

tis-analyzer -metrics <..source and preprocessed files..> <..preprocessing options>

The preprocessing options are only used when source files are provided. In complex cases, it can be easier to analyze only the already preprocessed files.

Automatic Preparation

This section describes how to automatically produce a compile_commands.json file that contains instructions on how to replay the compilation process independently of the build system.

Description of the format

A compilation database is a JSON file, which consists of an array of “command objects”, where each command object specifies one way a translation unit is compiled in the project.

Each command object contains the translation unit’s main file, the working directory where the compiler ran and the actual compile command.

See the online documentation for more information.

How to produce a compile_commands.json
  • CMake (since 2.8.5) supports generation of compilation databases for Unix Makefile builds with the option CMAKE_EXPORT_COMPILE_COMMANDS.

    Usage:

    cmake <options> -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <path-to-source>
    
  • For projects on Linux, there is an alternative to intercept compiler calls with a more generic tool called bear.

    Usage:

    bear <compilation_command>
    

Note: Starting with Ubuntu 22.04, you must use bear -- <compilation_command>. The double dash (--) indicates the end of the options specific to bear and that all that follows is part of the actual build command.

Tip

It is recommended to use bear. It can be installed with the package manager, typically:

sudo apt install bear
Using the compile_commands.json

In order to use the produced compilation database, run TrustInSoft Analyzer with the following command:

tis-analyzer -compilation-database path/to/compile_commands.json ...

Also, if a directory is given to the -compilation-database option, it will scan and use every compile_commands.json file located in the given directory and its sub-directories.

tis-analyzer -compilation-database path/to/project ...

It is also possible to use compilation databases in a tis.config file for the analysis.

A possible generic template for the tis.config file is given below (see Configuration files for more information about tis.config files).

{
    "compilation_database":
    [
        "path/to/compile_commands.json"
    ],
    "files":
    [
        "path/to/file_1",
        "path/to/file_2",
        "path/to/file_N"
    ],
    "machdep": "gcc_x86_64",
    "main": "main",
    "val": true,
    "slevel-function":
    {
        "function_name": 10
    }
}

To use the tis.config file, run TrustInSoft Analyzer with the following command:

tis-analyzer -tis-config-load tis.config

Note

The tis.config file uses a strict syntax for JSON. A typical mistake would be to put a comma for the last line of an object, e.g. for the line "path/to/file_N", and it would lead to an error.

Check Preparation

At this point, whatever method was chosen for the preparation step, you should, for instance, be able to execute:

tis-analyzer -metrics <... arguments...>

with the appropriate arguments; the analyzer should run with no errors. Using the command tis-analyzer-gui with the same arguments starts the GUI, which lets you browse through the source code, but not yet see the analysis results, since nothing has been computed at this point.

It is often useful to save the results of an analysis with:

tis-analyzer ... -save project.state > project.log

This command puts all the messages in the file project.log and saves the state of the project itself to the file project.state, so that it can be loaded later on. For instance, we can load it now in the GUI by executing:

tis-analyzer-gui -load project.state

In case the application includes some special features (assembler code, etc.) and/or requires to be studied for a specific hardware target and/or with specific compiler options, please refer to Dealing with Special Features.

Prepare the Analysis

This chapter explains how to specify which part of the source code of an application will be studied and in which context. Moreover, it also shows how the overall goal can be split into several separate analyses if needed. The main objective is to be able to run the value analysis, implemented by the Value plug-in, in order to obtain the alarms concerning the software weaknesses listed in the CWE-subset.

Define the Perimeter

The study perimeter could be the whole program, only some functions of a library, or a single use case scenario. Deciding which part of the source code should be studied is difficult to prescribe, since it depends a lot on the particular application, the amount of time available, and mostly on how one looks at the problem. A good rule of thumb is to adopt an incremental approach: begin with a small study, in order to understand how to apply the tools in the given situation, and enlarge the perimeter later on.

Prepare the Entry Point

In order to run a value analysis, an entry point to the program has to be provided. The body of the entry point function defines the studied perimeter. It is usually the main function which establishes the context verified by the analysis, but other functions can be used to this end as well.

  • To analyze a library function, the entry point function should build values for the given library function’s input arguments, and then call the library function itself using the arguments prepared in this way (see Write an Entry Point).
  • To analyze a scenario, preparation of input values may be needed as well, then followed by calling some functions in a sequence. This sequence of function calls defines the scenario.
  • In some cases, the actual main function of an application can be used directly. However, if it takes options and arguments, it still has to be called from an entry point that builds values for them. The tis-mk-main utility can help in doing so (see tis-mk-main Manual). Be aware though that if main is a complex function that parses options and needs many string manipulations, it is probably a better idea to write a smaller entry point from scratch in order to define a more precise context of analysis.

It is important to mention here the difference between dynamic test execution and static value analysis. As the code is not executed in the latter, each of the built inputs provided to the analyzed function does not need to have a single value. It means that a function taking a single integer parameter x can, for instance, be analyzed for all the possible input values, or for all the values from a given set (e.g. 3 < x < 150). So when we mention “a value” here, we do not actually mean “a single concrete value”, but rather “a set of abstract values”.

Write an Entry Point

Basically, the entry point function has to call the functions to analyze, providing them with appropriate input values (i.e. function arguments) that correspond to the studied perimeter. Some built-in functions are available to build these input values:

  • for an integer interval:

    x = tis_interval(l, u);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from any value between l and u (inclusive) being returned. Several other functions are also provided for other types like for instance tis_double_interval(l, u) for floating-point values, and tis_unsigned_long_long_interval(l, u) for wide integers, which behave the same way for the types double and unsigned long long.

  • to initialize addr[0 .. len-1]:

    tis_make_unknown (addr, len);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from having any arbitrary len bytes in memory starting from addr.

    The tis_make_unknown function is also useful to initialize a simple variable:

    tis_make_unknown (&x, sizeof (x));
    

    This is equivalent to x = tis_interval(l, u); when l is the minimum value of the type and u is the maximum value of the type.

  • for a non-deterministic choice between two integers:

    x = tis_nondet (a, b);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from the value of x being either a or b. These are only two cases, but they combine with the other possibilities resulting from the calls to the other builtin functions.

  • for a non-deterministic choice between two pointers:

    p = tis_nondet_ptr (&x, &y);
    

    This one is similar to the previous one, but for pointers.

Example: the main function below shows a valid entry point to test a compute function that takes a buffer, its length, and a pointer to store a result:

#include <stdio.h>
#include <tis_builtin.h>

int compute (char * buf, size_t len, char * result);

int main (void) {
  char buf[100];
  tis_make_unknown (buf, 100);
  size_t len = tis_interval (0, 100);
  char x;
  char * result = tis_nondet_ptr (NULL, &x);
  int r = compute (buf, len, result);
}
  • the builtin tis_init_type can be used to initialize a simple pointer, such as int * p, or a pointer to a recursive data structure, such as struct list * p. It takes five arguments:

    tis_init_type(str_type, ptr, depth, width, valid)
    
    • the first argument const char * str_type should be a string representing a valid type of the memory to initialize.
    • the second argument void * ptr should be a pointer to the memory area to initialize.
    • the third argument unsigned long depth should be an integer that exactly mirrors the behavior of the option -context-depth during the initialization.
    • the fourth argument unsigned long width should be an integer that exactly mirrors the behavior of the option -context-width during the initialization.
    • the last argument should be either 0 or 1, and exactly mirrors the behavior of the option -context-valid-pointers during the initialization.

Example:

#include <tis_builtin.h>
struct list {
    int data;
    struct list * next;
};
int main(){
    int *p0, *p1, *p2;
    struct list * p3;
    tis_init_type("int *", &p0, 1, 1, 1);
    tis_init_type("int *", &p1, 1, 10, 1);
    tis_init_type("int *", &p2, 1, 1, 0);
    tis_init_type("struct list *", &p3, 3, 1, 1);
    tis_dump_each();
}

The code above calls tis_init_type to initialize the pointers p0, p1, p2 and p3. More specifically:

  • The call tis_init_type("int *", &p0, 1, 1, 1) allocates an array of size 1 (given by the width argument), initializes the array element to any possible integer: S_p0[0] [--..--], and then assigns the array address to the pointer p0: p0 {{ &S_p0[0] }}.
  • The call tis_init_type("int *", &p1, 1, 10, 1) allocates an array of size 10: S_p1[0..9] [--..--] and assigns its address to the pointer p1: p1 {{ &S_p1[0] }}.
  • The call tis_init_type("int *", &p2, 1, 1, 0) sets the last argument to 0, which allows p2 to also be the NULL pointer: p2 {{ NULL ; &S_p2[0] }}.
  • The call tis_init_type("struct list *", &p3, 3, 1, 1) allocates a list of length 3 (the list length corresponds to the depth argument), and assigns the list head address to the pointer p3.

The tis_dump_each call then prints the resulting state:
p0 ∈ {{ &S_p0[0] }}
S_p0[0] ∈ [--..--]

p1 ∈ {{ &S_p1[0] }}
S_p1[0..9] ∈ [--..--]

p2 ∈ {{ NULL ; &S_p2[0] }}
S_p2[0] ∈ [--..--]

p3 ∈ {{ &S_p3[0] }}
S_p3[0].data ∈ [--..--]
    [0].next ∈ {{ &S_next_0_S_p3[0] }}
S_next_0_S_p3[0].data ∈ [--..--]
             [0].next ∈ {{ &S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
                      [0].next ∈ {0}

In order to obtain more details about the available functions which allow building imprecise values, refer to the Abstract values section, or browse the file:

more $(tis-analyzer -print-share-path)/tis_builtin.h
Generate an Entry Point

Some tools are also available and may help to build the entry point for specific situations (see tis-mk-main Manual).

Check the External Functions

Now that the main entry point is ready, it is time to run the value analysis for the first time using the -val option.

An important thing to check is the nature of external functions. More precisely, to look for this message in the log file:

$ grep Neither project.log
[kernel] warning: Neither code nor specification for function ...

This message indicates that the given function is undefined. In order to progress with the value analysis, it MUST be defined by either:

  • adding missing source files,
  • or writing C stubs,
  • or specifying it with ACSL properties.

The libc library functions should not appear in these messages since most of them are already specified in provided library files (see About Libraries).

Writing C Stubs

Writing C stubs for functions for which no code is available is the recommended way to go. The standard functions and the builtins presented above (see Write an Entry Point) may be used to abstract the implementation details.

To illustrate how to write stubs using standard functions and analyzer builtins, suppose that the code we want to analyze is the function main below, and that we do not have the code for the function mystrdup.

char *mystrdup(char *s);

int main(void) {
  char c, *p;
  int x;
  p = mystrdup("abc");
  if (p)
    c = p[0];
  x = c - '0';
}

There is currently no good way to write a specification indicating that mystrdup allocates a new block and makes it contain a zero-terminated string. Instead, the recommended method is to abstract it with a stub that may look as follows:

#include <string.h>
#include <stdlib.h>
#include <tis_builtin.h>

char *mystrdup(char *s) {
  size_t l = strlen(s);
  char *p = malloc(l+1);
  if (p) {
    tis_make_unknown(p, l);
    p[l] = 0;
  }
  return p;
}

The files can be analyzed with:

$ tis-analyzer -val -slevel 10 main.c mystrdup.c

As shown in the trace, the analyzer correctly detects that the main function may use c uninitialized:

tests/val_examples/stub_main.c:13:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
tests/val_examples/stub_main.c:13:[kernel] warning: completely indeterminate value in c.
Specifying External Functions

When specifying an external function with ACSL properties, only the assigns properties are mandatory: they give the tool an over-approximation of what can be modified. However, also providing the function’s post-conditions can help the analyzer and yield more precise results (see Write a Specification).

Tune the Precision

Performing value analysis with no additional options (like in all the cases above) makes it run with a rather low precision. It should not take too long to get the results that indicate where the alarms were found. When using the GUI, the list of alarms can be selected using the Kind filter of the Properties panel, and a summary of the number of alarms can be found in the Dashboard panel.

The global precision can be changed using the -slevel n option. The greater n is, the more precise the analysis (see About the Value Analysis for more details). The alarms that can be formally verified by increasing the precision in this way will disappear. Those which remain are the difficult part: they require further attention.

Value analysis takes longer and longer when the precision increases. Thus it can be profitable to fine tune the precision locally on certain functions in order to benefit from the higher precision level where it is advantageous (so that more alarms are formally verified) while keeping it lower where it matters less (so that the analysis runs faster).
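As a sketch, such local tuning can be expressed in a tis.config file; the "slevel" key and the function name below are assumptions given for illustration only:

```json
{
    "val": true,
    "slevel": 50,
    "slevel-function":
    {
        "expensive_function": 500
    }
}
```

Here the analysis would run with precision 50 everywhere, except in expensive_function where it is raised to 500.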

For the same reason (fast analysis to find bugs earlier) it can also be useful to reduce (temporarily) the size of the arrays (when the source code is structured to allow this easily).

The final analysis information can be found in the Dashboard panel.

Note that at this point the goal is not to study the alarms precisely, but rather to get a rough idea of the amount of work needed in order to be able to decide which part to study.

To Split or Not to Split

Experience suggests that if the analyzed source code is large and/or there are many alarms, it is usually worthwhile to split the study into smaller, more manageable sub-components. The idea is to write a precise specification for every sub-component and then analyze each of them independently against its specification. Afterwards, the main component can be studied using those specifications instead of the sub-components’ source code.

It is quite easy to decide which part should be split if some main features are identifiable and clearly match a given function. Otherwise, a first overview of the number of alarms may help to isolate a part that seems difficult for the analyzer. However, as the separated function must be specified, it is much easier if it has a small and clear interface (in order to study a function, it must be called in its intended context, and this context might be difficult to build if it corresponds to a large and complex data structure).

To split the analysis one must write:

  • a main function for the main component,
  • a main function and an ACSL specification for each of the API functions which is supposed to be studied independently.

Then, when performing the analysis for the main component, the -val-use-spec option should be used in order to provide the list of the API specified functions. For each of the functions from this list the value analysis will use the function’s ACSL specifications instead of the function’s body.

For instance, the commands below can be used to split the study into the main analysis with two sub-components corresponding to the f1 and f2 functions:

$ tis-analyzer -val $SRC main.c -val-use-spec f1,f2 \
                                -acsl-import f1.acsl,f2.acsl \
                                -save project.state
$ tis-analyzer -val $SRC main_f1.c -acsl-import f1.acsl -save project_f1.state
$ tis-analyzer -val $SRC main_f2.c -acsl-import f2.acsl -save project_f2.state

In the commands above:

  • the files main.c, main_f1.c and main_f2.c should hold the entry points for the main component, and the f1 and f2 functions respectively;
  • the files f1.acsl and f2.acsl should hold the ACSL specifications of the, respectively, f1 and f2 functions (see Write a Specification to learn how to write a specification).
Multi-Analysis

There is another case where studying an entry point may require several separate analyses: when there is a parameter in the program that has to be attributed a value (e.g. using a macro) and when it is difficult to give it an arbitrary value beforehand (e.g. it is a parameter defining the size of an array). In such situations it is better to write a loop in an external script to attribute different values to that parameter and run as many analyses as necessary.

The following script runs an analysis for every N being a multiple of 8 from 16 to 128:

#!/bin/bash

for N in $(seq 16 8 128) ; do
  tis-analyzer -D N=$N -val $SRC main.c
done

Of course, this assumes that N is used somewhere in main.c.

Write a Specification

Writing a specification for a function is useful in two cases:

  • When some splitting is done.

    A certain function can be studied independently, as a separate sub-component of the analysis. The function is verified toward a certain specification and then that specification can be used (instead of using directly the function’s body when analyzing function calls) in the main component’s verification (To Split or Not to Split).

  • When using some unspecified external functions.

    If an external function that is not part of the subset of the libc library functions provided with the tool (which are already specified) is used in the program, then it needs an explicit specification. The provided specification has to indicate at least which data may possibly be modified by the function. Pre- and postconditions are not mandatory in that case.
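As an illustrative sketch, a minimal specification for a hypothetical external function that writes at most n bytes through dst could be:

```
/*@ assigns dst[0 .. n-1] \from \nothing; */
void get_device_id(char *dst, unsigned n);
```

This only tells the analyzer which memory the function may modify; pre- and postconditions could be added later to make the results more precise.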

The specification is written in ACSL and is mainly composed of:

  • preconditions (requires properties),
  • modified data descriptions (left part of assigns properties),
  • dependencies (\from part of assigns properties),
  • postconditions (ensures properties).

The ACSL properties can be either inlined directly in the source files or written in separate files and loaded (as explained in ACSL Properties). An analysis will use the specification instead of the function’s body to process a function call when either the body is not provided or an appropriate -val-use-spec option has been set in the command line.

When analyzing a function call using the specification, the tool:

  • first verifies that the pre-state satisfies the preconditions,
  • then assumes that:
    • the post-state only differs from the pre-state for the specified modified data,
    • and that the post-state satisfies the postconditions.
Pre-conditions
Pre-conditions of an External Function

In the specification of an external function the preconditions are not mandatory. If some are provided though, the analyzer checks whether they are satisfied at each call. Therefore adding preconditions in that case makes the verification stronger.

Pre-conditions of a Defined Function

When a defined function is analyzed separately from the main application, its preconditions define the context in which it is studied. This does not have to be the most general context imaginable, but it has to include at least all the possible usage contexts that can be found in the application.

For example, suppose that the function f has been selected to be studied independently, and that it takes a single parameter x. If f is always called from the application with positive values of x, it is possible to study it only in that context. This property must be then specified explicitly by the precondition:

requires r_x_pos: x > 0;

Also, the corresponding main_f function - i.e. the one that is written and used as an entry point in order to study f individually - must call f with all the positive values for x. If the preconditions specify a context smaller than the context defined implicitly by the main function, it will be detected by the analysis since some of the preconditions will be then invalid. But the opposite case (i.e. if the specified context is larger than the studied input context) would not be detected automatically.

In other words, in the example above, if main_f calls f with (x >= 0), it will be detected since (x == 0) does not satisfy the precondition. However, if it calls f only with (0 < x < 10), the precondition will be formally satisfied, but the function behavior for (x >= 10) will not be studied. If, for instance, f is then called in the application with (x == 20), the problem will not be detected since the precondition is valid for this value.
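To make this concrete, here is a minimal standalone sketch of such an entry point. The body of f and the stub for tis_interval are hypothetical stand-ins so that the snippet compiles outside the analyzer; in a real analysis you would #include <tis_builtin.h> and keep f’s real body:

```c
#include <limits.h>

/* Stand-in for the analyzer builtin so this sketch compiles on its own.
   Under TrustInSoft Analyzer, tis_interval(l, u) stands for the whole
   set [l .. u]; this stub merely returns l. */
static int tis_interval(int l, int u) { (void)u; return l; }

/* Hypothetical placeholder for the function under study. */
int f(int x) { return x + 1; }

/* Entry point studying f for every x satisfying the precondition x > 0. */
int main_f(void) {
    int x = tis_interval(1, INT_MAX);
    return f(x);
}
```

With the real builtin, the analyzer explores f for all positive values of x at once, not just for one concrete value.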

Warning

When verifying a function toward a specification that is then used to verify another component, it is very important to make sure that the context defined by the specified preconditions and the studied input context represent the same thing.

Note that:

  • in most cases each of the function’s parameters should be specified in at least one precondition;
  • if a parameter can hold any value, but has to be initialized, the \initialized precondition should be included in the specification;
  • in case of pointers, it is important to specify whether they have to be \valid (meaning that they point to an allocated memory zone).
Modified Data and Data Dependencies

The assigns properties are composed of two parts, which specify the modified data and its dependencies:

//@ assigns <left part> \from <right part>;
Modified Data

Each assigns property specifies the modified data on the left side of the \from keyword.

The union of the left parts of all the assigns properties in a given function’s specification is an over-approximation of the data modified by this function. Hence the data that is not in this set (i.e. the set defined by the union of their left parts) is expected to have the same value in the pre-state and the post-state.

The information about the modified data is used:

  • by the WP plug-in, for any function call;
  • by the Value plug-in, when the specification of a called function is used instead of its body.
Data Dependencies

Each assigns property specifies the data dependencies on the right side of the \from keyword.

The output value of the modified data is expected to depend only on the value of its data dependencies. In other words, if the value of the dependencies is equal in two input states, then the value of the modified data should be equal in the two output states.

There are two kinds of dependencies:

  • the direct dependencies are those which are used to compute the value;
  • the indirect dependencies are those which are used to choose between the executed branches or the address of the modified data (through pointers, indexes, etc.).

The indirect dependencies have to be explicitly marked with an indirect: label. All the other dependencies are considered as direct.

Here are some examples of correctly defined dependencies:

//@ assigns \result \from a, b, indirect:c;
int f (int a, int b, int c) { return c ? a : b; }

int t[10];
//@ requires 0 <= i < 10; assigns t[..] \from t[..], a, indirect:i;
void set_array_element (int i, int a) { t[i] = a; }

The dependency information is:

  • Not used by the WP plug-in.

  • Very important for many analysis techniques that require knowledge about the data dependencies (such as the Show Defs feature in the GUI, slicing, etc.), but only when the function body is not used, since if the body is available the dependencies can be computed by the From plug-in.

  • Employed in the value analysis of the pointers: the output value of the modified pointers can only be among the specified direct dependencies.

    Note that an intermediate pointer is needed when a pointer is assigned to the address of a variable. This property is not valid:

    assigns p \from &x; // NOT valid.
    

    One must rather declare T * const px = &x; in the code and then write the correct property:

    assigns p \from px;
    

    It means exactly that the output value of p may be based on &x and on no other existing variables.

Remember that the assigns properties specify an over-approximation of the modified data. For instance, the following properties only say that nothing except B is modified by the function:

assigns B \from \nothing;
assigns \result \from B;

In order to specify that B is surely initialized after the function call, one has to add a post-condition:

ensures \initialized(&B);

When the function result is used to return an error status, it is often the case that the post-condition rather looks like:

ensures (\result == 0 && \initialized(&B)) || (\result < 0);
Post-conditions

It is not mandatory to specify ensures properties, neither when splitting out a defined function nor when specifying an external function. However, some information about the values in the returned state might be needed for the analysis of the caller function.

Post-conditions of an External Function

In the specification of an external function, the provided post-conditions cannot be checked since the source code is not available. Hence they are used as hypotheses by the analysis and cannot be formally verified themselves.

Warning

As the post-conditions of external functions cannot be verified by the tool, they must be checked with extra care!

Post-conditions of a Defined Function

If ensures properties are specified, it is usually good to keep them as simple as possible. They have to be verified during the function’s body analysis and over-specification only increases the amount of work necessary to achieve that.

Check the Coverage

Before going any further, it is often advantageous to check the code coverage in order to verify whether all the dead code in the application (i.e. the functions and branches that are not reachable by the analysis) is indeed intended. To learn how to do that, see Information about the Coverage.

Dead code can be spotted in the GUI by looking for statements with a red background. If some dead code seems strange, it can be explored and investigated using the value analysis results: clicking on variables and expressions shows their computed values.

As long as the analysis did not degenerate, the code classified as dead by the tool is a conservative approximation of the actual dead code: it is guaranteed that, in the context defined by the input values, the concerned statements can never be reached. When relying on this guarantee, keep in mind the two assumptions it depends on: the analysis must not have degenerated, and dead code is always relative to the context defined by the input values. Most of the time, when some code has been marked as dead although it should not have been, the reason is that the context of analysis was defined too restrictively (i.e. it does not include all the input values that can occur in a real execution). Another common reason is that the analysis simply stopped computing a given branch:

  • either because the information was too imprecise to go further (in that case a degeneration warning message has been emitted),
  • or because a fatal alarm has led to an invalid state (e.g. accessing the content of a pointer which is NULL for sure renders the state invalid).
Summary

At this point, one should have some analyses (one or several) which cover the intended parts of the code and end without any degeneration. The results most likely include alarms. The chapter Study the Alarms explains how to deal with the alarms and, before that, the chapter Get the Information explains how to extract more information from the analysis results.

If, due to splitting, there are several analyses, there is no preferred order for the verifications. In any case, however, modifying the existing specifications invalidates the results obtained so far.

Caution

The tis-info tool is only available in the commercial version of TrustInSoft Analyzer.

Get the Information

This chapter explains how to extract some information from the analyzed project using the tis-info plug-in and other external scripts. The tis-info plug-in provides options to generate textual files containing information about functions, variables, properties and statements. Filters can be used to extract specific information from these files.

Some pieces of information are purely syntactic while some others are of semantic nature. The semantic information is only available if the project which was used to generate the files holds the value analysis results.

For an exact and up-to-date description of each generated piece of information, please refer to the tis-info Manual.

Generate the CSV Files

The tis-info plug-in can be used to generate CSV files. The main options allow us to extract the information concerning:

  • functions: -info-csv-functions functions.csv
  • variables: -info-csv-variables variables.csv
  • properties: -info-csv-properties properties.csv
  • statements: -info-csv-statements statements.csv

For instance, in order to get the information about functions from a previously saved project project.state, the command line would be:

tis-analyzer -load project.state -info-csv-functions functions.csv

As mentioned before, the kind of obtained information (i.e. either purely syntactic or also semantic) will depend on whether the saved project includes the value analysis results or not.

In the generated CSV files, the information about each element is printed on a single line (with comma-separated fields). Hence, the files can be opened in a spreadsheet tool for easy selection of elements. Moreover, this format can easily be filtered with the grep utility. For instance, the following command returns all the information about the function funname:

grep funname functions.csv

In order to filter on a specified column, the awk tool is also very practical. For instance, the following command returns only the lines where the word valid appears in the fifth column:

awk -F, '! ($5~"valid") {print}' properties.csv

Also, awk can be used to easily extract only some of the columns:

awk -F, '{print $4 $5}' properties.csv
Information about the Functions

The generated file functions.csv provides information about the functions. It contains the list of both defined and declared functions appearing in the analyzed source code, including their locations, whether they are called or not, whether they are reachable in the analyzed context, etc. The most useful piece of information here concerns the coverage; it is detailed just below.

Information about the Coverage
Coverage from a Single Analysis

The coverage of each function can be found in the appropriate column of the functions.csv file. Note that this information is semantic in nature, and thus only available if the value analysis results have been computed.

At this point, the internal functions are usually not interesting and they can be filtered out with:

grep -v TIS_KERNEL_SHARE

The easiest approach then might be to check first the completely unreachable functions:

grep ", unreachable,"

Then the reachable functions that are not completely covered:

grep -v ", unreachable," | grep -v "100.0%"

Then the GUI can be used to explore the dead code of the functions that are not totally covered in order to verify if this is intended or not.

Coverage from Several Analyses

If the information about the code coverage comes from several separate analyses, the generated functions.csv file is no longer sufficient to measure the real coverage of the functions, since it only represents the results extracted from one project out of many. To address this, the tis-aggregate tool provides a coverage command to extract all the relevant information from the functions.csv files and compile it into overall coverage results in CSV format:

tis-aggregate coverage project.aggreg  >  coverage.csv

Here, project.aggreg is a file that gives the base name of the analyses to consider. For instance:

path_1/proj_1
path_2/proj_2
...
path_n/proj_n

The tool then processes the information from the path_i/proj_i_functions.csv files.

This tool also provides some more options, such as presenting the results in HTML format (see the Tis-aggregate coverage section of the Tis-aggregate Manual).

An interactive HTML report can also be generated with tis-report.

MC/DC (Modified Condition/Decision Coverage)

Beside the statement coverage, MC/DC may also be interesting to evaluate. To know what it is, and how it compares to other criteria, refer to MC/DC (Modified Condition/Decision Coverage).

The evaluation of the MC/DC coverage is performed when a specific set of options is set for the analyses:

$ tis-analyzer --interpreter -whole-program-graph -mcdc \
  -info-csv-all <name> <..source files and other options..>

Among other results, it generates a <name>_decisions.csv file that holds information about the decisions (see the About the Decisions section in the Tis-info Manual).

Then, tis-aggregate has to be used as explained in the Modified condition/decision coverage section of the Tis-aggregate Manual.

Information about the Properties

The location and status of each property can be found in the properties.csv file. If the names given to the user annotations follow some naming conventions (see Naming the Annotations), it is quite easy to use grep to extract more precise information from that file.

For instance, if the names of the annotations that should be proved by the WP plug-in all have a _wp suffix, it is easy to check if they are all verified with the following command:

grep "_wp," properties.csv | grep -v ", valid$"
Information about the Statements

The generated file statements.csv provides information about certain kinds of statements in the analyzed program.

For instance, it contains information about the function calls, in particular whether a specific call is direct or not. Moreover, if an indirect call has been encountered during the value analysis, it provides the list of all the possibly called functions. Extracting this information can be done with:

grep ", call," statements.csv | grep -v DIRECT

Some useful information concerning condition values can also be found here, especially whether a condition happens to be always true or always false. This kind of situation is often also observable through dead code, although not always: an if condition might be always true but have no else branch (which, obviously, would be dead if it existed).
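The following hypothetical sketch illustrates this: if, in the analyzed context, len is always nonzero at the test, the condition is always true, yet no dead statement appears in the sources because there is no else branch:

```c
#include <stddef.h>

/* Hypothetical example: suppose every caller in the analyzed context
   passes a nonzero length, so the condition below is always true. */
int first_byte(const char *buf, size_t len) {
    int b = -1;
    if (len > 0)        /* always true here; there is no else branch */
        b = buf[0];
    return b;           /* b == -1 is unreachable in that context */
}
```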

Information about the Variables

The information about all the variables is available in the generated variables.csv file. The exceptions are the global variables that are neither accessed nor modified, since they are removed from the analysis results. This information can also be used, for instance, to easily find the location of the definition of a variable, or to list all the static or volatile variables.

Study the Alarms

Understand and Sort the Alarms

The list of all the existing alarms is given in Value Analysis Alarms.

Most of the time, understanding the meaning of alarms is relatively easy, since the generated assertions, messages, and warnings tend to be quite clear. What requires more effort is determining whether a given alarm is relevant: whether the problem it indicates can actually happen or not. If an alarm is false (which is in fact the most frequent case), the aim is to get rid of it: convince the tool that the corresponding problem cannot occur, so that the alarm stops being emitted. Finding out exactly where an alarm comes from is essential to this end.

False alarms are often the result of too high a level of approximation in the analysis. It is recommended to treat the alarms starting from the first one, in order to detect an imprecision as soon as possible.

The list of the generated assertions can easily be extracted from the properties.csv file (see Information about the Properties). Then, for instance, these assertions can be counted in order to track the evolution of their total number during the working process. (Note, however, that this particular measure is not necessarily very pertinent, because the relation between problems and emitted alarms is not really one-to-one. Losing precision at one point of the analysis can lead to several alarms which have the same origin. Moreover, solving one problem may cause many unrelated new alarms, as several problems might have been hiding behind the solved one.)

The GUI is a good place to start studying the alarms by exploring the data values. As said before, the list of all the properties discovered during the analysis can be found in the Properties of the GUI, and there is a button which allows selecting the alarms among all the properties. Start investigating with the first emitted alarm by sorting the alarms by their emission rank.

About the Value Analysis

Understanding better how the value analysis works, and how to tune its options, helps greatly in dealing with the alarms.

Value analysis uses abstract interpretation techniques that propagate the information forward through the analyzed application’s control flow in order to compute states at each reachable program point. A state is an over-approximation of all the possible values that the variables can hold at a given program point. You can imagine a state as a mapping between the variables and sets of values (keep in mind though that in reality it is a little more complex than that). For instance, the sets of values of integer variables can be represented by integer intervals. For a detailed description of the representation of values, see Value Analysis Data Representation.

See Tune the Precision for explanations concerning tuning the precision level with the -slevel option. The precision level is related to the number of states that can be stored for each program point. The smaller this number is, the coarser the approximation is, as more computed states are merged together.

Example:

  //@ assert 0 < x < 10;
  if (x < 5)
     y = 5;
  else
     y = 10;
L:

Computing a unique state at label L only gives that x ∈ [1..9] and y ∈ [5..10]. But if the slevel is larger, then two states can be stored at L, giving exactly that either y == 5 when x ∈ [1..4] or y == 10 when x ∈ [5..9].

Notice that the assertion assert 0 < x < 10 above reduces the possible values for x. It is important to remember that this works in the same way for the assertions automatically generated from alarms. For instance, if a statement a = b / c; is reached with c ∈ [0..100], an alarm is emitted in the form of an appropriate assertion:

/*@ assert Value: division_by_zero: c ≢ 0; */

Then, the analysis continues in a context where c ∈ [1..100].

Propagate values separately

Besides conditions in the code, which automatically split the states as above (when there is enough slevel), some builtins are available to generate more than one state.

The builtin tis_variable_split (a, sz, n); splits the state on the data whose address is a and size is sz, if it holds fewer than n values. For instance, if f returns x ∈ [1..5] in the following code, the call to tis_variable_split generates five states, one for each value of x:

int x = f();
tis_variable_split (&x, sizeof(x), 10);

Moreover, the builtin tis_interval_split(l, u) does the same thing that tis_interval(l, u) does, but it automatically causes the individual values between l and u inclusive to be propagated separately. The slevel option must then be set high enough to keep the precision of the analysis.

In the following example, since all values of n are propagated separately, the analyzer is able to guarantee that the x_pos assertion holds.

#include <tis_builtin.h>

int G;

void foo(int n)
{
    int x = G;
    /*@ assert x_pos: 1 <= x <= 10; */
}

int main(void)
{
    int n = tis_interval_split(-10, 10);
    G = n;
    if (n > 0)
        foo(n + 5);
    return 0;
}
Remove Alarms by Tuning the Analysis

The analysis runs with a precision setting called the slevel limit which indicates the maximum number of individual states the analyzer is allowed to keep separated at each analyzed statement. When this limit is reached at one particular statement, the analyzer merges states together. Capping precision in this way and merging states prevents a combinatorial explosion, while the merge itself is designed not to miss any undefined behaviors that follow.

However, in particular cases, the loss of precision caused by merging states may result in the appearance of false alarms. In such cases, it is useful to tune the analyzer to improve precision at critical points of the program. The basic technique for doing so is to allow the analyzer to keep states separate by increasing the slevel limit that applies to those statements. The slevel limit can be increased for the entire program, but for the purposes of tuning, it is most beneficial to tune the limit locally, typically at a per-function granularity.

Apart from manipulating the slevel limit, there are advanced techniques that provide finer control over how the analyzer handles states: injecting strategically placed merges to limit the number of separate states, or ensuring that specific interesting states are kept separate, to improve precision where it matters. Some of these techniques are also described further in this section.

Crucially, maintaining precision for every variable at every statement should not be a goal in itself. Analyzing the behavior of the target code in one pass for the millions of variable values that can be grouped together is how the analyzer manages to provide guarantees “for all possible input vectors” while using reasonable time and space for the analysis. Therefore, the tuning techniques described in this section should only be applied when imprecision leads to false alarms.

Detecting precision loss

The analyzer GUI refers to the number of states generated by the analysis at a given statement as the slevel counter, which is compared against the slevel limit that applies to that statement. Note that different statements may be set to have different slevel limits (see Tuning the slevel limit).

The slevel limit for a given statement is displayed in the current function information widget. In the example below, the slevel limit of the currently selected statement is set to 200.

Current function widget

Additionally, the slevel counters and limits can be listed (and sorted) for all statements in the currently viewed function by opening the statements tab (see Statements).

However, more conveniently, in the GUI, statements whose slevel counter exceeded their slevel limit are indicated by a red bar in the margin of the interactive code view (see Interactive Code Panel). Similarly, statements whose slevel limit has not been exceeded, but whose slevel counter reached above 50% of their allotted limit are marked with a yellow margin bar.

The example below shows a snippet of normalized C code with margins marked in red and yellow. The yellow margin bar shows that the analyzer propagated enough states through the statement j = ...; to reach at least 50% of its slevel limit without exceeding it. The red margin bar shows that the analyzer exceeded the allowed slevel limit at the statement k = .... The statement i = ... is not marked in the margin, so its slevel counter stayed below 50% of its slevel limit.

Normalized code view with six lines, 3 declarations of variables i, j, k and 3 assignments from a call to tis_interval_split(0, 2) to i, j, k. There is a yellow bar next to the assignment to j, and a red bar next to the assignment to k.

Hovering over the margin shows the number of states at this statement (its slevel counter) and the statement’s slevel limit. Here, the statement j = ... used 3 out of its slevel limit of 5.

Normalized code view with a mouse pointer over a yellow bar in the margin showing the hover hint "Slevel counter 3/5"
Other reasons for imprecision

The analysis can also get imprecise for other reasons than reaching the defined precision level. This is especially the case when the log trace includes messages about garbled mix values. It is very likely that if such messages appear, the analysis will not produce any interesting results.

Tip

The analysis can be stopped automatically with the command line option -val-stop-at-nth-garbled.

This option can also be set from the GUI.

The analysis can also be stopped when it reaches the maximum memory consumption set (in GiB) in the environment variable TIS_KERNEL_MAX_MEM. If TIS_KERNEL_MAX_MEM is set, the analyzer becomes more conservative in its use of memory when it reaches TIS_KERNEL_MAX_MEM/2 GiB of memory, and the analysis degenerates when it reaches TIS_KERNEL_MAX_MEM GiB of memory. On a single-user TIS Box with 64GiB of memory, a good value for this variable is 62.

Tuning the slevel limit

There are two basic controls for tuning the slevel limits of the analyzer, both of which can be set via command line options:

  • -slevel n: use n as the default global slevel limit;
  • -slevel-function f:n: use n as the slevel limit for function f (this overrides the global limit);

In the following example, the analyzer will execute with a global slevel limit of 100, which will apply to all statements in all functions except main, whose slevel limit will be set to 10.

$ tis-analyzer -val -slevel 100 -slevel-function main:10 example.c

Equivalent settings are also configurable via JSON configuration files (see Loading an analysis configuration).

Function-specific slevel values can be set via the GUI in the current function information widget:

Current function widget with a mouse pointer over a button showing the hover hint "Change slevel for this function"

Note

Since the default global slevel limit is 0, some tuning will be necessary for almost all analyzed code bases.

Tuning splitting, merging and widening behavior

There are several other options that can be used to fine tune the value analysis.

Some of them control whether states can be kept separated at function returns or loops:

  • -val-split-return-function f:n: split the return states of function f according to \result == n and \result != n;

  • -val-split-return auto: automatically split the states at the end of each function according to the function return code;

  • -val-split-return-function f:full: keep all the computed states at the end of function f in the callers;

  • -val-slevel-merge-after-loop <f | @all>: when set, the different execution paths that originate from the body of a loop are merged before entering the next iteration.

    The default behavior applies to all functions (-val-slevel-merge-after-loop=@all). It can be removed for some functions (-val-slevel-merge-after-loop=@all,-f), deactivated for all functions (-val-slevel-merge-after-loop=-@all), or deactivated for all and activated only for some (-val-slevel-merge-after-loop=-@all,+f).

  • -wlevel n: do n loop iterations before widening.

Merging states by variable value

The analyzer can merge states on demand before analyzing specific statements based on the values of specific variables.

To induce the analyzer to merge states at some expression, add a comment to the source code that specifies how the merge should be performed:

  • //@ slevel merge merges all the states before analyzing the first statement following the comment (note there is no underscore between slevel and merge);
  • //@ slevel_merge x selectively merges all such states where variable x has the same value before analyzing the first statement following the comment (note there is an underscore between slevel and merge);
  • //@ slevel_merge x, y selectively merges all the states that have the same value for x and the same value for y, and so on for any number of variables.

For example, consider the following program:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    tis_show_each("i j k", i, j, k);
}

When the program is analyzed (with tis-analyzer -val -slevel 100), each of the assignments to variables i, j, and k creates two separate states. In effect, the analyzer constructs eight separate states at the tis_show_each statement:

[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {1})

On the other hand, it is possible to limit the number of states by merging them according to the value of variable i by adding a comment to the tis_show_each statement:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i;
    tis_show_each("i j k", i, j, k);
}

In this case, the analyzer only produces two states at that statement, one where the value of i is 0 and another where it is 1. The values of each of the two remaining variables are merged into sets containing both 0 and 1.

[value] Called tis_show_each({{ "i j k" }}, {0}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0; 1}, {0; 1})

It is also possible to limit the number of states by merging them according to the values of multiple variables. To do this, add a comment to the tis_show_each statement that merges the states based on the values of i and j taken together:

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel_merge i, j;
    tis_show_each("i j k", i, j, k);
}

This then produces four states, one for each of the four combinations of the values of i and j, with the values of k being merged in each of those states:

[value] Called tis_show_each({{ "i j k" }}, {0}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {0}, {1}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {0}, {0; 1})
[value] Called tis_show_each({{ "i j k" }}, {1}, {1}, {0; 1})

Finally, all the states can be merged into one as follows (note that there is no underscore between slevel and merge):

#include <tis_builtin.h>

int main() {
    int i = tis_interval_split(0, 1);
    int j = tis_interval_split(0, 1);
    int k = tis_interval_split(0, 1);
    //@ slevel merge;
    tis_show_each("i j k", i, j, k);
}

This reduces all the states at tis_show_each into a single state:

[value] Called tis_show_each({{ "i j k" }}, {0; 1}, {0; 1}, {0; 1})
Merging states in loops

This technique can be particularly useful when dealing with complex loops. Consider the following example:

#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1 , 1);
    
    int i = 0;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }

    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}

This code snippet computes the integer division of num by denom, both of which are calculated over 10 iterations of a loop. When denom enters the loop, it has some integer value between 0 and 9, and it is adjusted in each iteration by the value of the variable step. The value of step is indeterminate to the analyzer, but it is either -1 or 1 and is constant throughout the execution of the entire loop (as if it were a parameter passed in from outside the function). As for num, it starts at 0 and is either increased or decreased by the value of the iterator i, depending on direction. This direction is represented by a tis_interval from 0 to 1, signifying that the direction is determined in such a way that the analyzer cannot predict it. So, in each step the range of values that num can take grows in both directions. The direction is decided separately for each iteration, as if it were the result of a function call executed over and over inside the loop.

The user should be able to quickly determine that denom can never be 0, and so, computing result should not trigger a division by zero. Specifically, denom is expected to be either a value between -10 and -1 if step is negative or a value between 10 and 19 if step is positive.

However, the analyzer will nevertheless report potential undefined behavior when run. (The command includes -val-slevel-merge-after-loop="-main" to prevent it from merging all the states at each iteration.)

$ tis-analyzer -val -slevel 200 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..19], {-1; 1})
tests/tis-user-guide/slevel_merge_loop_1.c:25:[kernel] warning: division by zero: assert denom ≢ 0;

Analyzing the output of the analyzer reveals that the reason behind the detected undefined behavior is that the abstracted value of denom spans from -10 to 19. This is the case because the indeterminacy inside the loop causes the analyzer to maintain many distinct states, leading it to exhaust its slevel limit and start merging states. In particular, the analyzer merges states where step is -1 with states where it is 1. Since denom can take more values in the merged state than the analyzer can represent by enumeration, it approximates the value of denom as the span [-10..19].

This false alarm can be removed by increasing the precision of the analyzer at that point. One way to do that is to increase the slevel limit:

$ tis-analyzer -val -slevel 520 slevel_merge_loop_1.c -val-slevel-merge-after-loop="-main"

This works, but since the number of propagated states is very large, the slevel limit must be set to at least 520. Using slevel_merge can help keep the slevel limit significantly lower. The following modified snippet inserts an slevel_merge annotation just before the loop, directing the analyzer to merge states at the beginning of each loop iteration (because it is inserted before the condition) so that step keeps a single value in each resulting state.

#include <tis_builtin.h>

int main() {
    int num = 0;
    int denom = tis_interval(0, 9);
    int step = tis_nondet(-1 , 1);
    
    int i = 0;
    //@ slevel_merge step;
    while (i < 10) {
        int direction = tis_interval(0, 1);
        denom += step;
        num = direction ? num + i : num - i;
        i++;
    }
    
    tis_show_each("denom step", denom, step);
    int result = num / denom;
    return 0;
}

This additional guidance prevents the analyzer from merging the negative and positive possible value sets for denom, while allowing it to merge states to account for the indeterminacy inside the loop. So, the analysis can be performed with a much lower slevel limit:

$ tis-analyzer -val -slevel 40 slevel_merge_loop_2.c -val-slevel-merge-after-loop="-main"
[value] Called tis_show_each({{ "denom step" }}, [-10..-1], {-1})
[value] Called tis_show_each({{ "denom step" }}, [10..19], {1})
Tuning value representation

Some other options can be used to control the precision of the representation of a value (rather than the number of states). For instance:

  • -val-ilevel n: Sets the precision level for integer representation to n: each integer value is represented by a set of up to n enumerated values; above this number, intervals (with congruence information) are used.
  • -plevel n: Sets the precision level for array accesses to n: array accesses are precise as long as the interval for the index contains less than n values. See Tuning the precision for array accesses for more information about this option.
Disabling alarms

There are also options which allow enabling or disabling certain alarms. Most of these options are enabled by default, and it is usually safer to leave them this way (unless you really know what you are doing).

Of course not all existing options have been enumerated here. The full list of the available options is given by -value-help.

Tuning the precision for array accesses

As explained in Tune the Precision, temporarily reducing the size of the arrays may be a first step during the interactive phase to make the analysis time shorter. But when analyzing large arrays, the -plevel option can be used to increase the precision level for array accesses. This option sets how hard the analyzer tries to be precise for memory accesses that, considering the imprecision on the indexes involved, can be at any of many offsets of a memory block. The default is 200, which may not be sufficient for an access at unknown indices inside a large or nested array to produce a precise result. This is illustrated by the example below:

#include <tis_builtin.h>

char c[20];
struct s { int m[50]; void *p; };
struct s t[60];

void init(void) {
  for (int i = 0; i < 60; i++)
    {
      for (int j = 0; j < 50; j++)
        t[i].m[j] = (i + j) % 10;
      t[i].p = c + i % 20;
    }
}

int main(void) {
  init();
  int x = tis_interval(5, 45);
  int y = tis_interval(2, 56);
  t[y].m[x] = -1;

  x = tis_interval(5, 45);
  y = tis_interval(2, 56);
  int result = t[y].m[x];

  int *p = &t[y].m[x];
  int result2 = *p;
}

With the default value of -plevel (200):

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ {{ NULL + [-1..9] ; &c + [0..19] }}
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

With higher plevel:

$ tis-analyzer -val -slevel-function init:10000 nestedarrays.c -plevel 3000
  x ∈ [5..45]
  y ∈ [2..56]
  result ∈ [-1..9]
  p ∈ {{ &t + [428..11604],0%4 }}
  result2 ∈ {{ NULL + [-1..9] ; &c + [0..19] }}

Note that result2 is not precise even with the higher plevel. Handling the lvalue t[y].m[x] directly allows the analyzer to be optimal as long as the value of the plevel option allows it, but forcing the analyzer to represent the address as the value of the variable p produces this set of offsets:

{{ &t + [428..11604],0%4 }}

This set of offsets contains pretty much the addresses of everything in t, including the p pointer members, so it appears when dereferencing p that the result can be an address.

Trading memory for analysis speed

As explained in Tune the Precision, there is a trade-off between the precision of the analysis and the time it takes. There is also another one between the memory used to store intermediate results and the time it takes to recompute them.

The environment variable TIS_KERNEL_MEMORY_FOOTPRINT can be used to set the size of the caches used during the value analysis, speeding up some slow analyses in which the caches were getting thrashed. The default is 2, and each increment doubles the size of the caches. Only use this variable if you are not already low on memory.

Another useful option which helps in reducing the computation time is -memexec-all. If this option is set, when analyzing a function, the tool tries to reuse results from the analysis of previous calls when possible.

Control the Analyzer
Enhance the Trace

It is not necessary to wait until the analysis is finished in order to examine the computed values. It is also possible to inspect the values of variables during an ongoing analysis by printing messages to the standard output or to a log file. This way one can keep an eye on what is going on.

First of all, the standard printf function can be used to output constant messages or messages involving only precise values. However, printing the computed values when they are imprecise is not possible using printf; the tis_show_each function should be used instead. The name of this function can be extended with any string, so that it is easier to distinguish between different calls (as the full function name is printed each time it is called). For instance:

tis_show_each_iteration (i, t[i]);

The statement above will output messages like this (one for each analyzed call):

[value] Called tis_show_each_iteration({0; 1; 2; 3; 4}, [0..10])

Another useful function, which properly handles imprecise values, is tis_print_subexps. When given any number of expressions as arguments, it will print the values of all the sub-expressions of each provided expression. The first argument of this function must always be a string literal, which will be printed out in order to help distinguish between different calls. For instance:

tis_print_subexps ("simple sums", x + y, y + z, z + x);

Such a statement will output messages like this:

[value] Values of all the sub-expressions of simple sums (expr 1):
    int x + y ∈ {3}
    int x ∈ {1}
    int y ∈ {2}
[value] Values of all the sub-expressions of simple sums (expr 2):
    int y + z ∈ {5}
    int y ∈ {2}
    int z ∈ {3}
[value] Values of all the sub-expressions of simple sums (expr 3):
    int z + x ∈ {4}
    int x ∈ {1}
    int z ∈ {3}

Moreover, tis_print_abstract_each allows the contents of structured variables to be observed. For instance:

tis_print_abstract_each(&i, &t);

This statement will output messages like this (one for each analyzed call):

[value] Called tis_print_abstract_each:
     i ∈ {0; 1; 2; 3; 4}
     t[0..4] ∈ [0..10]
      [5..99] ∈ UNINITIALIZED

Note that tis_print_abstract_each in general takes addresses of variables as parameters. This applies as well to an array, such as t in the example above. Contrary to popular belief, when t is an array, &t is not the same as t, and the user should call tis_print_abstract_each(&t); to see the whole array (pointer decay only shows the first element).

To get even more information, the tis_dump_each function can be used to print the whole state at the program point where it is called. But it may be easier to call the tis_dump_each_file function to print the state to a file. The name of the file is computed from the first argument of the call (which must be a string literal), an incremented number, and an optional directory given by the -val-dump-directory option. The -val-dump-destination option allows choosing which kind of output is expected among txt or json (all for both, none for no output). For instance, when calling tis_dump_each_file ("state", *(p+i)) in a test.c file, and analyzing it with the command:

$ tis-analyzer -val test.c -val-dump-destination all -val-dump-directory /tmp

these messages are shown in the trace:

test.c:11:[value] Dumping state in file '/tmp/state_0.txt'
test.c:11:[value] Dumping state in file '/tmp/state_0.json'

The two generated files both hold the whole state computed the first time the program point is reached, together with the possible values for *(p+i).

For instance, the JSON file may look like:

{
  "file": "test.c",
  "line": 11,
  "args": "([0..10])",
  "state": [
    {
      "base": "t",
      "values": [
        { "offset": "[0..4]", "value": "[0..10]" },
        { "offset": "[5..9]", "value": "UNINITIALIZED" }
      ]
    },
    { "base": "i", "values": [ { "value": "{0; 1; 2; 3; 4}" } ] }
  ]
}

To better understand the results, see the Value Analysis Data Representation section.

Stop the Analyzer when Alarms Occur

In order to avoid wasting time analyzing the application in a wrong context, the analysis can be stopped as soon as some alarms are generated thanks to the -val-stop-at-nth-alarm option. With the argument equal to 1, it aborts the analysis at the first alarm. To ignore a certain number of alarms, the argument can be increased. Although there is no strict relation between the given argument and the number of alarms generated before the analysis stops (i.e. these two values are not necessarily equal), one thing is guaranteed: providing a larger number will lead to skipping more alarms.

Stop the Analyzer When Taking Too Long

The analyzer can also be stopped by sending a USR1 signal to the process. The process identifier (PID) can be found in the trace (unless the -no-print-pid option has been used or the TIS_KERNEL_TEST_MODE environment variable has been set). If the PID is 12345 for instance, the signal can be sent using the kill command:

$ kill -USR1 12345

The analyzer can also be stopped through the GUI (see Disconnect/Kill server).

When the analyzer receives this signal, it stops the value analysis, but still continues with the other tasks. For instance, it still saves the current state if the -save option has been used. The saved state can then be loaded in the GUI to examine the results obtained so far. Notice that even if there are no more tasks to do, it can still take some time to properly stop the analysis.

The --timeout option can also be used to get a similar behavior after a given amount of time. For instance, the following command stops the analysis after 5 minutes and saves the results obtained so far in project.state:

$ tis-analyzer --timeout 5m -val ... -save project.state
Watchpoints

Another way to avoid wasting time by analyzing the application in a wrong context is to use watchpoints. Watchpoints make it possible to automatically stop the analysis when some specific memory conditions occur. There are currently five kinds of conditions available for this purpose:

  • tis_watch_cardinal: stop the analysis when the number of different values that a given memory location may possibly contain (because of imprecision) is greater than a certain maximal amount.
  • tis_watch_value: stop the analysis when a given memory location may possibly contain a value of the provided set of forbidden values.
  • tis_watch_address: stop the analysis when a given memory location may possibly contain an address.
  • tis_watch_garbled: stop the analysis when a given memory location may possibly contain a garbled mix value.
  • tis_detect_imprecise_pointer: stop the analysis when any expression is evaluated to an imprecise pointer which contains a given base address.

These functions are available using #include <tis_builtin.h>. The arguments of the four tis_watch_* functions follow the same logic:

  • The first two arguments define the memory location to watch: its address p and its size s.
  • The last argument n is the number of statements during which the condition may remain true before the analysis is stopped:
    • if n == -1, the analysis never stops, but messages are printed each time the condition is reached;
    • if n == 0, the analysis stops as soon as the condition is reached;
    • if n > 0, the analysis continues for the n-th first occurrences where the condition is reached (and prints messages for each of them) and stops at the next occurrence.
  • The meaning of the (potential) arguments in between varies depending on the particular function.

The function tis_detect_imprecise_pointer only takes a pointer as argument.

Each time a call to one of these functions is analyzed, a new watchpoint is set up (if it was not already present). These watchpoints remain active until the end of the analysis. Here is a typical example of using these functions:

 int x = 0; /* the memory location to watch */
 void *p = (void *)&x; /* its address */
 size_t s = sizeof(x); /* its size */
 int y[10];

 /* The analysis stops when x is not exact (i.e. not a singleton value). */
 int maximal_cardinal_allowed = 1;
 tis_watch_cardinal(p, s, maximal_cardinal_allowed, 0);

 /* The analysis stops the fourth time when x may be negative. */
 int forbidden_values = tis_interval(INT_MIN, -1);
 int exceptions = 3;
 tis_watch_value(p, s, forbidden_values, exceptions);

 /* The analysis stops when x may be an address. */
 tis_watch_address(p, s, 0);

 /* The analysis stops when x may be a garbled mix value. */
 tis_watch_garbled(p, s, 0);

 p = y;

 /* The analysis starts to detect if an expression is evaluated to an
    imprecise pointer starting at base address &y. */
 tis_detect_imprecise_pointer(p);

 /* The analysis stops because the expression p+tis_interval(0,3)
    is evaluated to an imprecise pointer &y + [0..3]. */
 *(p+tis_interval(0,3)) = 3;
Remove Alarms by Adding Annotations

Tuning the precision with the various analysis options, as previously explained, is one way of removing false alarms. Another way of guiding the analyzer is to add assertions to the program. Other kinds of properties can also be introduced, but assertions are by far the most frequently used for this purpose.

If you do not know how to add ACSL properties to your project, first read ACSL Properties.

Of course, as the analysis results rely on the properties introduced in this way, they must be properly checked. The best approach is to verify such properties formally using, for instance, the WP plug-in (see Prove Annotations with WP) or other formal tools. When it is not possible, they should be verified manually.

Warning

Annotations that cannot be formally proven have to be carefully reviewed and justified in order to convince any reader that they are indeed true, since all the analysis results rely on them.

Some examples are given below.

If source code modifications are needed, and the source code is in a git repository, the tis-modifications tool may be helpful to track them in order to check that they are correctly guarded. See tis-modifications Manual.

Assertion to Provide Missing Information

As mentioned before, the internal representation of values in the Value plug-in is based on intervals. Unfortunately, some relevant information about variables simply cannot be represented in this form, and thus cannot be taken into account by the analyzer even when it would make a difference. However, well-placed assertions can compensate for this limitation by providing the missing information explicitly.

Example:

     int T[10];
     ...
L1:  if (0 <= x && x < y) {
       ...
L2:    if (y == 10) {
L3:      ... T[x] ...
         }
       ...
       }

When the T[x] expression is encountered at label L3, the analyzer tries to check that the array T is accessed correctly (i.e. inside the array bounds), namely that the (0 <= x < 10) condition holds. It already knows that the (0 <= x) part holds, thanks to the conditional statement at label L1 (assuming that x has not been modified since then). Whatever values x might have had before, at L1 they have been restricted to non-negative ones; this fact has been stored in the internal state (the interval of values for x was modified) and is thus visible at L3. For example, if before L1 the value of x was [--..--] (i.e. nothing is known about x except that it is initialized), then after L1 it would be [0..--] (i.e. the interval spanning from zero to positive infinity).

Now the analyzer still needs to verify that (x < 10) also holds. This is obvious to anybody reading the code: the condition at L1 ensures that (x < y) and the condition at L2 ensures that (y == 10), therefore (x < 10) must be true. Unfortunately, because of the limitations of the adopted value representation, the analyzer cannot deduce this by itself. The fact that (x < y) holds simply cannot be expressed in the internal state (nor, in fact, can any abstract relation between variables). And, supposing that the value of y at L1 is [--..--], the (x < y) condition does not help to restrain the values of x, whose upper bound thus remains as before. Hence, this important piece of information is lost and the analyzer cannot combine it with (y == 10) at L2 in order to correctly restrain the value of x to [0..9]. So at L3 it considers that the value of x is [0..--] and emits an alarm about a potential out-of-bounds access to the array T.

To help the analyzer, the appropriate assertion can be added explicitly:

at L3: assert ax: x < 10;

Then the alarm will disappear. Of course the ax assertion still needs to be verified by other means. For example, this particular assertion can be easily proven using WP (see Prove Annotations with WP).

State Splitting

State splitting is another assertion-based technique for guiding the value analysis. In the example above, the same result can be obtained by splitting the internal state at L1 into two states with the following assertion:

at L1: assert ay: y <= 10 || 10 < y;

The analyzer can store multiple memory states at each program point (as explained in About the Value Analysis), and the maximal number of states that can be stored per program point in the internal representation is related to the precision level (i.e. slevel). So, provided that the -slevel option has set the precision level high enough to permit splitting the state here, the assertion above leads to a case analysis:

  • The y <= 10 case: As y <= 10 is assumed, together with the x < y condition at L1, it leads to deducing x < 10, and this time this can be represented in the internal state.
  • The 10 < y case: If 10 < y is assumed, then the condition at L2 is false, and therefore the execution branch where the array T is accessed at L3 is not reached at all.

Thanks to this assertion, the alarm disappears. Moreover, the analyzer is able to check on its own that ay is always true, so there is nothing more to verify.

It is worth pointing out that whenever the value analysis encounters an ACSL property, it tries to check its validity.

Tip

Some user annotations can be formally checked using just the value analysis, without the need of employing any other form of verification method (e.g. the WP plug-in).

Store Relational Information

In some cases, the most efficient way to guide the value analysis is to directly add an intermediate variable to the program to make it easier to analyze. This method should usually be avoided if possible, since it is intrusive to the application’s code; use it only when other solutions are not good enough or when you do not mind modifying the code.

For example, if the program contains a test (0 <= i+j < 10) and then several uses of T[i+j] follow, it may be convenient to add a temporary variable representing the i+j sum:

int tmp = i + j;
if (tmp < 10) {
  ..
  //@ assert a_tmp: tmp == i + j;
  ... T[tmp] ...
  }

If neither i nor j are modified in the meantime, the assertion that validates the code substitution should be trivial to verify, and the value analysis is now able to know that (tmp < 10).

Loop Invariant

Besides assert, requires and ensures, loop invariant properties are also useful to enhance the analysis precision.

For instance, this function generates an alarm:

int T[100];

//@ requires 0 <= n < 50;
void main(int n)
{
    int i = 0;
    while (i <= n) {
        T[i] = 3;
        i++;
    }
    T[i] = i;
}
warning: accessing out of bounds index [1..127]. assert i < 100;

This is because the value of i is too imprecise when leaving the loop, so the analysis doesn’t know if the access to T[i] in the last assignment is valid or not.

Adding this loop invariant removes the alarm:

/*@ loop invariant i <= 50; */

Moreover, the value analysis is able to check that this property is always valid.

Bounded Buffers

A very common situation is to have a pointer to an array, and an integer that gives the number of remaining bytes between this pointer and the end of the array. In the internal representation of the values, it is not possible to represent relations between these two variables.

buffer problem

Buffer problem: there is a relation between cur and len, but it cannot be represented.

A typical function to handle this buffer is:

void process (char * cur, size_t len) {
  char * p = cur;
  for (size_t i = 0 ; i < len ; i++, p++) {
    *p = ...
  }
}

The validity of the pointer p has to be checked not only to avoid an alarm on the access to p, but also to get precise results later on. This is especially important when the pointer points to an array that is part of a larger structure. For instance:

struct data {
  char buffer[BUFFER_LEN];
  unsigned current_index;
  unsigned state;
};

//@ requires 0 <= data->current_index < BUFFER_LEN;
int treatment (struct data * data, int n) {
  char * current = data->buffer + data->current_index;
  size_t length = BUFFER_LEN - data->current_index;
  if (n > length) n = length;
  process (current, n);
  ...
  }

If the analysis is not able to determine that p does not go beyond the end of the buffer field, the values of the other fields current_index and state might be modified as well, and might become too imprecise for the analysis to give interesting results later on.

So the process function needs a precondition to give the constraint on cur and len to ensure the validity of the pointer. This precondition could simply be:

//@ requires \valid (cur + 0 .. len-1);

Unfortunately, the value analysis is not able to reduce the input states with this kind of annotation, but it can be translated into a more exploitable form when one of the two pieces of data is precise enough to reduce the other:

  • if the data to reduce is the pointer:
//@ requires cur <= \base_addr (cur) + \block_length (cur) - len * sizeof (*cur);
  • if the data to reduce is the length:
//@ requires len <= (\block_length (cur) - \offset (cur)) / sizeof (*cur);

Notice that the ACSL functions \base_addr, \block_length and \offset only provide the expected information when cur is a pointer to an array allocated on its own. If the array is a field in a structure, \base_addr(cur) returns the base address of the structure.

base of a pointer in a structure

Structure with a .buf field: ACSL functions are related to the allocated block, not the internal array.

In some cases, however, even if the analyzer computes the optimal information, cur and len both have unknown values from intervals and the relation between the two variables has been lost. The memory access to (*p) then raises an alarm, because it cannot be checked that the sum of the upper bounds of both intervals stays below (buf + BUFFER_LEN). Moreover, if buf is in a structure as explained above, buf and BUFFER_LEN may be unknown in the function.

A trick is to modify the original function by adding a parameter that gives a pointer into the object beyond which the function is not expected to access:

/*@ requires f_r_buf: val: cur <= bound;
    requires f_r_len: wp: len <= bound - cur;
*/
void process_bounded (char * cur, size_t len, char * bound) {
  char * p = cur;
  //@ loop invariant f_l_p: val: p <= bound;
  for (size_t i = 0 ; i < len ; i++, p++) {
    if (p >= bound) break;
    *p = ...
  }
}

In the previous example, the call to process would have to be changed to:

process_bounded (current, n, data->buffer + BUFFER_LEN);

As long as the preconditions are true, this modified function is equivalent to the original one. The first precondition is often checked by the value analysis; when it is not, the value analysis reduces the range of cur. The value analysis can use the second precondition to reduce the length.

Two annotated functions with such bounds, tis_memset_bounded and tis_memcpy_bounded, are provided to be used instead of memset and memcpy when this problem occurs with these libc functions.

ACSL Properties

This chapter explains how to introduce ACSL properties to a project.

ACSL is the specification language employed in TrustInSoft Analyzer. ACSL properties can be used to specify functions (as seen in Write a Specification) and to guide the analysis by adding local annotations, such as assertions or loop invariants, which may help in removing false alarms (as seen in Remove Alarms by Adding Annotations).

There are two ways to insert ACSL properties in a project:

  • through inlining: putting ACSL properties directly in the analyzed application’s source code,
  • through importing: writing ACSL properties into external files and then loading these files from the command line when running the analysis.
Inline ACSL Properties

One way to add ACSL annotations to a project is to write them directly in the source code in special comments:

  • either standard C style comments: /*@ ... */,
  • or one line C++ style comments: //@ ....

There are several kinds of properties and they all need to be placed in an appropriate place in the source code:

  • Function specifications (described in Write a Specification) must be inserted either before the function itself or before the function’s declaration.
  • Local properties (described in Remove Alarms by Adding Annotations):
    • assertions must be inserted at the program point where they apply,
    • loop properties must be inserted just before the loop they apply to.

For more information about the ACSL language, please refer to the ACSL Documentation.

Caution

The ACSLimporter plug-in is only available in the commercial version of TrustInSoft Analyzer.

Import ACSL Properties

For many reasons it is usually preferable to avoid modifying the analyzed source code, as introducing changes to the application’s code makes it harder to compare with the original version. For example, adding new properties alters the line numbering in a file, which makes it impossible to report problems with the original source line numbers.

The ACSLimporter plug-in makes it possible to write the ACSL properties into separate files and then import them for the analysis. The syntax of such files looks like this:

function <function-name>:
  contract:
    requires <pre-name-1>: <pre-definition-1>;
    assigns <assigns-definition-1>;
    ensures <post-name-1>: <post-definition-1>;

  at L1: assert <assert-name-1a>: <assert-definition-1a>;
  at L1: assert <assert-name-1b>: <assert-definition-1b>;
  at L2: assert <assert-name-2>: <assert-definition-2>;

  at loop 1:
    loop invariant <inv-name-1a>: <inv-definition-1a>;
    loop invariant <inv-name-1b>: <inv-definition-1b>;

Of course, the <...> parts should be substituted by specific names and definitions.

Depending on the organization of the project, it might be better to put all the properties in a single ACSL file or to split them throughout several files. If the properties concerning the same function appear in different files, the specifications are merged.

To load the property files, so that they are taken into account during the analysis, the -acsl-import <file.acsl> option has to be specified for each concerned ACSL file.
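For example, assuming the properties were written into two files specs/f.acsl and specs/g.acsl (hypothetical paths) and the sources are in main.c, the analysis would be launched as:

```shell
$ tis-analyzer -val main.c -acsl-import specs/f.acsl -acsl-import specs/g.acsl
```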

Naming the Annotations

Giving a unique name to each annotation permits referring to it easily later on. Moreover, it makes the result files a lot clearer and more readable: when mentioning a particular annotation they will use its name instead of the corresponding file name and line number.

Using standard naming conventions is highly recommended. Some tools require particular naming of assertions to properly check that everything has been verified at the end of the analysis.

The proposed naming conventions are:

  • To choose a unique short prefix for annotations concerning each function (example: add_).
  • To include a letter that indicates the kind of each property (e.g. the first requires property of function add could then be named add_r1). (However, this is not really necessary if the name is always used together with the corresponding keyword, as in requires add_r1, ensures add_e2, etc.)
  • To use a suffix informing about the way each property is verified. For instance:
    • add_e2_val: if the property is found always valid by Value;
    • add_e2_wp: if the property is proved by WP;
    • add_e2_sc: if the property could have been removed as redundant by Scope (note: it could be necessary to keep this property anyway because it still might be useful for Value or WP computations);
    • add_e2_rv: if the property has been manually reviewed.

These naming conventions might seem to be quite cumbersome to use (especially the verification method suffix). However, as mentioned before, they make the automatic generation/verification possible, so they are highly recommended.

Caution

The WP plug-in is only available in the commercial version of TrustInSoft Analyzer.

Prove Annotations with WP

WP refers both to a method to formally verify properties of the analyzed code and the name of the analyzer’s plug-in that implements this method. WP is a static analysis technique, like the value analysis, but involving theorem proving. For a short introduction describing how it works see Short Introduction to WP Computation.

The purpose of this chapter is mainly to explain in which cases WP can be used with a minimal amount of manual work required. This does not mean that it cannot be used in more complex situations, but then it requires more knowledge about the WP computation and/or competences in performing manual proofs using proof assistants such as Coq.

How to Run WP

The easiest way to run WP is to do it in the GUI by selecting the annotation to prove, right-clicking to open the pop-up menu, and choosing the Prove Property by WP option.

However, as the analysis evolves, it is usually more practical to run WP from the command line, save the project, and extract the status of the properties from the information file to check that the results are still the same.

The command line to run WP and save the project looks like:

tis-analyzer -load project.state -wp-fct f1 -wp-prop f1_p1_wp,f1_p2_wp \
                           -then -wp-fct g,h -wp-prop f_pre_wp \
                           ...
                           -then -save project.wp.state

This command line:

  • opens a previously saved project project.state: it doesn’t need to include value analysis results, and doesn’t even have to include an entry point. All it needs is the bodies of the functions containing the properties to verify, and probably some specifications for the functions called from them.

  • tries to verify the properties named f1_p1_wp and f1_p2_wp in the f1 function,

  • then tries to verify the property f_pre_wp in g and h.

    Notice that f_pre_wp is supposed to be a precondition of f, and that it is checked in g and h, which are supposed to be some of f’s callers;

  • saves the results in the project.wp.state project.

Notice that a _wp suffix is used in the names of the properties that are checked with WP. See Naming the Annotations to understand why naming conventions are useful.

This is an example of how to use WP, but the plug-in provides many other options if needed. Please use the -wp-help option to list them, and refer to the documentation for more details.

To know how to extract the status of the properties from the project.wp.state project, see Information about the Properties.

Short Introduction to WP Computation

Let us give some very simple explanations about WP for readers who know nothing about it, since a basic understanding of how it works may be necessary to use it when suitable.

To verify that a property is true at a program point, the WP principle is to propagate it backward, computing a formula such that, if the formula can be proved, then the initial property is guaranteed to be true as well. The computed formula is then sent to one or more automatic provers. For instance, tis-analyzer comes with alt-ergo, but more provers can be added.

An example makes this easier to understand:

const int X = 10;

void f1(int x)
{
    int y = x + 1;
    int z = 2 * y;
L:  //@ assert y_val: y > X;
    ...
}

To ensure that y_val is true at L, WP computes that one has to prove (x+1 > X) when entering the function. Notice that the z assignment has no effect, since WP knows that it doesn’t modify the value of y. This can be automatically proved if a precondition gives:

//@ requires r_x_ge_X: x >= X;

This is because the final computed formula is:

x >= X ==> x+1 > X;

which is easily proved by any automatic prover.

It doesn’t work with the precondition:

//@ requires r_x_ge_15: x >= 15;

This is because WP only works on the function source code, which means that it has no information about the value of X. To solve this kind of problem, one can add:

//@ requires r_X_val: X == 10;

This precondition is easily validated by the value analysis and can be used by WP to finish the proof with r_x_ge_15.

In this simple case the initial property and the computed formula are equivalent, but this is not always the case. WP just ensures that, if the computed formula is true, then the property is true each time its program point is reached.

WP on Loops

To prove a loop invariant property, the WP computation is very similar, but decomposed into two goals:

  • establishment: the property must be true when reaching the loop. This is similar to an assert before the loop.
  • preservation: if the property is true at the beginning of an iteration, it is still true at the end.

Example:

int T[100];

//@ requires n < 100;
int main(int n)
{
    int i; int * p;
    for (i = 0, p = T; i < n; i++, p++) {
        *p = 3;
    }
    ...
}

The following property removes the alarm about the validity of the (*p) assignment in the loop:

//@ loop invariant li_1: p == T+i;

Moreover it can be proved by WP:

  • the establishment has to be proved before entering the loop, but after the initialization part. So the proof obligation is:

    T == T + 0
    
  • the preservation formula is similar to:

    p == T + i  ==>  p + 1 == T + (i + 1)
    

Both formulas are trivially true.

WP for Indirect Memory Access

In the first example in Short Introduction to WP Computation, the z assignment has no effect on the WP computation, since WP knows that it doesn’t modify the value of y. But things are different when pointers are involved. For instance:

void f(int * px, int x, int * py, int y)
{
    *px = x;
    *py = y;
    //@ assert a_px: *px == x;
    //@ assert a_py: *py == y;
    ...
}

WP is able to prove a_py, but not a_px. The reason is that it doesn’t know whether the assignment to (*py) modifies the value of (*px) or not. The a_px assertion can be proved only with the precondition:

//@ requires \separated (px, py);

It states that there is no intersection between the (*px) and (*py) locations in memory.

In the context of adding annotations to remove alarms, except in very simple cases, it is not recommended to use WP when possibly overlapping pointers are involved since it may take some time to provide enough information.

WP for a Function Call

The other problem arises when there are function calls between the property and the statements that make it true. Remember that WP only works on the source code of the function containing the property, and on the specifications of the called functions.

extern int X;
void g (void);
void f(int x, int y)
{
    if (x > y && x > X) {
        g ();
        //@ assert ax1: x > y;
        //@ assert ax2: x > X;
        ...
    }
    ...
}

WP is able to prove ax1 since there is no way for g to modify either x or y, but ax2 cannot be proved since g may modify X.

There are two solutions to solve the problem:

  • add an assigns property for g to specify the modified data.

    For instance, ax2 is proved when adding:

    //@ assigns \nothing;
    

    This is not the preferred method, since assigns clauses are difficult to prove: it requires knowing the modified data for each statement of g. The computed dependencies may help to justify the assigns property, but beware that this information is context dependent.

  • add a postcondition about the involved data. For instance:

    • specifying that X is not modified by g:

      //@ ensures X == \old (X);
      
    • or specifying that X decreases:

      //@ ensures X < \old (X);
      

Both solutions make it possible to prove ax2.

WP is Useful Even for Trivial Properties

WP could seem useless outside of complex cases, but that is not true: even when properties look trivial, it is useful to formally prove them, since it is so easy to make a mistake.

Let us look at an example:

//@ ensures e_res_ok: min <= \result <= max;
int bounds(int min, int x, int max)
{
    int res = x;
    if (x < min) res = min;
    if (x > max) res = max;
    return res;
}

The postcondition seems reasonably easy to justify, yet WP is unable to prove it. WP computes a proof obligation equivalent to:

if (x > max)      then min <= max /\ max <= max
else if (x < min) then min <= min /\ min <= max
     else              min <= x   /\   x <= max

After simplifying the formula, it appears that the information (min <= max) is missing, so the postcondition cannot be proved without a precondition. This precondition then has to be added, and checked in every context where the function is called, to ensure that the postcondition is verified.

WP Conclusion

The advice here is to use WP only in simple cases, because complex cases need expertise and require a lot of time. But as we have seen, even for properties that look trivial, it is better to prove them formally, since it is so easy to make a mistake. Moreover, manual justification of trivial properties may look a little silly.

One must be especially careful when it seems that WP should be able to prove something but doesn’t, since this may hide a problem somewhere. It is always better to understand whether it is really a WP weakness, or something else.

Conclusion

Now that you know how to analyze an application, it is important to insist on putting things together and checking all the hypotheses.

If there is only one analysis, it is quite easy to check. The results rely on:

  • the context defined by the entry point,
  • the assigns and ensures properties of the external functions, because they cannot be checked,
  • all the other annotations that are neither valid according to the value analysis nor proved by WP.

If there are several analyses, the results of each one rely on the same hypotheses as above, but there are more things to check:

  • the WP proofs don’t depend on the context, so they can be done only once, IF the functions are the same across the analyses (not modified through a macro, for instance),
  • to be valid according to the value analysis, a property must have this status in ALL the analyses,
  • when splitting, remember that the context of the function analysis must represent all its preconditions (see the figure in Pre-conditions of a Defined Function).

For the application to be fully verified in the given context, all the hypotheses above that lack formal verification must have a clear justification.

Analyzing C++ Programs

TrustInSoft Analyzer++ lets you analyze C++ programs. This document describes the specificities of TrustInSoft Analyzer++ and how C++ constructs are translated for the analysis.

In addition, there is also a separate getting started tutorial on analyzing C++ code in the Analyzing C++ code section of the manual.

TrustInSoft Analyzer++ Specificities
Identifiers
Mangling

The identifiers in a C++ program are mangled to match C identifiers. The mangling scheme used in TrustInSoft Analyzer is a variation of Itanium mangling. The differences are:

  • Class, union and enum names are also mangled, even if this is not required by Itanium. The grammar entry used for these types is _Z<name>. As such, the class:

    struct Foo {
      int x;
    };
    

    is translated as:

    struct _Z3Foo { int x; }
    
  • Local variables and formal parameter names are also mangled, to avoid shadowing extern "C" declarations. The grammar entry used for a local variable is _ZL<unqualified-name>. As such, the local variable bar in:

    int main() {
      int bar = 2;
    }
    

    is mangled as _ZL3bar. The keyword this is not mangled.

  • The virtual method table and the typeinfo structure for a class Foo are mangled as extra static fields named __tis_class_vmt and __tis_typeinfo in this class. As such, the class:

    struct Foo {
      virtual void f() {}
    };
    

    leads to the generation of two variables with mangled names _ZN3Foo15__tis_class_vmtE and _ZN3Foo14__tis_typeinfoE.

Demangling

To make reading identifiers easier, TrustInSoft Analyzer displays a demangled version of each identifier by default. In the GUI, the mangled name can be obtained by right-clicking on an identifier and selecting Copy mangled name.

Signatures are ignored when demangling function names. As such, the assignment in:

void func(int) {}

void
test()
{
    void (*ptr)(int) = &func;
}

is displayed as:

void (*ptr)(int);
ptr = & func;

even if the mangled name of func is _Z4funci. This can lead to ambiguity when there are multiple overloads of the named function. One way to resolve the ambiguity is to look at the mangled name.

Constructors and destructors are demangled as Ctor and Dtor. If the constructor or destructor is a constructor for a base class and is different from the constructor for the most derived object, the suffix Base is added. If the constructor is a copy constructor, the suffix C is added. If the constructor is a move constructor, the suffix M is added. Therefore, the demangled name Foo::CtorC stands for the copy constructor of the class Foo. If the destructor is virtual, it will be demangled as DeletingDtor.

The option -cxx-filt can be used to print the demangled version of an identifier, as demangled by the analyzer. If the identifier is a function name its signature will also be printed. For example, the command tis-analyzer++ -cxx-filt _Z3fooii displays {foo(int, int)}.

When displayed, function return types are preceded by a -> symbol and are displayed after the formal parameter types. For example, the instance of the function show in the following code:

struct Foo {
    void f(int) {}
};

template <typename T>
void show(const T&) {}

int
main()
{
    show(&Foo::f);
}

is printed as show<{(int) -> void} Foo::*>.

Template parameter packs are printed enclosed by [ and ]. As such, the command tis-analyzer++ -cxx-filt _Z1fIJ3Foo3FooEEvDpRKT_ displays {f<[Foo, Foo]>(const [Foo, Foo]&) -> void}: f is a function templated by a parameter pack, which is instantiated with Foo, Foo. Note also that in this case the const and & are applied to the whole pack.

Names displayed in the GUI can be prefixed by .... These names are shortened versions of qualified names. Clicking on this prefix will display the full mangled or demangled name, depending on the command line options.

Functions and methods
Argument passing

When calling a function, TrustInSoft Analyzer uses different transformations to initialize the function’s arguments depending on the type of the argument. These transformations match the Itanium calling convention.

Scalar types

Scalar types are kept as is.

Reference types

Reference types are translated as pointers to the referenced types. The initialization of an argument of reference type is translated as taking the address of the initializer. If this initialization requires the materialization of a temporary object, this step is done by the caller. For example, with the following original source code:

void f(int &&, int &);

void g() {
    int x;
    f(2, x);
}

the translated declaration for the function f is void f(int *a, int *b) and the call to f is translated as:

int x;
int __tis_temporary_0;
__tis_temporary_0 = 2;
f(& __tis_temporary_0,& x);
Class types

The passing of a class type depends on whether the class is non-trivial for the purposes of calls. A class type is non-trivial for the purposes of calls if:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.

If the type is non-trivial for the purposes of calls, a variable of the class type is defined in the caller and the function receives a pointer to this variable. Such variables are named __tis_arg_##. For example, in the following code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

void f(Obj x, Obj y);

void g() {
    f( {}, {} );
}

the translated function f has the signature:

void f(struct Obj *x, struct Obj *y);

and its call is translated as:

struct Obj __tis_arg;
struct Obj __tis_arg_0;
{
  Obj::Ctor(& __tis_arg_0);
  Obj::Ctor(& __tis_arg);
}
f(& __tis_arg,& __tis_arg_0);

If the function returns a class that is non-trivial for the purposes of calls, then it is translated as a function returning void but with an additional argument. This argument is a pointer to a variable in the caller that will receive the function return. If the caller does not use the function return to initialize a variable, a variable named __tis_cxx_returnarg_## is created for this purpose.

For example, with the following original source code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

Obj f();

void g() {
    Obj o = f();
    f();
}

the translated function f has the signature:

void f(struct Obj *__tis_cxx_return)

and the body of the function g is translated as:

struct Obj o;
f(& o);
{
  struct Obj __tis_cxx_returnarg;
  f(& __tis_cxx_returnarg);
}
return;

If the type is trivial for the purposes of calls, no transformation is applied and the object is passed by copying its value. For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f(Obj o);

void g() {
    f( {} );
}

the signature of the translated function f is

struct Obj f(struct Obj o)
Unknown passing style

Sometimes, TrustInSoft Analyzer cannot decide if a class is trivial for the purposes of calls in a translation unit. In such cases, it will assume that the type is non-trivial for the purposes of calls and emit a warning like:

[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.

If the user knows that the type is trivial for the purposes of calls, the option -cxx-pass-by-value can be used to force this behavior.

For example, with the following original source code:

struct Foo;

void f(Foo x);
  • with no particular option set, TrustInSoft Analyzer will produce the following warning and declaration for f:

    [cxx] warning: Unknown passing style for type 'Foo'; assuming
    non-trivial for the purpose of calls. Use the option
    '-cxx-pass-by-value _Z3Foo' to force the opposite.
    
    void f(struct Foo *x);
    
  • with the option -cxx-pass-by-value _Z3Foo, TrustInSoft Analyzer will produce the following declaration for f without warning:

    void f(struct Foo x);
    

Using an incorrect passing style can lead to errors like:

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct _Z3Foo * vs. struct Foo
                   First declaration was at file1.cpp:7
                   Current declaration is at file2.c:7

or

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct Foo vs. void
                   First declaration was at file.c:7
                   Current declaration is at file.cpp:7
Method transformations

Methods do not exist in C, so TrustInSoft Analyzer++ translates them as functions. The following additional transformations are applied to non-static methods:

  • the name of the function is the qualified name of the method.
  • the function gets an additional this argument. Its type is a pointer to the class enclosing the method, with an additional const qualifier if the method is const-qualified.
  • at each call site, the this argument is initialized with the address of the calling object.

For example, with the following original source code:

struct Obj {
    Obj();
    static void bar(int x);
    void foo(int x) const;
};

void
f(void)
{
    Obj o;
    o.foo(1);
    Obj::bar(0);
}

two function declarations are produced:

void Obj::bar(int x);
void Obj::foo(const struct Obj *this, int x);

and the calls to foo and bar are translated as:

Obj::foo(& o,1);
Obj::bar(0);
Constructor elision

By default, constructor elision is enabled and TrustInSoft Analyzer++ will omit some calls to copy or move constructors to temporary objects, as allowed by C++ standards from C++11 onwards.

Constructor elision can be disabled with the -no-cxx-elide-constructors option.

For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f();

void g() {
    Obj y = f();
}

when constructor elision is enabled, the call to f is translated as:

f(& y);

However, when constructor elision is disabled with the option -no-cxx-elide-constructors, it is translated as:

struct Obj __tis_temporary_0;
f(& __tis_temporary_0);
Obj::CtorM(& y,& __tis_temporary_0);

In this case, the result of the call to f is written to the temporary object __tis_temporary_0 and this temporary object is then moved to the initialized variable y.

Virtual method calls

The translation of a virtual method call is split into the following steps:

  • get the information required to call the method from the virtual method table of the object. The information is put in a variable named __virtual_tmp_XXX, where XXX is the unqualified name of the method.
  • adjust the value of the this pointer for the method being called, and call the resolved function pointer using the previous information.
  • adjust the value of this to fetch the virtual base, if any (see the paragraph at the end of this section).
  • adjust the value returned by the call if the function might be a covariant override and the returned value is not nullptr.

As such, the function call_get in the following code:

struct Foo {
    virtual Bar *get() { return nullptr; }
};

Bar *call_get(Foo *f) {
    return f->get();
}

is translated as:

struct Bar *call_get(struct Foo *f)
{
  struct Bar *__retres;
  char *__virtual_return_get;
  struct __tis_vmt_entry const *__virtual_tmp_get;
  char *tmp_0;
  __virtual_tmp_get = f->__tis_pvmt + 1U;
  __virtual_return_get = (char *)(*((struct Bar *(*)(struct Foo *))__virtual_tmp_get->method_ptr))
  ((struct Foo *)((char *)f + __virtual_tmp_get->shift_this));
  if (__virtual_return_get) tmp_0 = __virtual_return_get + __virtual_tmp_get->shift_return;
  else tmp_0 = __virtual_return_get;
  __retres = (struct Bar *)tmp_0;
  return __retres;
}

The special case of covariance on virtual bases: if the called virtual function is covariant and the return type of the overriding function has the return type of the overridden function as a virtual base, this virtual base must be fetched at the call site.

To do so, we need the offset to apply to the returned object pointer. This offset is stored in an array, and a pointer to this array is located at offset 0 of the returned object. So we cast the returned object as a pointer to an array of offsets, and access this array at vbase_index to get the offset.

In this case the code is translated as such:

if (__virtual_tmp_f->vbase_index != (long)(-1)) // do we have a virtual base?
  __virtual_return_f += *(
    *((long **)__virtual_return_f) // get the array of offsets
    + __virtual_tmp_f->vbase_index); // get the appropriate offset
Controlling the virtual method calls translation

The option -no-cxx-inline-virtual-calls can be used to replace this transformation by a call to a generated function named XXX::__tis_virtual_YYY, where:

  • XXX is the static type of the class containing the method that was called.
  • YYY is the unqualified name of the method.

With this option, the function call_get of the example above is translated as:

struct Bar *call_get(struct Foo *f)
{
  struct Bar *tmp;
  tmp = Foo::__tis_virtual_get(f);
  return tmp;
}

The generated __tis_virtual_ functions keep the analysis states produced by each virtual call separate.

Objects memory layout

TrustInSoft Analyzer uses its own memory layout to represent C++ objects. In order to preserve as much useful information as possible, the analyzer defines multiple well-typed data structures, and uses more than one extra pointer field in polymorphic classes. As a result of this choice, the numeric value of sizeof(Class) will differ between the compiled code and the analyzed code.

Objects, whether declared with class or struct, are translated as C structures; unions are translated as C unions.

The inline declaration of a static field is translated as a declaration of a global variable with the same qualified name. The out-of-line definition of a static field is translated as a definition of a global variable with the same qualified name.

Non-static fields are translated as fields in the translated structure. The fields are emitted in the source code order.

Empty classes

Empty classes are translated as a structure with one field char __tis_empty;. This enforces that the size of an empty class is not zero.

Non-virtual inheritance

Non-virtual non-empty base classes are translated as fields in the derived class. Such fields are named __parent__ followed by the name of the base class.

For example, with the following original source code:

class Foo {
    int x;
};

struct Bar: Foo {
    int y;
    int z;
};

the structures produced for the class Foo and Bar will be:

struct Foo {
   int x ;
};

struct Bar {
   struct Foo __parent__Foo ;
   int y ;
   int z ;
};

Non-virtual empty base classes do not appear in the translated C structure. For example, with the following original source code:

class Foo { };

struct Bar: Foo {
    int y;
    int z;
};

the structure produced for the class Bar is:

struct Bar {
   int y ;
   int z ;
};

In this case, a reference to the base Foo of an object of type Bar binds to the original object. In other words, the assertion in the following program is valid in the model used by TrustInSoft Analyzer:

class Foo {};

struct Bar: Foo {
    int y;
    int z;
};

int
main()
{
    Bar b;
    Foo &f = b;
    void *addr_b = static_cast<void *>(&b);
    void *addr_f = static_cast<void *>(&f);
    //@ assert addr_b == addr_f;
}
Polymorphic classes

If a C++ class is polymorphic, its corresponding C structure contains two additional fields:

  • struct __tis_typeinfo const *__tis_typeinfo; holding a pointer to the type_info of the most derived object of the current object.
  • struct __tis_vmt_entry const *__tis_pvmt; holding a pointer to the virtual method table of the current object.

As an example, the class:

struct Foo {
    int x;
    virtual void f() {}
};

is translated as:

struct Foo {
   struct __tis_typeinfo const *__tis_typeinfo ;
   struct __tis_vmt_entry const *__tis_pvmt ;
   int x ;
};

These additional fields are set by the constructors of the polymorphic class.

Virtual inheritance

If a class has a virtual base, its translation produces two different C structures: the regular C structure as well as a base version of the class.

The regular structure is used when the object is the most derived object. In this case:

  • the structure gets an additional field long const * __tis_vbases_ptr;, which points to an array holding the offset of each virtual base of the object.
  • all virtual bases of the class are translated as fields in the C structures but their name is prefixed by __tis_vbases_ to distinguish them from non-virtual bases.

The base version of the object has its name prefixed by __vparent__ and is used when the object is used as a base for another object. In this case:

  • the structure gets an additional pointer __tis_vbases_ptr of type long const *, which points to an array holding the offset of each virtual base of the object.
  • the structure does not contain fields related to the virtual base classes.

As an example the following class:

struct Baz: Bar, virtual Foo {
    int z;
};

produces the two classes:

struct Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
   struct Foo __vparent__Foo ;
};

struct __tis_base_Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
};

Accessing a virtual base is always done by shifting the address of the current object with the offset of the virtual base in the __tis_vbases_ptr array.

As an example, with the following code:

struct Foo {
    int x;
};

struct Bar: virtual Foo {
    int y;
};

int
main()
{
    Bar bar;
    Foo &foo = bar;
}

the body of the main function is translated as:

int __retres;
struct Bar bar;
struct Foo *foo;
Bar::Ctor(& bar);
foo = (struct Foo *)((char *)(& bar) + *(bar.__tis_vbases_ptr + 0));
__retres = 0;
return __retres;

The virtual base Foo of the class Bar has index 0, so the offset used to go from Bar to Foo is *(bar.__tis_vbases_ptr + 0).

Layout summary

The full layout for objects is the following, in increasing address order:

  • For most-derived classes and classes with no virtual base:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields
    • Virtual bases
  • For classes with virtual bases used as base classes:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields
Member pointers
Pointers to member functions declaration

Pointers to a method X Foo::f(A1, A2, ..., An) are translated as a C structure with the following fields:

unsigned long vmt_index ;
X (* __attribute__((__tis_sound_cast__)) ptr)(struct Foo *, A1, A2, ..., An) ;
long shift ;
size_t vmt_shift ;

If Foo::f is a non-virtual method, then:

  • the field ptr is a pointer to the method called when resolving the symbol f in the scope of Foo. This can be the method Foo::f if f is declared in Foo or a method of one of the parent classes of Foo.
  • the field shift is the offset of the base containing the method f. If f is in Foo, then this is 0, otherwise it is the offset of the parent class declaring f.
  • the field vmt_index is 0.

If Foo::f is a virtual method, then:

  • the field ptr is the same as if Foo::f was a non-virtual method.
  • the field shift is the same as if Foo::f was a non-virtual method.
  • the field vmt_index is 1 + the index of the method in the virtual method table of the class containing the final override of f in Foo. This can be different from the index of f in the virtual method table of Foo if the final override of f is declared in a parent class of Foo.

Each pointer to member function type produces a different structure type. The structure type is named __tis_XXXX, where XXXX is the mangled name of the method pointer type.

For example, with the classes:

struct Pack {
    char c[1000];
};

struct Bar {
    int y;
    int f() { return 2; }
};

struct Foo: Pack, Bar {
    virtual void g() {}
};

the following statements:

int (Foo::*x)(void) = &Foo::f;
void (Foo::*y)(void) = &Foo::g;

are translated as:

struct __tis_M3FooFivE x;
struct __tis_M3FooFvvE y;
x.vmt_index = 0UL;
x.ptr = (int (*)(struct Foo *))(& Bar::f);
x.shift = 0L - (long)((struct Foo *)((unsigned long)0 - (unsigned long)(& ((struct Foo *)0)->__parent__Bar)));
x.vmt_shift = 0UL;
y.vmt_index = 2UL;
y.ptr = (void (*)(struct Foo *))(& Foo::g);
y.shift = 0L;
y.vmt_shift = (unsigned long)(& ((struct Foo *)0)->__tis_pvmt);
Program initialization
Dynamic initialization

If a variable v is initialized at dynamic initialization time, it is translated as:

  • a declaration for the variable v.
  • a function void __tis_init_v(). The content of this function is the translation of the initializer of v.

All __tis_init_XXX functions are called by a special function __tis_globinit. The __tis_globinit function is in turn called at the beginning of the main function.

As an example, the program:

int id(int x) { return x; }

int x = id(12);

int
main()
{
    return x;
}

is translated as:

int x;
void __tis_init_x(void)
{
  x = id(12);
  return;
}

 __attribute__((__tis_throw__)) int id(int x);
int id(int x)
{
  return x;
}

 __attribute__((__tis_throw__)) int main(void);
int main(void)
{
  __tis_globinit();
  return x;
}

void __tis_globinit(void)
{
  __tis_init_x();
  return;
}
Static initialization

A variable with a constant initializer is translated as a C variable with an initializer. The initializer is the value of the C++ constant initializer. As an example, with the following code:

constexpr
int
add_one(int x)
{
    return x + 1;
}

const int x = add_one(2);

the definition of the variable x is translated as:

static int const x = 3;

In some special circumstances, one may need to disable static initialization semantics described by the C++ standard. It can be done using the option -no-cxx-evaluate-constexpr. In this case, whenever a variable is initialized with a constant initializer that is not a constant initializer according to the C rules, the initialization of this variable is done at dynamic initialization time and uses the initializer as it was written by the user.

Using this option can lead to unsound results. As an example, with the following program:

constexpr int id(int x) { return x; }

extern const int x;

const int y = x;

const int x = id(1);

int
main()
{
    int a = y;
    int b = x;
    //@ assert a == b;
    return a == b;
}
  • the assertion is valid from the C++ standard’s point of view.
  • the assertion is valid using the command-line tis-analyzer++ --interpreter test.cpp.
  • the assertion is invalid using the command-line tis-analyzer++ --interpreter test.cpp -no-cxx-evaluate-constexpr.
Static local variables

A static local variable x is translated as a triple of:

  • a global variable, corresponding to the translated static local variable. The type of the variable is the translated type of the static variable, and its name is prefixed by the name of its enclosing function. This variable is 0-initialized.
  • a global variable __tis_guard_x.
  • a check on __tis_guard_x ensuring the initialization of the static variable is not recursive, followed by a conditional block doing the initialization of the variable once.

The variable __tis_guard_x can have the following values:

  • 0: the variable x has not been initialized yet.
  • 1: the variable x is being initialized.
  • 2: the variable x has been initialized.

As an example, the following function:

int
main()
{
    static Foo f;
    return 0;
}

is translated as:

int main::__tis_guard_f;
struct Foo main::f = {.x = 0};

int main(void)
{
  int __retres;
  tis_ub("Recursive initialization of the static local variable f.",
       main::__tis_guard_f != 1);
  if (! main::__tis_guard_f) {
    main::__tis_guard_f ++;
    Foo::Ctor(& main::f);
    main::__tis_guard_f ++;
  }
  __retres = 0;
  return __retres;
}
Special variable names

TrustInSoft Analyzer++ introduces several special variables while translating code. This section summarizes the different name families used for these variables.

Global variables
  • __tis_ABI::exc_stack_depth: how many exceptions are currently raised.
  • __tis_ABI::exc_stack: all exceptions currently raised.
  • __tis_ABI::caught_stack_depth: how many exceptions are currently caught.
  • __tis_ABI::caught_stack: all exceptions currently caught.
  • __tis_unwinding: whether the program is currently unwinding its stack.
  • XXX::__tis_class_vmt: virtual method table for an object of type XXX used as most derived object.
  • XXX::__tis_class_typeinfo: typeinfo for an object whose most derived object is of type XXX.
  • XXX::__tis_class_inheritance: inheritance information for an object whose most derived object is of type XXX.
Intermediate variables
  • __Ctor_guard: guard used to check if the lifetime of an object has started.
  • __tis_alloc: materialization of a space reserved by an allocation function.
  • __tis_arg: materialization of a function argument.
  • __tis_assign: temporary variable used to hold the right hand side of an assignment if it has potential side effects.
  • __tis_bind: temporary variable used to initialize non-reference structured bindings of arrays.
  • __tis_cast: result of a dynamic_cast.
  • __tis_const: materialization of a function argument that is non-trivial for the purposes of calls.
  • __tis_compound_literal: materialization of a C++ temporary used to initialize a compound literal.
  • __tis_constant_expression: materialization of a C++ constant expression.
  • __tis_cxx_return_arg: materialization of the discarded result of a call returning a type that is non-trivial for the purposes of calls.
  • virtual_dtor_tmp: virtual method table cell used when calling a virtual destructor.
  • __tis_deleted_value: address of a deleted value.
  • __tis_dereference: temporary variable used to dereference a member pointer.
  • __tis_dyncast: operand of a dynamic_cast.
  • __tis_exn: materialization of an object of type std::bad_XXX being thrown.
  • __tis_gnu_ternary: shared computation in a GNU ternary expression.
  • __tis_guard: guard controlling the initialization of a static local variable.
  • __tis_implicit_value: materialization of a C++ temporary used to perform an implicit value initialization.
  • __tis_index: index used when destroying an array when its lifetime finishes.
  • __tis_initializer_list: materialization of a C++ temporary used to build an initializer list.
  • __tis_init_size: loop variable used to initialize arrays.
  • __tis_lambda_temp: materialization of a C++ temporary variable in a lambda.
  • __tis_lvalue: materialization of a C++ lvalue that is not an lvalue in C.
  • __tis_mmp: method pointer being called.
  • __tis_mmp_init: temporary variable used to initialize a method pointer.
  • __tis_object_cast: temporary variable used to compute object inheritance casts.
  • __tis_offset: index used to destroy array elements as a consequence of calling delete[].
  • __tis_placement: address of an object initialized by a placement new.
  • __tis_relop: intermediate result when translating relational operators.
  • __tis_temp: materialization of a C++ temporary variable.
  • __tis_thrown_tmp: materialization of a C++ temporary variable in a throw statement.
  • __tis_typeid: address of an object with a polymorphic typeid being computed.
  • __virtual_return: the result of the call to a virtual method returning a pointer.
  • __virtual_this: this pointer computed when calling a virtual method.
  • __virtual_tmp: virtual method table cell used when calling a virtual method. The name of the called function is appended to the name of the temporary.
Generated contracts

Function contracts are automatically generated for non-static class methods, in order to require the validity of the this pointer as a precondition. Copy and move constructors also receive separate annotations, since they are expected to operate on distinct objects.

These contracts will be added to user-provided contracts, if any.

Computation of these annotations can be disabled with the -no-cxx-generate-contracts option.

Builtins for C++ analysis

TrustInSoft Analyzer C++ introduces additional builtins to support the analysis of C++ code bases. These are listed in the relevant section of the builtin reference and explained below.

C++-friendly tis_make_unknown builtin

The analyzer provides the function tis_make_unknown for use in C code. The function takes a pointer and size, and sets the contents of the so-described area of memory to be unknown. This can be used to abstract over the contents of variables and objects (see e.g., Prepare the Analysis).

TrustInSoft Analyzer++ overloads the function with its C++-friendly variant. This variant has the same semantics, but a different signature:

void tis_make_unknown(void *, unsigned long);

Here, the overloaded tis_make_unknown function takes void * as its first argument, in contrast to the C-variant which takes char *. Passing in void pointers is more convenient in C++, because C++ can implicitly cast any other pointer type to void * (whereas char * would require the cast to be explicit).

Metadata-preserving tis_make_unknown builtin

When using tis_make_unknown on an object, the analyzer treats the entire indicated memory area as having unknown contents. This includes both the user-defined data and metadata, such as virtual method table pointers, which should typically be preserved. If this information is not preserved, calls to the object’s virtual methods and accesses to its base classes become imprecise.

For this reason, TrustInSoft Analyzer++ provides another C++-specific variant of the tis_make_unknown builtin:

template <typename T> void tis_make_unknown(T *);

The builtin is defined as a function template that takes a pointer to an object of any type T. The builtin abstracts all the user-defined contents of the provided object, but does not interfere with the metadata added by the analyzer to model polymorphism and inheritance.

Example. Consider the following program. Here, we define a class named Obj whose two members are a field x and a virtual method f that returns the value of x. Then, in main, we instantiate an object of this class and use tis_make_unknown to set the value of the object’s field to be unknown. We then use tis_show_each to print out the value of x and the result of the call to the method f:

#include <tis_builtin.h>

struct Obj {
  int x;
  virtual int f() { 
    return x; 
  }
};

int main() {
  Obj obj = {};
  tis_make_unknown(&obj, sizeof obj);
  
  tis_show_each("obj.x", obj.x);
  tis_show_each("obj.f()", obj.f());
  
  return 0;
}

When we analyze this program, the analyzer shows the value of x, as expected, but it also raises an alarm indicating that the program tried to dereference an invalid pointer. The analyzer emits this alarm because we used the variant of tis_make_unknown that also sets the virtual method table pointer to unknown.

$ tis-analyzer++ -val poly.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
tests/tis-user-guide/man/tis-analyzer-plusplus/poly.cpp:15:[kernel] warning: pointer arithmetic: assert \inside_object_or_null((void *)obj.__tis_pvmt);
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])

Instead of using the variant of tis_make_unknown (with two arguments) that overwrites the object’s metadata, we should modify the program to use the template function variant of tis_make_unknown (with just one argument) to preserve object metadata while setting the object’s field to unknown:

  tis_make_unknown(&obj);

When we analyze this program now, it shows the expected (unknown) values of x and of the result of calling f, but does not emit an alarm, meaning tis_make_unknown did not clear the object’s virtual method table pointer.

$ tis-analyzer++ -val poly2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])

Example. The following program presents a situation where the class Obj does not have members of its own, but inherits the field x from the class Base via virtual inheritance. The object obj of class Obj is instantiated in main and its members’ values are set to be unknown via tis_make_unknown. We then show the value assigned to the field x inherited by obj from Base:

#include <tis_builtin.h>

struct Base {
  int x;
};

struct Obj: virtual Base {};

int main() {
  Obj obj = {};
  tis_make_unknown(&obj, sizeof obj);
  
  tis_show_each("obj.x", obj.x);
  
  return 0;
}

As in the example above, running the analyzer on this program raises an alarm, again because this variant of tis_make_unknown sets the virtual method table pointers to unknown, and the virtual method table is also used for virtual inheritance.

$ tis-analyzer++ -val virt.cpp
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: pointer arithmetic:
                  assert \inside_object_or_null((void *)obj.__tis_vbases_ptr);
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: out of bounds read. assert \valid_read(obj.__tis_vbases_ptr+0);
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])

Hence, we should modify this program to use the variant of tis_make_unknown that preserves the pointer within obj to the virtual method table with which it is associated:

  tis_make_unknown(&obj);

This removes the alarm and produces the expected (unknown) value of obj.x:

$ tis-analyzer++ -val virt2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])

Tip

In cases where a class does not depend on a virtual method table (i.e., it has no virtual inheritance and no virtual methods), both variants of tis_make_unknown are equivalent and can be used interchangeably.

GoogleTest support

You can run your tests that use the GoogleTest framework with TrustInSoft Analyzer++ in a few steps.

GoogleTest User’s Guide : https://google.github.io/googletest/

Options

TrustInSoft Analyzer++ provides two options to activate GoogleTest support:

  • Choose the -gtest option if you wrote the entry point of your tests yourself.
  • Choose the -gtest-main option (as opposed to the -gtest option) if your tests use the default GoogleTest entry point (from gtest_main library).

More details about gtest_main can be found at https://google.github.io/googletest/primer.html#writing-the-main-function

Configuration

Specify your own sources, headers and preprocessing options as you would do for any other analysis (see Prepare the Sources).

By providing the -gtest option (or the -gtest-main option) to TrustInSoft Analyzer++, the analyzer will pull in all GoogleTest source files and headers for you. Thus you do not have to list them in your analysis configuration files.

As an example, let us assume that you are testing a software module called module1, and that you have gathered your tests that use the GoogleTest framework in a tests subdirectory.

|-- module1
|   |-- include
|   |   ...
|   |-- src
|   |   |-- component1.cc
|   |   ...
|   |-- tests
|   |   |-- component1_unittest.cc
|   |   ...
|-- my_analysis
|   |-- mod1_comp1_unittest.json
|   ...

For instance, mod1_comp1_unittest.json would look like:

{
  "name": "mod1_comp1_unittest",
  "prefix_path":"../module1/",
  "files": [
    "tests/component1_unittest.cc",
    "src/component1.cc"
  ]
}

Note that you do not need to add the gtest *.cc files to the "files" list.

Running the analysis

Next run the analysis with both the --interpreter option (see Getting Started) and the -gtest option (or the -gtest-main option).

tis-analyzer++ --interpreter -gtest -tis-config-load path/to/<my_unit_test>.json

Note: We provided the option -gtest directly on the command line in order to highlight it, but you can also move it to the configuration file:

"gtest": true
Limitations
  • Catching GoogleTest assertions is not supported (macros EXPECT_FATAL_FAILURE and EXPECT_NONFATAL_FAILURE)
  • Death tests are not supported (e.g. macro EXPECT_EXIT)

Make sure that your tests do not use these features, as they are untested at the moment: the analyzer will probably not do what you expect, and it will not specifically warn you that these features are unsupported.

For more details about the assertion macros provided by GoogleTest visit https://google.github.io/googletest/reference/assertions.html

Dealing with Special Features

This section gives some details about how to deal with special features needed to analyze some applications:

Caution

The tis-mkfs tool is only available in the commercial version of TrustInSoft Analyzer.

File System

The tis-mkfs utility helps to build C files that give information about the file system in which the application is supposed to run. For more information, please refer to the tis-mkfs Manual.

Environment variables

The default initial environment for the analysis of a program is empty. In order to perform an analysis in a specific environment, it has to be populated from the user code using one of the two methods below.

Setting environment variables

The user may set some variables by calling setenv (or putenv) in the analysis entry point.

Example:

#include <stdlib.h>
extern int main (int argc, char * argv[]);

int tis_main (int argc, char * argv[]) {
  int r;
  r = setenv ("USER", "me", 1);
  if (r != 0) return 1;
  r = setenv ("HOME", "/home/me", 1);
  if (r != 0) return 1;
  r = setenv ("SHELL", "/bin/sh", 1);
  if (r != 0) return 1;
  return main (argc, argv);
}
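The same initialization can be sketched with putenv; the helper below is illustrative (init_env is our name, not an analyzer entry point). Note that putenv, unlike setenv, keeps the caller's string itself in the environment, so the string must remain valid for the whole execution:

```c
#include <stdlib.h>

/* Illustrative putenv variant of the setenv example above.
   putenv stores the given string itself in the environment, so it
   must stay valid for the whole execution: use static storage,
   never a local (automatic) array. */
static char user_var[] = "USER=me";
static char home_var[] = "HOME=/home/me";

int init_env(void) {
    if (putenv(user_var) != 0) return 1;
    if (putenv(home_var) != 0) return 1;
    return 0;
}
```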
Initializing the whole environment

Alternatively, the user may initialize the standard environ variable, which must satisfy the following constraints:

  • environ must be a pointer to an array of pointers to C strings,
  • each of these strings must be a valid C string of the form "variable=value",
  • the element following the last environment variable must be NULL (elements past this NULL value, if any, will never be accessed).

Example:

#include <stdlib.h>
extern int main (int argc, char * argv[]);
extern char **environ;

int tis_main (int argc, char * argv[]) {
  char *custom_environ[] = {
    "USER=me",
    "HOME=/home/me",
    "SHELL=/bin/sh",
    NULL
  };
  environ = custom_environ;
  return main (argc, argv);
}

Using the environment

Once the environment has been initialized with one of the two methods above, the following program can be analyzed:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char * argv[]) {
  char * user = getenv ("USER");
  if (user)
    printf ("USER value is '%s'\n", user);
  return 0;
}

The command line would be:

$ tis-analyzer tis_main.c main.c -val -main tis_main -slevel 10

And the following output can be observed:

...
USER value is 'me'
...

Advanced usage

The initial size of the array pointed to by the environ variable is controlled by the TIS_NB_INIT_ENV_ELEMENTS macro. Its default value is 100, but it may be changed by the user (with the -D option, as usual) to avoid reallocations if the application needs more than 100 variables.

Moreover, to avoid losing precision, a local slevel is used inside the implementation of the environment-related functions. It is controlled by the TIS_ENV_INTERNAL_SLEVEL macro. Its default value is already very large, but the user can increase it if it is still too low for a specific use case.

Recursive calls

Without any option, the analysis stops on recursive function calls and lets the user decide how to handle them by choosing either -val-clone-on-recursive-calls or -val-ignore-recursive-calls.

The -val-clone-on-recursive-calls option tells the analyzer to process calls to recursive functions exactly as if they were calls to normal functions. The function body is copied, and the copy is renamed with the prefix __tis_rec_<n>, where <n> is the depth of the recursion. This means that the recursive call is analyzed precisely. This works up to the limit defined with the option -val-clone-on-recursive-calls-max-depth. When the limit is reached (or when -val-clone-on-recursive-calls is not set), the contract of the recursive function is used. Usually, an assigns clause is enough to obtain the expected semantics. If no contract is provided, the analyzer generates a simple one, but it may be incorrect: it is therefore strongly recommended to provide a contract when analyzing such cases.

The -val-ignore-recursive-calls option tells the analyzer to use the contract of the function to handle recursive calls. The contract may be generated as if the max-depth limit had been reached, as explained above.

Note that when the --interpreter option is used, -val-clone-on-recursive-calls is automatically set.
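As an illustration of the advice above (the function, its bounds, and its contract are ours, not taken from the analyzer's documentation), a recursive function with a simple ACSL contract might look like:

```c
/* Illustrative recursive function with a user-provided ACSL contract.
   With -val-clone-on-recursive-calls, each recursion level is analyzed
   on a renamed copy of the body (__tis_rec_1_fact, __tis_rec_2_fact,
   ...); past -val-clone-on-recursive-calls-max-depth (or with
   -val-ignore-recursive-calls), the contract below is used instead. */
/*@ requires 0 <= n <= 12;
    assigns \nothing;
  @*/
int fact(int n) {
    return n <= 1 ? 1 : n * fact(n - 1);
}
```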

Memory leaks

TrustInSoft Analyzer provides two ways to detect memory leaks.

  • The first way is to use the built-in function tis_check_leak, which prints the list of memory blocks that are allocated but no longer referenced by any other memory block at the program point where the built-in is called.

    Example (leak.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  char * f(int v) {
     5    char *p = malloc(v);
     6    return p;
     7  }
     8
     9  int main() {
    10    f(42);
    11    tis_check_leak();
    12    return 0;
    13  }
    

    When the program above is analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations leak.c
    

    we get the following result:

    tests/val_examples/leak.c:5:[value] allocating variable __malloc_f_l5 of type char [42]
            stack: malloc :: tests/val_examples/leak.c:5 (included from tests/val_examples/leak_test.c) <-
                   f :: tests/val_examples/leak.c:10 (included from tests/val_examples/leak_test.c) <-
                   main
    tests/val_examples/leak.c:11:[value] warning: memory leak detected for {__malloc_f_l5}
    

    Indeed, 42 bytes are allocated at line 5 in f, but since the pointer returned by f is lost in main, a memory leak is detected at line 11.

    In addition, the analyzer also prints the list of possibly leaked memory blocks (memory blocks that might not be referenced by any other memory block anymore), as shown in the following example.

    Example (leak_weak.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  char * f() {
     5      char *p = malloc(1);
     6      char *q = malloc(1);
     7      return tis_interval(0, 1) ? p : q;
     8  }
     9
    10  int main() {
    11      char *r = f();
    12      tis_check_leak();
    13      return 0;
    14  }
    

    When the program is analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations leak_weak.c
    

    we get the following result:

    $ tis-analyzer -val -val-show-allocations leak_weak.c
     [...]
     leak_weak.c:5:[value] allocating variable __malloc_f_l5 of type char
              stack: malloc :: leak_weak.c:5 <- f :: leak_weak.c:11 <- main
     leak_weak.c:6:[value] allocating variable __malloc_f_l6 of type char
              stack: malloc :: leak_weak.c:6 <- f :: leak_weak.c:11 <- main
     [value] using specification for function tis_interval
     leak_weak.c:12:[value] warning: possible memory leak detected for {__malloc_f_l5, __malloc_f_l6}
    

    Indeed, when tis_check_leak is called at line 12, the value of variable r is { NULL ; &__malloc_f_l5 ; &__malloc_f_l6 }, which means the value of r can either be NULL, or the address of the memory block allocated at line 5, or the address of the memory block allocated at line 6. Thus, the memory blocks allocated at line 5 and line 6 might be leaked.

    In order to improve the precision, the program can be analyzed with the following command line:

    $ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
    

    in which case the analyzer propagates separately the states where r points to NULL, __malloc_f_l5, and __malloc_f_l6, and we get the following analysis result:

    $ tis-analyzer -val -val-show-allocations -val-split-return-function f:full -slevel 10 leak_weak.c
      [...]
      leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l5}
      leak_weak.c:13:[value] warning: memory leak detected for {__malloc_f_l6}
    

    It shows that in one path, __malloc_f_l5 is leaked, and in another path __malloc_f_l6 is leaked.

  • The second way of detecting memory leaks requires the user to identify two points in the target program such that, when the execution reaches the second point, all the memory blocks that have been allocated since the execution was at the first point have been freed. The procedure is then to insert, at each of the two points, a call to a builtin that lists all the dynamically allocated blocks. If the lists printed by the two calls match, then every block allocated after the first point was freed before the second point was reached.

    In “interpreter mode”, in which the analyzer follows a single execution path, the tis_show_allocated builtin can be used to print the list of allocated blocks.

    Example (leak_interpreter.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  void f (void) {
     5      char * p1 = malloc (10);
     6      char * p2 = malloc (10);
     7      char * p3 = malloc (10);
     8      char * p4 = malloc (10);
     9      p1 = p4;
    10      p4 = p2;
    11      free (p2);
    12      free (p1);
    13  }
    14  int main (void) {
    15      char * p = malloc (10);
    16      tis_show_allocated ();
    17      /* all the memory blocks allocated in function f should be freed */
    18      f ();
    19      tis_show_allocated ();
    20      free(p);
    21  }
    

    When the program above is analyzed with the following command line:

    $ tis-analyzer --interpreter -val leak_interpreter.c
    

    we get the following result:

    $ tis-analyzer --interpreter -val leak_interpreter.c
      [...]
      leak_interpreter.c:16:[value] remaining allocated variables:
         __malloc_main_l15
      leak_interpreter.c:19:[value] remaining allocated variables:
         __malloc_main_l15, __malloc_f_l5, __malloc_f_l7
    

    The second call to tis_show_allocated at line 19 shows that, after the call f () at line 18, two more memory blocks, allocated at line 5 and line 7 respectively, exist in the memory state compared to the first call to tis_show_allocated. Thus, we know that function f causes a memory leak.

    In “analyzer mode”, the first point may be visited by the analyzer several times, for different memory states corresponding to different execution paths in the program. Each memory state that reaches the second point should be matched with the memory state at the first point that it corresponds to. For this purpose, the tis_show_allocated_and_id and tis_id builtins can be used. The tis_id builtin gives a unique “id” to each memory state, and the tis_show_allocated_and_id builtin prints, in addition to the list of allocated blocks, the value of this “id”, so that states can be matched.

    Example (tis_show_allocated_and_id.c):

     1  #include <stdlib.h>
     2  #include <tis_builtin.h>
     3
     4  int main(){
     5
     6      char *t[7];
     7      int n = tis_interval_split(0, 1);
     8
     9      // "before" point
    10      unsigned long long my_id = tis_id();
    11      tis_show_allocated_and_id("before", my_id);
    12
    13      t[n] = malloc(1);
    14      if (!t[n]) goto leave1;
    15      t[n+1] = malloc(1);
    16      if (!t[n+1]) goto leave2;
    17      t[n][0] = 'a';
    18      t[n+1][0] = 'b';
    19  leave2:
    20      free(t[n]);
    21  leave1:
    22
    23      // "after" point
    24      tis_show_allocated_and_id("after", my_id);
    25  }
    

    When the program above is analyzed with the following command line:

    $ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
    

    we get the following result:

    $ tis-analyzer -val -slevel 10 -val-show-allocations tis_show_allocated_and_id.c
      [...]
      tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:11:[value] Called tis_show_id({{ "before" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:13:[value] allocating variable __malloc_main_l13 of type char
       stack: malloc :: tis_show_allocated_and_id.c:13 <- main
      tis_show_allocated_and_id.c:15:[value] allocating variable __malloc_main_l15 of type char
       stack: malloc :: tis_show_allocated_and_id.c:15 <- main
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {0}):
        remaining allocated variables:__malloc_main_l15
      tis_show_allocated_and_id.c:24:[value] Called tis_show_id({{ "after" }}, {1}):
        remaining allocated variables:__malloc_main_l15
    

    The statement at line 7 creates two separate states, in which variable n has value 0 and 1 respectively. The statement at line 10 then assigns 0 to variable my_id in one state and 1 in the other. The call to tis_show_allocated_and_id("before", my_id) at line 11 shows that in both “before” states (my_id equal to 0 and 1), no memory block is allocated. The memory allocation statements at line 13 and line 15 give rise to three different paths:

    • In one path, the allocation at line 13 fails and goto leave1 is taken at line 14; when tis_show_allocated_and_id("after", my_id) at line 24 is reached, there is no allocated memory block in either “after” state.
    • In another path, the allocation at line 13 succeeds but the one at line 15 fails, so goto leave2 is taken at line 16; the memory allocated at line 13 is freed at line 20, so again no allocated memory block remains in either “after” state.
    • In the last path, both allocations at line 13 and line 15 succeed, but only the block allocated at line 13 is freed at line 20; the block allocated at line 15 remains in both “after” states, which is a memory leak.

    This example also shows that there may be several “after” states for each “before” state: in that case, each “after” state must be checked against its corresponding “before” state for memory leaks.

Target Architecture

TrustInSoft Analyzer faithfully emulates the hardware features of the targeted platform: endianness, size of integer types, and alignment constraints.

The -machdep help option lists the supported architectures, and the -machdep verbose option shows a brief summary of the main characteristics of each supported architecture.

$ tis-analyzer -machdep help
[kernel] supported machines are: aarch64 aarch64eb apple_ppc_32 arm_eabi armeb_eabi
         gcc_aarch64 gcc_aarch64eb gcc_arm_eabi gcc_armeb_eabi gcc_mips_64
         gcc_mips_n32 gcc_mips_o32 gcc_mipsel_64 gcc_mipsel_n32 gcc_mipsel_o32
         gcc_ppc_32 gcc_ppc_64 gcc_rv32ifdq gcc_rv64ifdq gcc_sparc_32 gcc_sparc_64
         gcc_x86_16 gcc_x86_16_huge gcc_x86_32 gcc_x86_64 mips_64 mips_n32 mips_o32
         mipsel_64 mipsel_n32 mipsel_o32 ppc_32 ppc_64 rv32ifdq rv64ifdq sparc_32
         sparc_64 x86_16 x86_16_huge x86_32 x86_64 x86_win32 x86_win64
         (default is x86_32).

The supported platforms are:

  • x86_16
    An x86 processor in 16-bit mode, in “tiny” or “small” memory model (using 16-bit “near” pointers). The long double type is the 80387 80-bit native format.
  • x86_16_huge
    An x86 processor in 16-bit mode, in “large” or “huge” memory model (using 32-bit “far” pointers). The long double type is the 80387 80-bit native format.
  • x86_32
    An x86 processor in 32-bit mode. The long double type is the 80387 80-bit native format.
  • x86_win32
    An x86 processor in 32-bit mode, using the Win32 ABI.
  • x86_64
    An x86-64 processor in 64-bit mode. The long double type is the IEEE754 quad-precision type.
  • x86_win64
    An x86 processor in 64-bit mode, using the Win64 ABI. The long type is 32 bits.
  • ppc_32
    A 32-bit PowerPC processor. The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • ppc_64
    A 64-bit POWER processor. The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • apple_ppc_32
    Same as ppc_32, but forcing the “char” type to be signed, as done by the Apple toolchain (as opposed to the default PowerPC use of unsigned char type), and allowing gcc language extensions.
  • arm_eabi, armeb_eabi
    A 32-bit ARM processor, using the extended ABI (EABI). The char type is unsigned.
  • aarch64, aarch64eb
    A 64-bit ARMv8 processor (AArch64). The char type is unsigned; the long double type is the IEEE754 quad-precision type.
  • sparc_32
    A 32-bit SparcV8 processor.
  • sparc_64
    A 64-bit SparcV9 processor. The long double type is the IEEE754 quad-precision type.
  • mips_o32, mipsel_o32
    A 32-bit MIPS processor, using the old ABI (o32).
  • mips_n32, mipsel_n32
    A 64-bit MIPS processor, using the new 32-bit ABI (n32). The long double type is the IEEE754 quad-precision type.
  • mips_64, mipsel_64
    A 64-bit MIPS processor, using the 64-bit ABI (64 or n64). The long double type is the IEEE754 quad-precision type.
  • rv32ifdq
    A RISC-V processor using the RV32I ISA, with D, F and Q floating-point extensions. The long double type is the IEEE754 quad-precision type.
  • rv64ifdq
    A RISC-V processor using the RV64I ISA, with D, F and Q floating-point extensions. The long double type is the IEEE754 quad-precision type.

Unless otherwise specified in the list above, the characteristics of the fundamental types are:

  • char
    is 8 bits, and defaults to signed.
  • short
    is 16 bits.
  • int
    is 16 bits on 16-bit machdeps (x86_16, x86_16_huge), 32 bits otherwise.
  • long
    is 64 bits on 64-bit machdeps, 32 bits otherwise.
  • long long
    is 64 bits.
  • float
    is 32 bits (IEEE754 single-precision type).
  • double
    is 64 bits (IEEE754 double-precision type).
  • long double
    is identical to double.
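These characteristics can be checked at compile time; the sketch below encodes the defaults listed above for a matching 64-bit LP64 target such as gcc_x86_64 (sizes are in bytes here, while the list above is in bits):

```c
#include <limits.h>

/* Compile-time checks of the fundamental type characteristics listed
   above, assuming a 64-bit LP64 machdep such as gcc_x86_64 compiled
   natively on a matching host. On 16-bit machdeps, int would be 2
   bytes; on Win64 (x86_win64), long would be 4 bytes instead of 8. */
_Static_assert(CHAR_BIT == 8, "char is 8 bits");
_Static_assert(sizeof(short) == 2, "short is 16 bits");
_Static_assert(sizeof(int) == 4, "int is 32 bits");
_Static_assert(sizeof(long long) == 8, "long long is 64 bits");
_Static_assert(sizeof(float) == 4, "float is 32 bits");
_Static_assert(sizeof(double) == 8, "double is 64 bits");
```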

With the exception of apple_ppc_32, x86_win32 and x86_win64, all these machdeps may be specified with the gcc_ prefix, in which case gcc language extensions are allowed, and the __int128 integer type is available on 64-bit machdeps.

The endianness of the supported architecture is specified as:

  • little endian
    • x86_16
    • x86_16_huge
    • x86_32
    • x86_64
    • x86_win32
    • x86_win64
    • arm_eabi
    • aarch64
    • mipsel_o32
    • mipsel_n32
    • mipsel_64
    • rv32ifdq
    • rv64ifdq
  • big endian
    • apple_ppc_32
    • ppc_32
    • ppc_64
    • armeb_eabi
    • aarch64eb
    • sparc_32
    • sparc_64
    • mips_o32
    • mips_n32
    • mips_64
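
Endianness determines the in-memory byte order of multi-byte values; a classic host-side check (an illustration of the concept, not an analyzer feature) is:

```c
#include <stdint.h>
#include <string.h>

/* Inspect the first byte of a known 32-bit value: 0x78 means the
   least-significant byte comes first (little endian), 0x12 means
   the most-significant byte comes first (big endian). */
int is_little_endian(void) {
    uint32_t v = 0x12345678u;
    unsigned char b;
    memcpy(&b, &v, 1);
    return b == 0x78;
}
```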

To switch to another architecture quickly, one of the following options may be used:

  • -16 for gcc_x86_16
  • -32 for gcc_x86_32
  • -64 for gcc_x86_64
$ tis-analyzer -64 ...

New Architecture on the Fly

When the targeted architecture is not supported out-of-the-box, a new architecture corresponding to a specific target may be defined and dynamically loaded by TrustInSoft Analyzer.

A new plug-in is defined with the values of the types for the targeted architecture. Create a new file custom_machdep.ml with the following content and edit the necessary values:

open Cil_types

let mach =
  {
    version = "foo";
    compiler = "bar";
    sizeof_short = 2;
    (* __SIZEOF_SHORT *)
    sizeof_int = 4;
    (* __SIZEOF_INT *)
    sizeof_long = 4;
    (* __SIZEOF_LONG *)
    sizeof_longlong = 8;
    (* __SIZEOF_LONGLONG *)
    sizeof_int128 = 0;
    sizeof_ptr = 4;
    (* related to __INTPTR_T *)
    sizeof_float = 4;
    sizeof_double = 8;
    sizeof_longdouble = 12;
    sizeof_void = 1;
    sizeof_fun = 1;
    size_t = "unsigned long";
    (* __SIZE_T *)
    char16_t = "unsigned short";
    char32_t = "unsigned int";
    wchar_t = "int";
    (* __WCHAR_T *)
    ptrdiff_t = "int";
    (* __PTRDIFF_T *)
    max_align_t = "long double";
    (* __MAX_ALIGN_T *)
    alignof_short = 2;
    alignof_int = 4;
    alignof_long = 4;
    alignof_longlong = 4;
    alignof_int128 = 0;
    alignof_ptr = 4;
    alignof_float = 4;
    alignof_double = 4;
    alignof_longdouble = 4;
    alignof_str = 1;
    alignof_fun = 1;
    alignof_aligned = 16;
    pack_max = 16;
    char_is_unsigned = false;
    char_bit = 8;
    (* __CHAR_BIT *)
    const_string_literals = true;
    little_endian = false;
    (* __TIS_BYTE_ORDER *)
    has__builtin_va_list = true;
    __thread_is_keyword = true;
    has_int128 = false;
  }

let () =
  Stage.run_after_loading_stage (fun () ->
      Core.result "Registering machdep 'mach' as 'custom'";
      ignore
        (Machdeps.register_machdep
           ~short_name:"custom"
           ~cpp_target_options:[]
           mach ) )

Define a new header containing the values of the types. Create a new file __fc_custom_machdep.h with the following content:

/* skeleton of a real custom machdep header. */
#ifndef __TIS_MACHDEP
#define __TIS_MACHDEP

#ifdef __TIS_MACHDEP_CUSTOM

// __CHAR_UNSIGNED must match mach.char_is_unsigned
#undef  __CHAR_UNSIGNED
#define __WORDSIZE 32

// __CHAR_BIT must match mach.char_bit
#define __CHAR_BIT 8
// __SIZEOF_SHORT must match mach.sizeof_short
#define __SIZEOF_SHORT 2
// __SIZEOF_INT must match mach.sizeof_int
#define __SIZEOF_INT 4
// __SIZEOF_LONG must match mach.sizeof_long
#define __SIZEOF_LONG 4
// __SIZEOF_LONGLONG must match mach.sizeof_longlong
#define __SIZEOF_LONGLONG 8

// __TIS_BYTE_ORDER must match mach.little_endian
#define __TIS_BYTE_ORDER __BIG_ENDIAN

#define __TIS_SCHAR_MIN (-128)
#define __TIS_SCHAR_MAX 127
#define __TIS_UCHAR_MAX 255
#define __TIS_SHRT_MIN (-32768)
#define __TIS_SHRT_MAX 32767
#define __TIS_USHRT_MAX 65535
#define __TIS_INT_MIN (-__TIS_INT_MAX - 1)
#define __TIS_INT_MAX 8388607
#define __TIS_UINT_MAX 16777216
#define __TIS_LONG_MIN (-__TIS_LONG_MAX -1L)
#define __TIS_LONG_MAX 2147483647L
#define __TIS_ULONG_MAX 4294967295UL
#define __TIS_LLONG_MIN (-__TIS_LLONG_MAX -1LL)
#define __TIS_LLONG_MAX 9223372036854775807LL
#define __TIS_ULLONG_MAX 18446744073709551615ULL

#define __INT8_T signed char
#define __TIS_INT8_MIN __TIS_SCHAR_MIN
#define __TIS_INT8_MAX __TIS_SCHAR_MAX
#define __UINT8_T unsigned char
#define __TIS_UINT8_MAX __TIS_UCHAR_MAX
#define __INT_LEAST8_T __INT8_T
#define __TIS_INTLEAST8_MIN __TIS_INT8_MIN
#define __TIS_INTLEAST8_MAX __TIS_INT8_MAX
#define __UINT_LEAST8_T __UINT8_T
#define __TIS_UINTLEAST8_MAX __TIS_UINT8_MAX
#define __INT_FAST8_T __INT8_T
#define __TIS_INTFAST8_MIN __TIS_INT8_MIN
#define __TIS_INTFAST8_MAX __TIS_INT8_MAX
#define __UINT_FAST8_T __UINT8_T
#define __TIS_UINTFAST8_MAX __TIS_UINT8_MAX

#define __INT16_T signed short
#define __TIS_INT16_MIN __TIS_SHRT_MIN
#define __TIS_INT16_MAX __TIS_SHRT_MAX
#define __UINT16_T unsigned short
#define __TIS_UINT16_MAX __TIS_USHRT_MAX
#define __INT_LEAST16_T __INT16_T
#define __TIS_INTLEAST16_MIN __TIS_INT16_MIN
#define __TIS_INTLEAST16_MAX __TIS_INT16_MAX
#define __UINT_LEAST16_T __UINT16_T
#define __TIS_UINTLEAST16_MAX __TIS_UINT16_MAX
#define __INT_FAST16_T __INT16_T
#define __TIS_INTFAST16_MIN __TIS_INT16_MIN
#define __TIS_INTFAST16_MAX __TIS_INT16_MAX
#define __UINT_FAST16_T __UINT16_T
#define __TIS_UINTFAST16_MAX __TIS_UINT16_MAX

#define __INT32_T signed int
#define __TIS_INT32_MIN __TIS_INT_MIN
#define __TIS_INT32_MAX __TIS_INT_MAX
#define __UINT32_T unsigned int
#define __TIS_UINT32_MAX __TIS_UINT_MAX
#define __INT_LEAST32_T __INT32_T
#define __TIS_INTLEAST32_MIN __TIS_INT32_MIN
#define __TIS_INTLEAST32_MAX __TIS_INT32_MAX
#define __UINT_LEAST32_T __UINT32_T
#define __TIS_UINTLEAST32_MAX __TIS_UINT32_MAX
#define __INT_FAST32_T __INT32_T
#define __TIS_INTFAST32_MIN __TIS_INT32_MIN
#define __TIS_INTFAST32_MAX __TIS_INT32_MAX
#define __UINT_FAST32_T __UINT32_T
#define __TIS_UINTFAST32_MAX __TIS_UINT32_MAX

#define __INT64_T signed long long
#define __TIS_INT64_MIN __TIS_LLONG_MIN
#define __TIS_INT64_MAX __TIS_LLONG_MAX
#define __UINT64_T unsigned long long
#define __TIS_UINT64_MAX __TIS_ULLONG_MAX
#define __INT_LEAST64_T __INT64_T
#define __TIS_INTLEAST64_MIN __TIS_INT64_MIN
#define __TIS_INTLEAST64_MAX __TIS_INT64_MAX
#define __UINT_LEAST64_T __UINT64_T
#define __TIS_UINTLEAST64_MAX __TIS_UINT64_MAX
#define __INT_FAST64_T __INT64_T
#define __TIS_INTFAST64_MIN __TIS_INT64_MIN
#define __TIS_INTFAST64_MAX __TIS_INT64_MAX
#define __UINT_FAST64_T __UINT64_T
#define __TIS_UINTFAST64_MAX __TIS_UINT64_MAX

#define __INT_MAX_T __INT64_T
#define __TIS_INTMAX_MIN __TIS_INT64_MIN
#define __TIS_INTMAX_MAX __TIS_INT64_MAX
#define __UINT_MAX_T __UINT64_T
#define __TIS_UINTMAX_MAX __TIS_UINT64_MAX

// __INTPTR_T  must match mach.sizeof_ptr
#define __INTPTR_T __INT32_T
#define __TIS_INTPTR_MIN __TIS_INT32_MIN
#define __TIS_INTPTR_MAX __TIS_INT32_MAX
#define __UINTPTR_T __UINT32_T
#define __TIS_UINTPTR_MAX __TIS_UINT32_MAX

// __PTRDIFF_T must match mach.ptrdiff_t
#define __PTRDIFF_T int
// __MAX_ALIGN_T  must match mach.max_align_t
#define __MAX_ALIGN_T long double
// __SIZE_T must match mach.size_t
#define __SIZE_T unsigned int
#define __SSIZE_T int
#define __TIS_PTRDIFF_MIN __TIS_INT_MIN
#define __TIS_PTRDIFF_MAX __TIS_INT_MAX
#define __TIS_SIZE_MAX __TIS_UINT_MAX
#define __TIS_SSIZE_MAX __TIS_INT_MAX

// __WCHAR_T  must match mach.wchar_t
#define __WCHAR_T int
#define __TIS_WCHAR_MIN __TIS_INT_MIN
#define __TIS_WCHAR_MAX __TIS_INT_MAX
#define __WINT_T long long int
#define __TIS_WINT_MIN __TIS_LLONG_MIN
#define __TIS_WINT_MAX __TIS_LLONG_MAX
#define __WCTRANS_T long long int
#define __WCTYPE_T long long int

#define __SIG_ATOMIC_T volatile int
#define __TIS_SIG_ATOMIC_MIN __TIS_INT_MIN
#define __TIS_SIG_ATOMIC_MAX __TIS_INT_MAX

// Common machine specific values (PATH_MAX, errno values, etc) for Linux
// platforms, usually applicable anywhere else.
#include "__fc_machdep_linux_gcc_shared.h"

#else
#error "I'm supposed to be included with __TIS_MACHDEP_CUSTOM macro defined"
#endif
#endif

NB: The content above does not correspond to any real architecture; it is only meant to illustrate the possibilities.

Warning

It is important that the data defined in the two files above have compatible values.

The new architecture may now be tested:

#include "limits.h"

int main(void)
{
    return INT_MAX;
}

To analyze it with TrustInSoft Analyzer, load the plug-in that defines the custom machdep and then add the option -D __TIS_MACHDEP_CUSTOM.

$ tis-analyzer -load-script custom_machdep.ml -I . -D __TIS_MACHDEP_CUSTOM -machdep custom -val test.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  __retres ∈ {8388607}

Additional target architecture tuning for floating-point behavior

The command-line options -all-rounding-modes-constants and -all-rounding-modes change the interpretation of floating-point constants and computations. The default, when both options are unset, is to assume a strict IEEE 754 platform where float is mapped to the IEEE 754 binary32 format, double is mapped to binary64, and the rounding mode is never changed from its round-to-nearest-even default. The following deviations from strict IEEE 754 behavior can be taken into account by TrustInSoft Analyzer:

  • The C99 and C11 standards allow excess precision for floating-point computations when FLT_EVAL_METHOD is defined to a value other than 0 by the compiler.
  • Some compilers do not follow the C99 standard’s specifications with respect to excess floating-point precision: they set FLT_EVAL_METHOD to 2 or 0, but actually produce floating-point results inconsistent with these settings.
  • The C99 and C11 standards allow floating-point expression contractions with #pragma STDC FP_CONTRACT ON. Some C compilers are taking the path of enabling this by default (ref: https://reviews.llvm.org/D24481 ).
  • Some target architectures flush subnormals to zero.
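The last point can be observed natively; this hypothetical check returns 1 on platforms that keep subnormals (as strict IEEE 754 requires) and 0 where they are flushed to zero:

```c
#include <float.h>

/* On a strict IEEE 754 platform, dividing the smallest normal double
   by two yields a nonzero subnormal; on hardware that flushes
   subnormals to zero, the result is 0. The volatile qualifier keeps
   the compiler from folding the division away at compile time. */
int keeps_subnormals(void) {
    volatile double d = DBL_MIN;
    d = d / 2.0;
    return d > 0.0;
}
```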

Users applying TrustInSoft Analyzer to C programs containing significant floating-point computations are invited to open a ticket on the support site with details of their compiler and architecture.

Additional tuning for the typing of integer constants

An unfortunate choice in the C89 standard led, when the C99 standard was published, to incompatibilities in the types assigned to integer constants. The problem is compounded by some compilers' eagerness to provide extended integer types. The type of integer constants can, through the usual arithmetic conversions, influence the results of integer computations.

By default, TrustInSoft Analyzer types integer constants the same way a C99-compliant compiler would, and invites the user to make a choice if a signed integer constant that cannot be represented as a long long is encountered in the target program. Since long long is at least 64 bits wide, this does not happen unless an integer constant in the program is larger than 9,223,372,036,854,775,807.

The command-line option -integer-constants c89strict can be used to select the C89 behavior. On an ILP32 compiler following the C89 standard, the following program returns 2 (res = 2), whereas it returns 1 (res = 1) when compiled with a C99 compiler:

int main(void)
{
    int res;
    if (-3000000000 < 0)
        res = 1;
    else
        res = 2;
    return res;
}
$ tis-analyzer -val integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  res ∈ {1}
$ tis-analyzer -val -integer-constants c89strict integer_constant.c
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  res ∈ {2}
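Compiled natively with a C99 (or later) compiler, the typing rule can be checked directly; the helper name is ours:

```c
/* Under C99 rules, an unsuffixed decimal constant gets the first of
   int, long, long long that can represent its value, so 3000000000
   has a *signed* type and -3000000000 is negative. Under C89 ILP32
   rules, 3000000000 would have been unsigned long, and the comparison
   below would have gone the other way. */
int c99_constant_is_negative(void) {
    return -3000000000 < 0;
}
```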

Caution

The -val-use-spec option is only available in the commercial version of TrustInSoft Analyzer.

Whenever it is necessary to analyze assembly code without using the -val-use-spec option to provide its specification, model the assembly code with equivalent C code instead.

Assembler Code

Assembly code is accepted by the tool, but most of the analyzers are not able to understand it.

Assembly statements appear in the statements.csv information file (see Information about the Statements) with the keyword asm in the kind column. These statements may be unreachable in the studied context, in which case they can be safely ignored. If they are reachable, the value analysis ignores them, but warns that it has to do so with the message:

[value] warning: assuming assembly code has no effects in function xxx

This is fine if the assembly statements have no effect, or if their effect can be ignored, but most likely it is not the case!

The simplest way to handle assembly code in the value analysis is to write a specification for the functions that enclose such statements and use the option -val-use-spec to use them instead of the function bodies.

If a function is composed of other statements besides the assembly ones, WP can be used to check the specification. Because WP does not understand assembler either, the assembly statements have to be specified using statement contracts. These cannot be verified against the assembler itself, but they are used to verify the rest of the function.

Example:

//@ requires pre: n >= 0; ensures post: \result == n * 3 * x + 10;
int asm_fun(int x, int n)
{
    int res = 0;

    /*@ loop assigns i, res;
        loop invariant l1: i <= n;
        loop invariant l2: res == i * 3 * x;
    */
    for (int i = 0; i < n; i++) {
        int y;
        //@ assigns y; ensures y == 3 * x;
        asm ("leal (%1,%1,2), %0" // x + x * 2 --> == 3 * x
             : "=r" (y)
             : "r" (x)
            );
        res += y;
    }
    return res + 10;
}

The specification of asm_fun has to be used during the value analysis via the -val-use-spec option. This specification can then be verified with WP, assuming only that the properties stated for the assembly statement are correct, which is a smaller hypothesis.

Caution

The -absolute-valid-range option is only available in the commercial version of TrustInSoft Analyzer.

Physical addresses

In general, there is an expectation that C programs exhibit behaviors that are independent from the values of addresses at which variables are located in memory. However, embedded code, and other code interfacing with hardware, routinely accesses registers, physical memory, and peripheral devices by interacting with specific memory addresses. These addresses are fixed and dictated by hardware architecture, so programs access them either directly by reading or writing memory at an absolute address, or by having a linker pin specific variables to specific memory addresses.

The requirement to inspect addresses is not limited to interfacing with peripherals either. By default, the address size of a variable depends on the architecture for which the program is compiled and on which it runs. Programmers sometimes take advantage of specific assumptions about the values of addresses to optimize operations on pointers. For example, code may assume that addresses of variables or functions are allocated within the first 4GiB of available memory in practice, and use this knowledge to store the addresses of such entities as 32-bit, not 64-bit integers. It is also often the case that code expects objects in memory to adhere to a specific alignment and relies on that fact to perform optimizations such as pointer tagging.

TrustInSoft Analyzer generally assumes that each variable is located at one out of all possible addresses in memory, but does not assume any specific address, and considers dereferencing absolute addresses to be invalid. In order to work with code that relies on absolute memory addresses, the user must provide additional configuration to the analyzer. This guide describes how to perform such configuration:

Valid memory accesses

A programmer can mistakenly treat a numerical value as an address and dereference it, trying to access its contents. The address may be invalid and cause an access violation. But even if the access does not trap, it is still an error: the analyzer should warn about it and the programmer should rectify it.

On the other hand, programs interfacing with hardware commonly access its resources through memory-mapped I/O (MMIO), where the program exchanges information with the hardware in question through reading from and writing to specific addresses in memory. In such a case, the programmer may use an absolute address directly. For example:

#include <stdio.h>

#define HW_REGISTER 0x4000

void main() {
  printf("contents: %x", *((unsigned int *) HW_REGISTER));
}

The value at such an address is managed in some way by hardware and the programmer knows it is safe to access it on that basis. So, when the address is dereferenced, this is both deliberate and safe.

Whether the access through an absolute address is purposeful or accidental, both appear the same from the point of view of TrustInSoft Analyzer. So, conservatively, it defaults to treating both as invalid operations and emitting an alarm (typically, a memory access alarm when the address is dereferenced or a pointer arithmetic alarm when it is indexed).

This section shows how to tell the analyzer that accesses to specific addresses are purposeful and should be treated as valid operations.

One way of doing this is to define a range of valid addresses, allowing those addresses to be dereferenced using their raw numerical values. Alternatively, it is possible to provide variable declarations and use them instead of the absolute address directly during the analysis, or to place these variables at specific addresses, so that accesses to the absolute address correspond to accesses to that variable. These three techniques are summarized in the following table and discussed in detail below.

|                                           | define absolute valid range | introduce variables (unconstrained address) | introduce variables (constrained address) |
|-------------------------------------------|-----------------------------|---------------------------------------------|-------------------------------------------|
| preserves specific address                | yes                         | no                                          | yes                                       |
| requires code modification                | no                          | yes                                         | no                                        |
| allows disjoint ranges of valid addresses | no                          | yes                                         | yes                                       |
| allows designating as read-only           | no                          | yes                                         | yes                                       |
| allows dereferencing absolute address     | yes                         | no                                          | yes                                       |
| volatile behavior granularity             | entire valid range          | per variable                                | per variable                              |

Example The following program shows the basic problem that these techniques solve. It portrays an example use of absolute addresses to communicate with a peripheral device via a set of registers. The GPIO_MODE register is located at the absolute address 0x40020000 and is used as an unsigned integer (via the macro VALUE) to set the operating mode of the device. Then, GPIO_DATA_A, GPIO_DATA_B, and GPIO_DATA_C represent a set of ports, each accessed as a 4-byte array (via the macro BYTE with an index) and found in memory at addresses 0x40020014, 0x40020018, and 0x4002001c, respectively. When the program is executed, it sets the value of GPIO_MODE to 0x19, then reads four bytes from GPIO_DATA_A and, depending on each value read, writes either 1 or 0 to the corresponding byte in GPIO_DATA_B. Before finishing, the program calls tis_show_each to display the absolute addresses represented by each constant, and the analyzer’s view of the values at each of these addresses.

#include <tis_builtin.h>

#define GPIO_MODE   0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i < 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1), 
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

When analyzing this program, the analyzer emits an alarm reporting an invalid memory access. Without the context of the peripheral device, writing to unallocated memory is, from the analyzer’s point of view, an undefined behavior.

$ tis-analyzer physical_register_abs.c -quiet -print -print-filter main
void main(void)
{
  /*@ assert Value: mem_access: \valid((unsigned int *)0x40020000); */
  *((unsigned int *)0x40020000) = (unsigned int)0x19;
  {
    int i;
    i = 0;
    while (i < 4) {
      if ((int)*((unsigned char *)0x40020014 + i) == 0) *((unsigned char *)0x40020018 + i) = (unsigned char)1;
      else *((unsigned char *)0x40020018 + i) = (unsigned char)0;
      i ++;
    }
  }
  tis_show_each("MODE", 0x40020000, *((unsigned int *)0x40020000));
  tis_show_each("DATA_A", 0x40020014);
  tis_show_each("*DATA_A", (int)*((unsigned char *)0x40020014 + 0),
                (int)*((unsigned char *)0x40020014 + 1),
                (int)*((unsigned char *)0x40020014 + 2),
                (int)*((unsigned char *)0x40020014 + 3));
  tis_show_each("DATA_B", 0x40020018);
  tis_show_each("*DATA_B", (int)*((unsigned char *)0x40020018 + 0),
                (int)*((unsigned char *)0x40020018 + 1),
                (int)*((unsigned char *)0x40020018 + 2),
                (int)*((unsigned char *)0x40020018 + 3));
  tis_show_each("DATA_C", 0x4002001c);
  tis_show_each("*DATA_C", (int)*((unsigned char *)0x4002001c + 0),
                (int)*((unsigned char *)0x4002001c + 1),
                (int)*((unsigned char *)0x4002001c + 2),
                (int)*((unsigned char *)0x4002001c + 3));
  __tis_globfini();
  return;
}
Configuring a valid address range

The user can configure the analyzer to treat a range of addresses as valid by using its absolute-valid-range option. Then, the analyzer considers all accesses to those addresses as valid. The user can use this option to specify a single range of addresses representing one or more contiguous logical objects.

The user can specify a valid range via the command-line option -absolute-valid-range. The option takes a single argument consisting of two addresses separated by a dash, here represented by FIRST and LAST:

$ tis-analyzer -absolute-valid-range FIRST-LAST …

Alternatively, the user can specify the same option within a JSON analysis configuration file using absolute-valid-range. The option’s argument is a string that contains two addresses separated by a dash (FIRST and LAST):

{
  "val": true,
  "absolute-valid-range": "FIRST-LAST"
}

Both the FIRST and LAST addresses are integer values expressed as hexadecimal, octal, binary, or decimal numbers in C literal notation. Here, the command-line options specify a valid range from 0x4000 to 0x4007 using all available notations:

$ tis-analyzer -absolute-valid-range 0x4000-0x4007 …
$ tis-analyzer -absolute-valid-range 0X4000-0X4007 …

$ tis-analyzer -absolute-valid-range 0o40000-0o40007 …
$ tis-analyzer -absolute-valid-range 0O40000-0O40007 …

$ tis-analyzer -absolute-valid-range 0b100000000000000-0b100000000000111 …
$ tis-analyzer -absolute-valid-range 0B100000000000000-0B100000000000111 …

$ tis-analyzer -absolute-valid-range 16384-16391 …

Warning

Octal number notation

The absolute-valid-range option uses a different notation than C when expressing octal numbers. The analyzer interprets addresses passed to absolute-valid-range prefixed with only a leading zero as decimal.

The range is inclusive, meaning that both FIRST and LAST addresses are considered valid. This means that the notation above specifies the following eight valid addresses: 0x4000, 0x4001, 0x4002, 0x4003, 0x4004, 0x4005, 0x4006, and 0x4007. If the value of LAST is less than the value of FIRST, the analyzer sets the range of absolute addresses to be empty.

Diagram of a section of memory between the addresses 0x3ffc and 0x401c with addresses between 0x4000 and 0x4008 marked as an absolute valid range. The address 0x4000 is marked as FIRST and 0x4007 as LAST. The diagram excludes 0x4008 from the valid range.

There can only be a single contiguous valid range defined. If the user sets multiple absolute-valid-range options, the analyzer uses only the last (rightmost) one.

Tip

Introspecting addresses

TrustInSoft Analyzer always attempts to display addresses of variables as symbols referring to the variable. E.g.:

int v;
tis_show_each("&v", &v);
[value] Called tis_show_each({{ "&v" }}, {{ &v }})

When dealing with constrained variables, this representation might not always be useful. So, the analyzer provides the builtin tis_force_ival_representation, which coerces the symbolic representation of the address into its numerical value; here, any possible 4-byte-aligned 32-bit pointer value:

int v;
tis_show_each("&v", tis_force_ival_representation(&v));
[value] Called tis_show_each({{ "&v" }}, [4..4294967288],0%4)

Instead of using tis_force_ival_representation as an argument to tis_show_each, the user can also use the function tis_show_ival_representation, which combines the two: it displays values in the same way as tis_show_each, but applies tis_force_ival_representation to each of its arguments.

int v;
tis_show_ival_representation("&v", &v);
[value] Called tis_show_ival_representation({{ "&v" }}, [4..4294967288],0%4)

The tis_show_each builtin displays integers in decimal representation by default. However, it is convenient to have addresses displayed using hexadecimal representation instead. The user can do this by setting the big-ints-hex configuration option to specify that the analyzer should display values larger than the given threshold using hexadecimal notation. The option can be set via the command-line:

$ tis-analyzer -val -big-ints-hex 0xff …

Or through a JSON configuration file:

{
   "val": true,
   "big-ints-hex": "0xff"
}

Run with this configuration, the following example prints addresses as hexadecimal values:

int v;
tis_show_ival_representation("&v", &v);
[value] Called tis_show_ival_representation({{ "&v" }}, [4..0xFFFFFFF8],0%4)

Alternatively, it may sometimes be convenient to use printf to display addresses using a specific representation:

int v;
printf("&v = 0x%lx",  tis_force_ival_representation(&v));

Using printf is only possible if the address resolves to a single, precise value. Otherwise, the analyzer issues a warning and prints an empty string:

[value] warning: using address as integer in printing function. This may cause
        the program to behave nondeterministically when executed
&v = 0x

However, printf can be useful with absolute and constrained address values:

int *ptr = (int *) 0x8000;
printf("ptr = 0x%lx",  tis_force_ival_representation(ptr));
ptr = 0x8000

Example The example at the start of the section assumes that the data at absolute addresses GPIO_MODE (0x40020000-0x40020003), GPIO_DATA_A (0x40020014-0x40020017), GPIO_DATA_B (0x40020018-0x4002001b), and GPIO_DATA_C (0x4002001c-0x4002001f) can be safely accessed. The user can convey this to the analyzer by specifying a valid address range from 0x40020000 to 0x4002001f via the absolute-valid-range option, which allows the analysis to complete. The analyzer runs with an slevel of 10 to conveniently simplify the values displayed by tis_show_each, and with big-ints-hex set to display all addresses in the program in hexadecimal representation (see introspecting addresses). When run like that, the analyzer deduces the following values at the absolute memory locations:

$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
Pitfall: external modifications to memory

The analyzer initially assumes that memory at absolute addresses contains some unknown value, and that the memory at those addresses retains values written to it by the analyzed program without changing independently. However, if the addresses represent input ports that will be written to by a peripheral device, their contents might indeed change independently of what the source code of the analyzed program suggests.

The analyzer does not model this by default. Instead, the user must specify that the valid address range is volatile via the volatile-globals option. This topic is covered in detail in the guide to volatile variables.

Warning

If the value pointed to by an absolute address represents data that can be modified by a peripheral, it should be analyzed as volatile to preserve soundness. See Volatile variables for details.

Example Consider the example at the start of the section again. When the code is analyzed with a specified valid range of addresses, the analyzer shows that the memory at address GPIO_DATA_B retains the value that the program wrote to it: either 0 or 1.

$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])

However, if GPIO_DATA_B is a hardware port, the expectation is that its values will be consumed or otherwise modified by the peripheral, independently of the code of the program. This is indicated by specifying the option -volatile-globals with the argument NULL (indicating the entire range of valid absolute addresses, see Volatile variables). At this point, the analyzer assumes that the values at absolute addresses cannot be determined solely by observing the behavior of the program. Therefore, it reports that GPIO_DATA_B can contain any possible value within the range allowed by its type, even after the program writes to it.

$ tis-analyzer physical_register_abs.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f -volatile-globals NULL
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, [0..0xFFFFFFFF])
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])

Note, however, that since the analyzer treats the entire valid range of absolute addresses as volatile, it reports that GPIO_MODE can contain any possible value as well. Currently, there is no mechanism to specify that only part of the valid absolute range acts as volatile using the absolute-valid-range option. If a more fine-grained specification is required, the user is advised to replace absolute addresses with equivalent variable declarations, as described in detail in a separate section below.

Pitfall: multiple logical objects

The analyzer only allows for the definition of a single contiguous absolute valid range. If a code base operates on a peripheral with multiple separate registers, they can be logically represented within a single absolute valid range, as long as they constitute a contiguous memory region.

For example, if a program communicates with a peripheral via two 4-byte ports, one available at address 0x4000 and the other at 0x4004, the user can specify an absolute valid range from 0x4000 to 0x4007 that encompasses both ports.

Diagram of a section of memory between the addresses 0x3ffc and 0x401c with addresses between 0x4000 and 0x4008 marked as an absolute valid range. The addresses between 0x4000 and 0x4003 are marked as port a and the addresses from 0x4004 to 0x4007 are marked as port b.

However, if a program communicates with a peripheral through a 4-byte port at address 0x4000 and a 2-byte port at address 0x4006, while the memory addresses 0x4004 and 0x4005 remain invalid, the user cannot specify an absolute valid range that encompasses both ports.

If such an absolute valid range were defined, the analyzer would not be able to distinguish between accesses to the logical objects that can be safely accessed and accesses to the gap between them that should not be. Since this area of memory is declared valid, accesses into the gap are also treated as valid.

Diagram of a section of memory between the addresses 0x3ffc and 0x401c with addresses between 0x4000 and 0x4008 marked as an absolute valid range. The addresses between 0x4000 and 0x4003 are marked as port a and the addresses from 0x4006 to 0x4007 are marked as port b. The addresses from 0x4004 to 0x4005 are marked in red as invalid.

If the code requires a discontinuous area of valid addresses, the user is advised to replace absolute addresses with equivalent variable declarations, as described in detail in a separate section below.

Pitfall: boundaries of logical objects

Even if the objects within an absolute valid range are contiguous, the user should be aware that the analyzer cannot detect the boundaries between them. Since these logical objects are not expressed in concrete terms (e.g., as variables), the assumptions about their type and size remain implicit.

That is, if the program accesses one of these objects at an offset that is out of the object’s bounds, the analyzer will not emit an alarm, so long as the offset falls within the boundaries of the valid range.

Diagram of a section of memory between the addresses 0x3ffc and 0x401c with addresses between 0x4000 and 0x4008 marked as an absolute valid range. The addresses between 0x4000 and 0x4003 are marked as port a and the addresses from 0x4004 to 0x4007 are marked as port b. A snippet of code points from ((char \*) PORT_A) + 6 to addresses 0x4000 and 0x4006.

Compare this with a situation where instead of an absolute address range, the memory is described as variables. In such a case, the analyzer is aware of the size of the variables at the base address, so it can determine whether an offset falls outside of the variable and emit an alarm if it does.

Diagram of a section of memory between the addresses 0x3ffc and 0x401c with addresses between 0x4000 and 0x4008 marked as variables port_a (starting at 0x4000) and port_b (starting at 0x4004). A snippet of code points from port_a[4] + 6  to addresses 0x4000 and 0x4006.

Indeed, the user can avoid this pitfall by replacing the absolute valid range with a series of variables using the technique presented in the next section.

Example Consider a program analogous to the previous example, where absolute addresses that can be safely accessed represent ports or registers of a peripheral device: GPIO_MODE (0x40020000-0x40020003), GPIO_DATA_A (0x40020014-0x40020017), GPIO_DATA_B (0x40020018-0x4002001b), and GPIO_DATA_C (0x4002001c-0x4002001f). However, here the code is modified to index GPIO_DATA_A and GPIO_DATA_B out of their respective presumed bounds.

#include <tis_builtin.h>

#define GPIO_MODE   0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i <= 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;  
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1), 
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

When analyzed with a correctly configured absolute valid range, the example does not cause an alarm to be emitted. Instead, the analyzer has no choice but to assume that all accesses to valid absolute addresses are valid and purposeful. However, as a result, the write to GPIO_DATA_B at index 4 actually writes to GPIO_DATA_C at index 0, which is suspect.

$ tis-analyzer physical_register_oob.c -val -slevel 10 -big-ints-hex 0x40000000 -absolute-valid-range 0x40020000-0x4002001f
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, {0; 1}, [0..255], [0..255], [0..255])
Interpreting absolute addresses as external variables

A better way of informing the analyzer that accesses to absolute memory addresses are valid is to declare a set of variables equivalent to the data represented by such absolute addresses and to place those variables at those exact absolute addresses. This is an efficient way of concretizing how each of the memory addresses should be interpreted for the purposes of analysis.

The least invasive way to inject such equivalent variables into the analyzed program is to introduce them in an auxiliary file containing their definitions and to provide that file to the analyzer as one of the analyzed input files. If the variables are constrained to the same absolute memory addresses as used by the original program, the analyzer treats accesses through these raw addresses as accesses to the corresponding variables. This allows the analyzer to distinguish whether a given access through an absolute address is purposeful or invalid. In addition, the analyzer can then also check whether the address is interpreted in accordance with the type of the declared variables.

For example, the ADDR macro in the following code might represent some number of consecutive bytes of data, whose length is expressed by ADDR_SZ.

#define ADDR_SZ 8
#define ADDR 0x4000

The expectation is that this address is therefore used as if it were an 8-element array of bytes. In that case, it can be expressed as a variable: an eight-element array with elements of type unsigned char. The variable is qualified as extern to indicate that it is not initialized within this program:

extern unsigned char byte_array[ADDR_SZ];

The analyzer allows for variables to be pinned to specific addresses (e.g., to model the behavior of a linker script). The user can do so by attaching the tis_address attribute to a variable and by specifying an address via the attribute’s argument. For the example above, pinning byte_array to the address defined through ADDR is done as follows:

extern unsigned char byte_array[ADDR_SZ] __attribute__((tis_address(ADDR)));

When the code of the program is analyzed and the contents of the address 0x4000 represented by ADDR are dereferenced, the analyzer accesses byte_array. This means that the dereferences found in the original program do not have to be modified to take advantage of the defined variable.

Since the analyzer now associates the addresses from 0x4000 to 0x4007 with byte_array, it also ascribes byte_array’s type to them. This means it enforces byte_array’s boundaries when the memory at 0x4000-0x4007 is accessed and emits an alarm if it detects a violation (regardless of whether the neighboring addresses are valid). The analyzer also respects the variable’s type qualifiers, like volatile or const, when accessing the associated addresses.

Tip

Volatile variables

Variables may be declared and subsequently analyzed as volatile to indicate that their values may change independently from the analyzed program, for instance, by a peripheral device. Similarly, variables may be declared as const volatile if they may only be read, but their values may change from read to read. See Volatile variables for more detail.
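For instance, a hypothetical declaration file for an analysis might contain the following (the names and addresses are illustrative, not taken from the examples in this section):

```c
/* A register the program both reads and writes, whose value may also be
   changed by the hardware at any time: */
extern volatile unsigned int status_reg
    __attribute__((tis_address(0x40021000)));

/* A register the program may only read, whose value may change between
   two consecutive reads: */
extern const volatile unsigned int input_reg
    __attribute__((tis_address(0x40021004)));
```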

This variable definition can be placed in a separate file from the one containing the definition of ADDR. This means the analysis can be performed without modifying the original code base at all.

Tip

The tis_address attribute

This section uses tis_address to attach each variable to a single, concrete address. However, the attribute has other capabilities, including the ability to assign an address from a range or from a named memory region, as well as specifying alignments.

int a __attribute__((tis_address("[0x4000-0x4007]")));
int b __attribute__((tis_address("BCODE")));
int c __attribute__((tis_address("[0x4000-0x4010,0%4]")));

The attribute is also not limited to the application described here. The user can also use it to constrain any declared variable or function to a memory location or range of memory locations. The complete description of the tis_address attribute can be found in a dedicated section of this guide, below.

Example Consider again the example from the head of the section, where absolute addresses that can be safely accessed represent ports or registers of a peripheral device: GPIO_MODE (0x40020000-0x40020003), GPIO_DATA_A (0x40020014-0x40020017), GPIO_DATA_B (0x40020018-0x4002001b), and GPIO_DATA_C (0x4002001c-0x4002001f).

#include <tis_builtin.h>

#define GPIO_MODE   0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i < 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1), 
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

In order to provide the analyzer with the information that accesses to these addresses are safe, the user creates a new source file physical_register_defs.c and defines a variable representing each of the absolute addresses used in the program: gpio_mode for GPIO_MODE, gpio_data_a for GPIO_DATA_A, etc. Since GPIO_MODE is used as a numerical value and stretches from 0x40020000 to 0x40020003, it is declared as unsigned int. The remaining variables are used as 4-element byte arrays, so they are all declared as 4-element arrays with elements of type unsigned char. Finally, each variable is pinned to its associated start address by way of tis_address (the end address is inferred from its type).

#include <tis_builtin.h>

extern unsigned int  gpio_mode      __attribute__((tis_address(0x40020000)));
extern unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
extern unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
extern unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));

The analyzer can now conduct the analysis of the original program, using the definitions in the auxiliary file to inform its decisions about the validity of accesses to the memory locations by absolute addresses. As a result, the analysis finishes and produces the expected output, and does so without the need for any modifications to the original code base. The example is executed with big-ints-hex to display addresses using hexadecimal representation (see introspecting addresses).

$ tis-analyzer physical_register_abs.c physical_register_defs.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {0x40020000}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {0x40020014})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {0x40020018})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {0x4002001C})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
Advantages over externally-configured valid ranges

We recommend this approach rather than using the absolute-valid-range option, because the additional information about the expected sizes, value ranges, and type qualifiers (like const) for the values found in memory at those addresses guards the user against that option's pitfalls:

Specifying discrete variables instead of a single area of memory allows the objects to be discontinuous (vs. Pitfall: multiple logical objects). It also instructs the analyzer how to distinguish between logical objects, allowing the analyzer to catch boundary violations (vs. Pitfall: boundaries of logical objects).

The type system can also be used to declare variables as volatile, allowing the analyzer to account for the capability of peripheral devices to update their values (vs. Pitfall: external modifications to memory). Specifically, declaring variables to represent specific objects in memory allows the analyzer to model volatility at variable granularity, and even to model the behavior of volatile variables in detail using the Volatile plugin of the analyzer.

Example Consider again the example from Pitfall: boundaries of logical objects where GPIO_DATA_A and GPIO_DATA_B index out of their respective bounds:

#include <tis_builtin.h>

#define GPIO_MODE   0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i <= 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;  
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1), 
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

However, here, instead of declaring valid memory via the absolute-valid-range option, there is a file containing the definitions of variables representing the logical objects at GPIO_DATA_A and GPIO_DATA_B, etc.:

#include <tis_builtin.h>

extern unsigned int  gpio_mode      __attribute__((tis_address(0x40020000)));
extern unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
extern unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
extern unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));

Then, when analyzing the example with both files, the analyzer emits an alarm when the index is out of bounds of GPIO_DATA_A. This is because the variable declaration specifies for the analyzer what those bounds are in precise terms.

$ tis-analyzer physical_register_oob.c physical_register_defs.c -val -slevel 10 -big-ints-hex 0x40000000
tests/tis-user-guide/physical_register_oob.c:15:[kernel] warning: out of bounds read. assert \valid_read((unsigned char *)0x40020014+i);
Interpreting absolute addresses as variables declared in place

While providing definitions of variables describing the logical objects pointed to by absolute addresses does not require modifying the analyzed source code, it is sometimes convenient to place the definition of the variables alongside an original definition of the absolute address, such as a macro. In such cases, the definitions of the variables describing the underlying logical objects may be inserted directly into the source code of the program.

For example, an absolute address representing an 8-element array of bytes might be declared as the ADDR macro:

#define ADDR_SZ 8
#define ADDR 0x4000

Using the suggested technique, the code is modified to provide a variant that interprets this memory area as the variable byte_array:

#define ADDR_SZ 8
#define ADDR 0x4000
#ifdef __TRUSTINSOFT_ANALYZER__
  unsigned char byte_array[ADDR_SZ] __attribute__((tis_address(ADDR)));
#endif

To keep these replacement declarations from interfering with the original code, it is recommended to define them conditionally, guarded by the __TRUSTINSOFT_ANALYZER__ macro. The analyzer defines this macro while parsing code in preparation for analysis. If the macro is undefined, the program includes only the original code that uses raw absolute addresses; if it is defined, the program also includes the declarations of the equivalent variables.

Tip

Use the tis-modifications tool to check that all analysis-related code modifications are locked behind the __TRUSTINSOFT_ANALYZER__ macro.

When the program is subsequently analyzed, the address 0x4000 corresponds to the address of byte_array, allowing the analyzer to determine that the access is valid, and to check whether the data located there is used in accordance with the prescribed type.

Example The following example modifies the example from the head of the section so that the registers of a peripheral device are backed by variables for the purposes of analysis. GPIO_MODE covers gpio_mode, GPIO_DATA_A covers gpio_data_a, etc. In order to align the variables with the absolute addresses, they are fixed at specific positions via the tis_address attribute: gpio_mode is pinned to 0x40020000, gpio_data_a is pinned to 0x40020014, gpio_data_b to 0x40020018, and gpio_data_c to 0x4002001c (with their extents being derived from their types).

#include <tis_builtin.h>
#include <stdint.h>

#define GPIO_MODE   0x40020000
#define GPIO_DATA_A 0x40020014
#define GPIO_DATA_B 0x40020018
#define GPIO_DATA_C 0x4002001c

#ifdef __TRUSTINSOFT_ANALYZER__
  unsigned int  gpio_mode      __attribute__((tis_address(0x40020000)));
  unsigned char gpio_data_a[4] __attribute__((tis_address(0x40020014)));
  unsigned char gpio_data_b[4] __attribute__((tis_address(0x40020018)));
  unsigned char gpio_data_c[4] __attribute__((tis_address(0x4002001c)));
#endif

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i < 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;  
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1), 
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

Upon analysis, the analyzer produces the expected result, without emitting alarms. The analysis is set to run with slevel of 10 to conveniently simplify the values displayed by tis_show_each and with big-ints-hex set to display all addresses in the program using hexadecimal representation (see introspecting addresses).

$ tis-analyzer physical_register_var_addr.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {1073872896}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {1073872916})
[value] Called tis_show_each({{ "*DATA_A" }}, {0}, {0}, {0}, {0})
[value] Called tis_show_each({{ "DATA_B" }}, {1073872920})
[value] Called tis_show_each({{ "*DATA_B" }}, {1}, {1}, {1}, {1})
[value] Called tis_show_each({{ "DATA_C" }}, {1073872924})
[value] Called tis_show_each({{ "*DATA_C" }}, {0}, {0}, {0}, {0})
Replacing absolute addresses with unconstrained variables

While the previous technique associated variables with specific absolute addresses, it may sometimes be advantageous to remove the dependency on absolute addresses altogether. If an absolute address is defined via some macro, the user probably expects that the absolute address is always accessed through that macro. This technique causes the analyzer to detect stray attempts to access the absolute address directly in the code. Beyond that, it carries the same advantages as the technique in the previous section, but requires the user to modify the source code of the analyzed program.

Consider an absolute address used via a macro. The address represents an 8-element array of bytes declared as the ADDR macro, with its size defined by ADDR_SZ:

#define ADDR_SZ 8
#define ADDR 0x4000

The values at ADDR are always meant to be accessed via the macro, whereas an access that uses the value of the absolute address directly is potentially a mistake:

char byte0 = *((char *) ADDR + 0);
char byte1 = *((char *) ADDR + 1);
char byte2 = *((char *) 0x4002);
char byte3 = *((char *) ADDR + 3);

As with the technique above, the user can replace the absolute address with operations that dereference pointers to unconstrained variables representing those absolute addresses. The code is modified to provide a variant that interprets this memory area as the variable byte_array. However, here it is not constrained to a particular address. In addition, the modification also encompasses ADDR itself, which is defined as a pointer to the first element of byte_array for the duration of the analysis.

#define ADDR_SZ 8
#ifdef __TRUSTINSOFT_ANALYZER__
  unsigned char byte_array[ADDR_SZ];
  #define ADDR byte_array
#else
  #define ADDR 0x4000
#endif

The modified code is compatible with the original version, in that nothing changes in how the code behaves when compiled and executed, while the analyzed variant gains the variable-based interpretation. Both ADDR and ADDR_SZ remain accessible and usable for interacting with the memory in either case.

Warning

It may be the case that the macro used for dereferencing an absolute address is also used for other purposes, such as code generation through token concatenation.

Since the __TRUSTINSOFT_ANALYZER__ variant of the code replaces the literal used in the macro with a variable name, the modification has the potential to impact the execution of the program, so it should be applied with care.

This technique of replacing absolute addresses with variable references works best if the absolute address is hidden behind a macro or similar construct. However, it is also possible to apply it without the macro by replacing individual uses of absolute addresses with variables directly. Doing so is typically time consuming, but not error prone, since the analyzer raises an alarm about an invalid memory access whenever a stray absolute address is accessed in the code (excluding dead code).

Example Consider the following code. It reprises the example from the head of the section where absolute addresses represent ports or registers of a peripheral device: GPIO_MODE (0x40020000-0x40020003), GPIO_DATA_A (0x40020014-0x40020017), GPIO_DATA_B (0x40020018-0x4002001b), and GPIO_DATA_C (0x4002001c-0x4002001f) and can be safely accessed. However, here, the information that these addresses are valid is conveyed to the analyzer in situ. Each of these constants is defined as a pointer to an associated external variable that represents the programmer’s interpretation of how the data should be accessed. The variable definitions are guarded by a macro, meaning that these variables are only present during analysis with TrustInSoft Analyzer.

#include <tis_builtin.h>

#ifdef __TRUSTINSOFT_ANALYZER__
  extern unsigned int  gpio_mode;
  extern unsigned char gpio_data_a[4];
  extern unsigned char gpio_data_b[4];
  extern unsigned char gpio_data_c[4];

  #define GPIO_MODE   &gpio_mode
  #define GPIO_DATA_A ((unsigned char *) gpio_data_a)
  #define GPIO_DATA_B ((unsigned char *) gpio_data_b)
  #define GPIO_DATA_C ((unsigned char *) gpio_data_c)
#else
  #define GPIO_MODE   0x40020000
  #define GPIO_DATA_A 0x40020014
  #define GPIO_DATA_B 0x40020018
  #define GPIO_DATA_C 0x4002001c
#endif

#define VALUE(reg) *((unsigned int *)  reg)
#define BYTE(reg, index) (((unsigned char *) reg)[index])

void main(void) {
  VALUE(GPIO_MODE) = 0x19;

  for (int i = 0; i < 4; i++) {
    if (BYTE(GPIO_DATA_A, i) == 0) {
      BYTE(GPIO_DATA_B, i) = 1;
    } else {
      BYTE(GPIO_DATA_B, i) = 0;
    }
  }

  tis_show_each("MODE",    GPIO_MODE, VALUE(GPIO_MODE));
  tis_show_each("DATA_A",  GPIO_DATA_A);
  tis_show_each("*DATA_A", BYTE(GPIO_DATA_A, 0), BYTE(GPIO_DATA_A, 1),
                           BYTE(GPIO_DATA_A, 2), BYTE(GPIO_DATA_A, 3));
  tis_show_each("DATA_B",  GPIO_DATA_B);
  tis_show_each("*DATA_B", BYTE(GPIO_DATA_B, 0), BYTE(GPIO_DATA_B, 1),
                           BYTE(GPIO_DATA_B, 2), BYTE(GPIO_DATA_B, 3));
  tis_show_each("DATA_C",  GPIO_DATA_C);
  tis_show_each("*DATA_C", BYTE(GPIO_DATA_C, 0), BYTE(GPIO_DATA_C, 1),
                           BYTE(GPIO_DATA_C, 2), BYTE(GPIO_DATA_C, 3));
}

This program can be analyzed without producing errors (and without the need to configure valid memory through absolute-valid-range). Running the analysis with slevel set to 10 produces the expected results shown below. Note that the addresses of GPIO_MODE, GPIO_DATA_A, GPIO_DATA_B, and GPIO_DATA_C are no longer displayed as absolute values, but are presented relative to the variables that define them. Internally, the analyzer assumes that these variables can be located at any possible memory address. Note also that since the variables are only declared (and not defined), the analyzer makes no assumption about their initial contents.

$ tis-analyzer physical_register_var.c -val -slevel 10 -big-ints-hex 0x40000000
[value] Called tis_show_each({{ "MODE" }}, {{ &gpio_mode }}, {25})
[value] Called tis_show_each({{ "DATA_A" }}, {{ &gpio_data_a }})
[value] Called tis_show_each({{ "*DATA_A" }}, [0..255], [0..255], [0..255], [0..255])
[value] Called tis_show_each({{ "DATA_B" }}, {{ &gpio_data_b }})
[value] Called tis_show_each({{ "*DATA_B" }}, {0; 1}, {0; 1}, {0; 1}, {0; 1})
[value] Called tis_show_each({{ "DATA_C" }}, {{ &gpio_data_c }})
[value] Called tis_show_each({{ "*DATA_C" }}, [0..255], [0..255], [0..255], [0..255])
Constraints on physical addresses

Typically, the behavior of a program should be independent of the addresses of the variables defined within it. Nevertheless, some programs do require that specific variables be located at specific addresses in memory, for instance to represent specific hardware registers. Alternatively, a program may not require that a variable is located at a specific address, but may make assumptions about the general area of memory the variable would be found in, or about the alignment of its address.

The analyzer does not make assumptions about the specific addresses of variables by default. Instead, it treats each variable as if it were placed at some valid, but unknown location in memory. However, since some programs require that a set of variables have their addresses concretized or limited to specific ranges, the user can instruct the analyzer to put such additional constraints on the variables’ addresses.

When variables are constrained, the analyzer is capable of performing operations on the values of their addresses in line with these constraints, allowing it to produce more precise results for bit-wise operations, integer arithmetic, and other operations. For instance, a variable with an alignment constraint will have an address that satisfies that alignment.

The user can place constraints on variable addresses using the tis_address builtin attribute or the absolute-address configuration option. The user can use these to attach address constraints onto individual variables. These constraints can take the form of singleton addresses or ranges of potential addresses, with or without additional alignment requirements. In addition, the analyzer provides a separate address-alignment option that sets an alignment for all objects in memory. All of these features are described in detail below.

Example The following code illustrates one facet of the problem. The program uses the technique of pointer tagging to smuggle data in “unused” bits of pointer values. Here, the code relies on an assumption that all addresses are 4-byte aligned to attach a 2-bit tag to pointers. The function tag_ptr checks whether a pointer is 4-byte aligned by checking if its last two bits are empty. If they are, the program writes a PTR_TAG to those bits. Otherwise, it returns 0x0 to indicate an error. The function untag_ptr undoes tag_ptr: it checks whether a pointer has a tag, and if it does, it strips it (by use of a mask). The program calls tag_ptr and untag_ptr in main on the pointer into a byte array called memory at an offset of 32 and inspects the results using tis_show_each.

#include <stdint.h>
#include <tis_builtin.h>

unsigned char memory[256];

#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)

uintptr_t tag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != 0) return 0x0;
  return ptr | PTR_TAG;
}

uintptr_t untag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != PTR_TAG) return 0x0;
  return ptr & VAL_MASK;
}

void main(void) {
  uintptr_t tagged_ptr = tag_ptr(&memory[32]);
  uintptr_t untagged_ptr = untag_ptr(tagged_ptr);

  tis_show_each("tagged",   tagged_ptr);
  tis_show_each("untagged", untagged_ptr);
}

Since the analyzer assumes the variables' addresses can be any valid address, it cannot categorically determine whether the address passed into tag_ptr would pass the alignment check or not. Since it is expected that C/C++ programs operate independently of the values of the addresses assigned to their variables by the linker, this causes the analyzer to raise a warning informing that a condition depends on the memory layout.

$ tis-analyzer pointer_tag.c -quiet -val -slevel 10 -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
  uintptr_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
    __retres = (uintptr_t)0x0;
    goto return_label;
  }
  __retres = ptr | (unsigned int)2;
  return_label: return __retres;
}

However, if the addresses of variables within the program are constrained in such a way that the condition always succeeds or always fails, the analyzer will accept it and proceed with the analysis.

The tis_address attribute

The user can constrain a variable to a specific address or one out of a range of possible addresses by annotating it with the tis_address attribute. Given a variable var of type T, the tis_address attribute is specified within the __attribute__ directive:

T var __attribute__((tis_address(…)));

The attribute can be placed on any variable (or function) definition, both local and global. A variable can have at most one tis_address specification.

The parameter of tis_address specifies where the variable is pinned. It is specified as a literal describing a single address, a range of addresses, or a reference to a named memory region.

Warning

The tis_address attribute does not work within C++ code. See Absolute addresses and C++ for workarounds and alternatives.

Singleton addresses

A singleton address is specified as a string literal containing a single address. It means that the variable will be considered pinned at that exact address. The address can be expressed as a positive (non-zero) integer in hexadecimal, octal, binary, or decimal representation:

tis_address("0x4000")
tis_address("0X4000")
tis_address("0o40000")
tis_address("0O40000")
tis_address("0b100000000000000")
tis_address("0B100000000000000")
tis_address("16384")

Warning

Octal number notation

The tis_address attribute uses a different notation than C when expressing octal numbers: octal numbers are prefixed with 0o or 0O. Addresses passed to tis_address prefixed with only a leading zero are interpreted as decimal.

If the address 0x0 (in any representation) is specified as an address of a variable, the analyzer stops the analysis with an error.

For convenience, a single address can also be specified via an integer literal:

tis_address(0x4000)
tis_address(0X4000)
tis_address(040000)
tis_address(0b100000000000000)
tis_address(0B100000000000000)
tis_address(16384)

This is especially convenient for specifying a group of addresses that are relative to each other.

tis_address(0x4000 + 0)
tis_address(0x4000 + 1)
tis_address(0x4000 + 2)
tis_address(0x4000 + 3)

Example Consider again the example at the top of the section, where the tag_ptr and untag_ptr functions check the alignment of a pointer and write or strip a tag from it (respectively). Specifically, the functions are called on the pointer into an array called memory.

#include <stdint.h>
#include <tis_builtin.h>

unsigned char memory[256];

#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)

uintptr_t tag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != 0) return 0x0;
  return ptr | PTR_TAG;
}

uintptr_t untag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != PTR_TAG) return 0x0;
  return ptr & VAL_MASK;
}

void main(void) {
  uintptr_t tagged_ptr = tag_ptr(&memory[32]);
  uintptr_t untagged_ptr = untag_ptr(tagged_ptr);

  tis_show_each("tagged",   tagged_ptr);
  tis_show_each("untagged", untagged_ptr);
}

Ordinarily, the analyzer cannot decide whether the condition in tag_ptr succeeds or not because it depends on the value of the address of memory. The analyzer considers a situation where a condition depends on the value of an address suspicious if that condition could go either way. This is the case here, because the address is not constrained, so it could potentially be any valid address.

$ tis-analyzer pointer_tag.c -val -slevel 10 -quiet -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
  uintptr_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
    __retres = (uintptr_t)0x0;
    goto return_label;
  }
  __retres = ptr | (unsigned int)2;
  return_label: return __retres;
}

However, with the use of the tis_address attribute, the program can pin the start of memory to a specific address, such as 0x20:

unsigned char memory[256] __attribute__((tis_address("0x20")));

Then, the analyzer uses that specific address value to determine what the outcomes of both the condition in tag_ptr and the condition in untag_ptr can potentially be. Since 0x20 + 32 = 0x40 is 4-byte aligned (it ends with binary 00), both conditions pass, so the analyzer proceeds accordingly. The analysis then prints out the value of the tagged pointer to memory[32] as 0x42 (0x20 + 32 + 2) and the subsequently untagged pointer as 0x40 (0x20 + 32). The analysis is run with big-ints-hex set to 0x1f to display the values of all addresses using hexadecimal representation (see introspecting addresses).

$ tis-analyzer pointer_tag_addr.c -val -slevel 10 -big-ints-hex 0x1f
[value] Called tis_show_each({{ "tagged" }}, {0x42})
[value] Called tis_show_each({{ "untagged" }}, {0x40})
Address ranges

An address range specifies that a given variable is to be located at a single address from the range between the first and last address (inclusive) during the execution. The analyzer never disambiguates the range for the purposes of the analysis.

A range is specified by its boundaries, denoted by square brackets containing two singleton addresses separated by a two-dot ellipsis. The specification is expressed as a string literal and passed to tis_address as a single argument:

tis_address("[FIRST..LAST]")

The following example describes ranges containing the eight addresses 0x4000 through 0x4007 using all available representations:

tis_address("[0x4000..0x4007]")
tis_address("[0X4000..0X4007]")
tis_address("[0o40000..0o40007]")
tis_address("[0O40000..0O40007]")
tis_address("[0b100000000000000..0b100000000000111]")
tis_address("[0B100000000000000..0B100000000000111]")
tis_address("[16384..16391]")

Warning

Octal number notation

The tis_address attribute uses a different notation than C literals. The notations differ when expressing octal numbers. The notation specific to tis_address signifies octal numbers by prefixing them with 0o or 0O, as opposed to C notation, where they are signified by just a leading zero. Addresses passed to tis_address prefixed with only a leading zero are interpreted as decimal.

The first address in a range cannot be larger than the last. Ranges where the first and last addresses are the same are interpreted as singleton addresses. Address ranges cannot include the address 0x0. If 0x0 is included in any of the constraints, the analyzer stops execution with an error.

An address range can also optionally include a specification of alignment. The alignment is given after a comma as congruence information (remainder and modulus):

tis_address("[FIRST..LAST],REM%MOD")

See Value analysis data representation (Integer values) for more information on congruence.

When specifying a range with an alignment, the FIRST and LAST addresses must match the alignment.

The following range contains all the addresses within it that are aligned to a 4-byte boundary, that is, the addresses 0x4000, 0x4004, 0x4008, and 0x400c:

tis_address("[0x4000..0x400c],0%4")

As another example, the following range contains the addresses 0x4001, 0x4005, 0x4009, 0x400d:

tis_address("[0x4001..0x400d],1%4")

Note that the boundaries fit the specified alignment too.

The tis_address attribute allows setting alignment for individual variables. To specify alignment globally, use the address-alignment option described in a separate section below.

Example Consider the pointer tagging example from the previous subsection again, but instead of assigning a single specific address, the example allows memory to start at any address from a range of 0x20 to 0x40:

unsigned char memory[256] __attribute__((tis_address("[0x20..0x40]")));

(Since the example uses address ranges, it does not attempt to print the exact addresses of variables, see Introspecting addresses for more information.)

However, analyzing this range yields the warning again, because the range contains addresses that are 4-byte aligned as well as ones that are not, so the constraint does not guarantee that the condition is always resolved the same way between executions.

$ tis-analyzer pointer_tag_range.c -val -slevel 10 -quiet -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
  uintptr_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
    __retres = (uintptr_t)0x0;
    goto return_label;
  }
  __retres = ptr | (unsigned int)2;
  return_label: return __retres;
}

Therefore, the following modification constrains the pool of available addresses further to allow only those within the bounds between 0x20 and 0x40 that are 4-byte aligned:

unsigned char memory[256] __attribute__((tis_address("[0x20..0x40],0%4")));

This new constraint allows the analyzer to determine that the conditions in tag_ptr and untag_ptr always evaluate to true. In effect, the pointer will be tagged to the value &memory + {34} and subsequently untagged to &memory + {32}, as expected. Since &memory is a range of possibilities rather than a discrete address, the values of tagged_ptr and untagged_ptr are represented symbolically.

$ tis-analyzer pointer_tag_range_align.c -val -slevel 10
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})

Example Consider the following practical example too. By default, the size of a variable's address depends on the architecture chosen by the user (x86_64 has 64-bit pointers, x86_32 has 32-bit pointers, etc.). However, programmers sometimes take advantage of specific assumptions about the values of addresses to optimize operations on pointers. The following code assumes variables are allocated within the first 4GiB of available memory, and uses a 32-bit variable to store the low part of a (potentially) 64-bit pointer.

The program takes a pointer to the variable var and uses the function as_uint32 to chop it in half, returning a 32-bit unsigned integer containing only the low bits. The function also checks whether the pointer actually fits within 32 bits, and returns 0x0 if this is not the case. Once the pointer is reduced to a 32-bit integer, the program casts the integer back into a pointer and uses it to write 42 to var.

#include <stdint.h>
#include <assert.h>
#include <tis_builtin.h>

uint32_t as_uint32(uintptr_t ptr) {
  if(ptr >> 32UL == 0) {
    return ptr & 0xffffffff;
  } else {
    return 0x0;
  }
}

int main (){
  unsigned char var;

  uint32_t small_ptr = as_uint32(&var);
  tis_show_each("small_ptr", small_ptr);

  unsigned char *ptr = (unsigned char *) small_ptr;
  tis_show_each("ptr", ptr);
    
  *ptr = 42;
  tis_show_each("var", var);
}

When the program is analyzed (on a 64-bit architecture), the analyzer emits a warning, reporting that the condition in as_uint32 depends on the memory layout. Since pointers are 64-bit and addresses are unconstrained, the analyzer cannot decide which branch the program would take during execution.

$ tis-analyzer 32bit_pointer.c -val -slevel 10 -64 -quiet -print -print-filter as_uint32
uint32_t as_uint32(uintptr_t ptr)
{
  uint32_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if (ptr >> 32UL == (uintptr_t)0) {
    __retres = (uint32_t)(ptr & (unsigned long)0xffffffff);
    goto return_label;
  }
  else {
    __retres = (uint32_t)0x0;
    goto return_label;
  }
  return_label: return __retres;
}

In order to inform the analyzer about the assumption that the address falls within the first 4GiB of memory, the example is amended to specify that var’s address falls between 1 and 0xffffffff via the tis_address attribute.

  unsigned char var __attribute__((tis_address("[0x1..0xffffffff]")));

Then, execution proceeds as expected, showing that the pointer can be safely chopped and that casting the resulting integer back to a pointer produces a valid pointer to var:

$ tis-analyzer 32bit_pointer.c -val -slevel 10 -64
[value] Called tis_show_each({{ "small_ptr" }}, {{ &var }})
[value] Called tis_show_each({{ "ptr" }}, {{ &var }})
[value] Called tis_show_each({{ "var" }}, {42})
Named memory regions

An address range can also be specified by a reference to an externally defined named memory region. This subsection describes how to define memory regions and then shows how to use them with tis_address below.

Defining memory regions

Named memory regions are defined via the memory-region option of the analyzer. The user can invoke this option via the command-line option -memory-region or via the equivalent JSON option.

Using either method, the user defines a list of memory regions. Each region definition consists of a label and a range of addresses the region contains. The label is a string akin to a variable name; it must start with a letter followed by any number of letters, numbers or underscores.

A range of addresses in a memory region is described either as:

  • a singleton address,
  • a start address and the size of the region in bytes (FIRST[LENGTH]), or
  • as a start address and an end address ([FIRST..LAST]).

The addresses and lengths describing a range are expressed using hexadecimal, octal, binary, or decimal integers, just like singleton addresses.

Warning

Memory regions do not allow alignment specification.

Memory regions cannot include the address 0x0. If 0x0 is included in any of the constraints, the analyzer stops execution with an error.

When defining memory regions via the -memory-region command-line option, memory regions form a comma-separated list, with each memory region’s label and address range delimited by a colon.

For example, the following command-line option defines two address ranges named R1 and R2, where:

  • R1 contains the four addresses between 0x4000 and 0x4003 (inclusive),
  • R2 contains twelve addresses starting at 0x4004 (up to and including 0x400f).
$ tis-analyzer -memory-region 'R1:[0x4000..0x4003],R2:0x4004[12]'

Tip

Avoiding shell expansion

When defining memory regions through a command-line argument, quote the definition to prevent the shell from expanding any of the symbols within. (Use single quotes '…' in most shells.)

When defining memory regions via the memory-region JSON configuration option, the option accepts a map from the regions’ labels to their address ranges. Both the labels and the definitions of address ranges are strings themselves.

For example, the following configuration defines the same two address ranges as above, named R1 and R2:

{
  "memory-region": {
    "R1": "[0x4000..0x4003]",
    "R2": "0x4004[12]"
  }
}
Plugging memory regions into tis_address

The user can refer to a named memory region by its label when defining a range of addresses with the tis_address attribute. The attribute’s address range will then be defined in terms of the memory region represented by the label, which can be further modified with additional alignment information.

For example, if the analyzer’s configuration declares the regions R1 and R2 from the previous section, they can be used in attribute declarations simply as follows:

tis_address("R1")
tis_address("R2")

While memory regions do not specify their own alignment, the user can specify an alignment within tis_address; it is appended to the region label in the same way as to an address range.

For instance, the following tis_address declarations would use all addresses within R1 but only addresses aligned to a 4-byte boundary for R2.

tis_address("R1")
tis_address("R2,0%4")

When applying an alignment to a region, the first and last addresses within the region must match the specified alignment.

Tip

Setting global alignment

Named regions always have precise boundaries, so they are not well suited for expressing global alignment constraints.

To specify alignment without also specifying the bounds of the range, use the address-alignment option described in a section below.

Example Consider again the pointer tagging example from above. Here, the program again allows memory to start at any address in the range 0x20 to 0x60, but this is not expressed in the code directly. Instead, the code refers to a named region called MEM, which is defined through the configuration of the analysis.

unsigned char memory[256] __attribute__((tis_address("MEM")));

The code is then analyzed with memory-region defining a named region starting at 0x20 and spanning 65 addresses (so ending at 0x60). Since the region is not constrained to a particular alignment, the effective address range contains addresses that both fail and pass the conditions in tag_ptr and untag_ptr, so the analysis again results in the warning about conditions that depend on the memory layout:

$ tis-analyzer pointer_tag_region.c -val -slevel 10 -memory-region 'MEM:0x20[65]' -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
  uintptr_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if (ptr & (unsigned int)((uintptr_t)3 != (uintptr_t)0)) {
    __retres = (uintptr_t)0x0;
    goto return_label;
  }
  __retres = ptr | (unsigned int)2;
  return_label: return __retres;
}

When the code is modified to constrain the region to a 4-byte alignment, the code executes without warnings and produces the expected values of &memory + {34} for the tagged pointer and &memory + {32} for the untagged one:

unsigned char memory[256] __attribute__((tis_address("MEM,0%4")));
$ tis-analyzer pointer_tag_region_align.c -val -slevel 10  -memory-region 'MEM:0x20[65]'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
Pitfall: Invalid constraint sets

The analyzer assumes that constraints placed on the addresses of variables represent some real constraint set expressed in a linker script or elsewhere in the toolchain used for compiling a given code base. The analyzer also requires that the analyzed code base compiles correctly under these constraints. It is up to the user to ensure that these prerequisites are met.

C and C++ variables are not allowed to overlap in memory. Because of this, a C or C++ compilation toolchain will refuse to compile a code base with a set of constraints that would require memory objects to overlap. The analyzer therefore trusts that the specified set of address constraints can be satisfied by some valid memory layout, that is, a layout in which no two variables overlap each other and no variable overlaps the absolute valid range.

Since an invalid constraint set would correspond to a toolchain configuration under which the code base fails to compile, the user should be aware of any such problems ahead of time. Consequently, the analyzer does not check the validity of the constraint set. This means that, in the general case, when provided with an invalid constraint set, the analyzer produces results rather than stopping with an error. Since these results do not reflect any possible real-world execution, they are useless.

Warning

The analyzer does not check the validity of the constraint set placed on variable addresses in the general case. If the constraints make it impossible to produce a valid memory layout, the analyzer may produce results that do not reflect any real-world execution.

While the analyzer does not detect invalid constraint sets in general, it does emit errors in specific circumstances as a courtesy. The possible behaviors of the analyzer in the presence of invalid constraint sets are enumerated in the following table:

Valid layout exists?  Constraint set                                   Analysis result
yes                   singleton address overlaps address range         correct result
yes                   address range overlaps address range             correct result
no                    singleton address overlaps singleton address     error
no                    singleton address overlaps address range         result does not reflect any execution
no                    singleton address overlaps absolute valid range  error
no                    address range overlaps another address range     result does not reflect any execution
no                    address range overlaps absolute valid range      result does not reflect any execution

Example The following program declares two variables, x and y, each a 4-byte array. Here, x and y are both constrained to the same single address in memory, 0x4004, via the tis_address attribute.

#include <tis_builtin.h>

unsigned char x[4] __attribute__((tis_address(0x4004)));
unsigned char y[4] __attribute__((tis_address(0x4004)));

void main() {
  // ...
} 

Analyzing this program yields an error informing the user that the constraint placed on variable y makes the constraint set invalid. It is immaterial that neither variable is accessed.

$ tis-analyzer nonunique_address.c -val
[kernel] user error: invalid address specification for variable 'y'
                     (cannot register variable y in the memory range [0x4004 .. 0x4007] because memory range [0x4004 .. 0x4007] already holds variable x. Memory zone cannot overlap.).
[kernel] TrustInSoft Kernel aborted: invalid user input.

Example The following program extends the one above. Here variable y is constrained to a whole range, rather than a single address. Nevertheless, the range encompasses address 0x4004, to which x is specifically constrained. The constraints are invalid, because there is no layout where x and y would be placed in memory without overlapping, and variables cannot overlap.

#include <tis_builtin.h>

unsigned char x[4] __attribute__((tis_address("0x4004")));
unsigned char y[4] __attribute__((tis_address("[0x4004..0x4005]")));

void main() {
  x[0] = 255;
  x[1] = 255;
  x[2] = 255;
  x[3] = 255;

  tis_show_each("*x", x[0], x[1], x[2], x[3]);
  tis_show_each("*y", y[0], y[1], y[2], y[3]);
} 

If these constraints were reflected in a linker script for this program, it would not compile, so it should not be analyzed at all. If it is analyzed anyway, the analyzer does not detect that the constraint set is invalid, and the analysis completes without errors. The results it produces do not reflect any possible execution of the program.

$ tis-analyzer overlapping_range_bad.c -val
[value] Called tis_show_each({{ "*x" }}, {255}, {255}, {255}, {255})
[value] Called tis_show_each({{ "*y" }}, {0}, {0}, {0}, {0})
The absolute-address option

The absolute-address option allows the user to constrain the addresses of global variables (external linkage symbols) in the same way as tis_address, but to do so without modifying the code of the analyzed program. Instead, the constraints can be specified via a command-line option or JSON configuration.

The absolute-address command-line option constrains a list of variables to addresses or address ranges. The list of variables and their constraints is provided as a comma-separated list, with each element containing the name of a global variable and an associated constraint in the form of a singleton address, a range of addresses, or a reference to a memory region, as described above.

For instance, the following command-line option:

  • pins variable x (in some program) to the address 0x4000,
  • pins variable y to an address within the address range containing 0x4001, 0x4002, and 0x4003, and
  • pins variable z to an address within the address range containing all 4-byte aligned addresses between 0x4004 and 0x4010.
$ tis-analyzer -absolute-address 'x:0x4000,y:[0x4001..0x4003],z:[0x4004..0x4010]\,0%4'

Warning

Note the backslash!

Constraints are specified on the command line as a comma-separated list, which means that commas appearing in alignment specifications appended to address ranges or references to named regions must be escaped with a backslash.

For convenience, the absolute-address command-line argument can be used multiple times in a single invocation of tis-analyzer. In that case, the analyzer uses the union of the constraints defined by all absolute-address options. E.g.:

$ tis-analyzer -absolute-address 'x:0x4000' \
               -absolute-address 'y:[0x4001..0x4003]' \
               -absolute-address 'z:[0x4004..0x4010]\,0%4' \
               …

Just like in the case of the tis_address attribute, the absolute-address option can also use named memory regions to define addresses and ranges. The following command-line option pins variables r1 and r2 to addresses within the region R, which is specified as the twelve consecutive addresses starting from 0x4004. Variable r2 is further constrained to only those addresses that are aligned to a 4-byte boundary.

$ tis-analyzer -memory-region 'R:0x4004[12]' -absolute-address 'r1:R,r2:R\,0%4'

Tip

Avoiding shell expansion

When defining address constraints through a command-line argument, it is recommended to wrap the definition in appropriate quotes to prevent the shell from expanding any of the symbols that may have special meaning.

The absolute-address JSON configuration option works analogously to the command-line option. The option accepts a map whose keys are variable names and whose values describe the constraints that should be applied to the addresses of those variables. The constraints are strings expressed in terms of singleton addresses, ranges of addresses, or references to memory regions, as above.

As an example, the following snippet constrains variables x, y and z in the same way as the command-line example above, but using JSON configuration:

{
  "absolute-address": {
    "x": "0x4000",
    "y": "[0x4001..0x4003]",
    "z": "[0x4004..0x4010],0%4"
  }
}

As a further example, the following constrains variables r1 and r2 again, using references to the named memory region R:

{
  "absolute-address": {
    "r1": "R",
    "r2": "R,0%4"
  },
  "memory-region": {
    "R": "0x4004[12]"
  }
}

A variable should only be constrained once. If a variable has multiple constraints defined via absolute-address, the analyzer issues a warning but continues the analysis, using the rightmost constraint.

The absolute-address option can also be used in conjunction with the tis_address attribute to assign constraints to two disjoint sets of variables. If a variable is constrained both through tis_address and through absolute-address, the analyzer raises an invalid user input error and stops.

Warning

Invalid constraint set

The analyzer does not check the validity of the constraint set placed on variable addresses in the general case. If the constraints make it impossible to produce a valid memory layout, the analyzer may produce results that do not reflect any real-world execution.

See Pitfall: Invalid constraint sets.

Example Consider the example from the beginning of the section on constraining physical addresses. This code uses pointer tagging to embed two bits of information in pointers based on the assumption that all addresses are 4-byte aligned.

#include <stdint.h>
#include <tis_builtin.h>

unsigned char memory[256];

#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)

uintptr_t tag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != 0) return 0x0;
  return ptr | PTR_TAG;
}

uintptr_t untag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != PTR_TAG) return 0x0;
  return ptr & VAL_MASK;
}

void main(void) {
  uintptr_t tagged_ptr = tag_ptr(&memory[32]);
  uintptr_t untagged_ptr = untag_ptr(tagged_ptr);

  tis_show_each("tagged",   tagged_ptr);
  tis_show_each("untagged", untagged_ptr);
}

To configure the analyzer so that all the (relevant) addresses are 4-byte aligned, the absolute-address option can put constraints on the value of the address of memory. Specifically, the following command-line configuration declares that memory is located at address 0x20. The analysis is run with big-ints-hex set so that the relevant addresses are formatted as hexadecimal, and the resulting execution prints out the value of the tagged pointer to memory[32] as 0x42 (0x20 + 32 + 0b10) and the subsequently untagged pointer as 0x40 (0x20 + 32).

$ tis-analyzer pointer_tag.c -val -slevel 10 -big-ints-hex 0x1f -absolute-address 'memory:0x20'
[value] Called tis_show_each({{ "tagged" }}, {0x42})
[value] Called tis_show_each({{ "untagged" }}, {0x40})

Similarly, memory can be constrained to an overapproximated set of possible 4-byte aligned addresses. Note the use of a backslash-escaped comma \, to delimit the congruence information from the specification of the boundaries; it is needed here because variable constraints are also comma-separated.

$ tis-analyzer pointer_tag.c -val -slevel 10 -absolute-address 'memory:[0x20..0x40]\,0%4'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})

Finally, memory can be constrained to a specific memory region defined as MEM, with an additional congruence constraint that aligns the addresses to the appropriate boundary:

$ tis-analyzer pointer_tag.c -val -slevel 10 -absolute-address 'memory:MEM\,0%4' -memory-region 'MEM:0x20[65]'
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})
Absolute addresses and C++

In order to specify an address or range of addresses for specific variables, the user can use the absolute-address configuration option described above. The user can also use the address-alignment option described below to provide a blanket alignment specification for all variables, or redefine the problem in terms of absolute addresses and the absolute-valid-range option.

However, the tis_address attribute is not supported in C++ source files. When the attribute is used in C++ code, the analyzer issues a warning and proceeds to analyze the code in question as if the attribute were absent. To nevertheless use tis_address within the analyzed program, the user can modify the program so that the attribute is attached to a corresponding variable declared in a separate C file.

Warning

TrustInSoft Analyzer ignores the tis_address attribute in C++ source files.

Example Consider the following excerpt from a real-world C++ application. Here, MMIO is a class that wraps an absolute memory location to provide the basic functionality of a peripheral hardware register. MMIO contains the static member pointer, which defines the location in memory where the register begins, along with a type Contents, which determines how many bytes the register spans. Both the offset in memory and the size of the register are provided via template arguments, and pointer is initialized statically by casting the provided address to a pointer to Contents. The class also provides a method called byte, which returns a reference to a byte within pointer. An access through this reference means that data is read from the peripheral, with each access causing a new read.

The example then declares two 4-byte ports, port_a and port_b, located at 0x8000 and 0x8008 respectively. The main function simply displays the addresses of the first byte of each port via the tis_show_each built-in, taking care to coerce the addresses into numerical values (see Introspecting addresses).

#include <cstdint>
#include <functional>
#include <tis_builtin.h>

template <std::uintptr_t address, std::size_t size>
struct MMIO {
  using Contents = volatile std::uint8_t[size];
  static Contents * const pointer;

  template <std::size_t offset>
  static const volatile uint8_t& byte() {
    return *(reinterpret_cast<const volatile uint8_t*>(&pointer[offset]));
  }
};

template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
  reinterpret_cast<typename MMIO<address, size>::Contents*>(address);

using PortA = MMIO<0x8000, 4>;
using PortB = MMIO<0x8008, 4>;

PortA port_a;
PortB port_b;

int main () {
  tis_show_each("port_a", &port_a.byte<0>(), tis_force_ival_representation((uintptr_t) &port_a.byte<0>()));
  tis_show_each("port_b", &port_b.byte<0>(), tis_force_ival_representation((uintptr_t) &port_b.byte<0>()));
}

Analyzing this example (with tis-analyzer++) yields an undefined behavior, since the program attempts to access memory at an invalid address:

$ tis-analyzer++ mmio.cpp -val -big-ints-hex 0x7fff
tests/tis-user-guide/mmio.cpp:12:[kernel] warning: pointer arithmetic:
                  assert \inside_object_or_null((void *)MMIO<32768, 4>::pointer);

The analysis can be made to proceed using any of the tools described in this chapter: the tis_address attribute, the absolute-address option, or the absolute-valid-range option. This example assumes that the user first decides to use the tis_address attribute to specify the addresses of both port_a and port_b. The following two examples then show how to convert the solution to the other two approaches.

Following the procedure outlined in Section Interpreting absolute addresses as external variables, the code of the example is first modified to provide a variable equivalent to the user’s interpretation of the contents of memory at the address specified by pointer (for every template specialization of the class MMIO). This involves creating a templated volatile variable, here called port_contents, for each specialization of MMIO and having the associated pointer point to that variable instead of being defined by an arbitrary address.

#ifdef __TRUSTINSOFT_ANALYZER__
  template<std::uintptr_t address, std::size_t size>
  volatile uint8_t port_contents[size];

  template <std::uintptr_t address, std::size_t size>
  typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
    &port_contents<address, size>;
#else
  template <std::uintptr_t address, std::size_t size>
  typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
    reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#endif

Then, port_contents is assigned the specific address defined through the template argument address, which is attached to the variable via tis_address. The modifications specific to TrustInSoft Analyzer are kept behind preprocessor conditionals to prevent them from impacting actual execution.

  volatile uint8_t port_contents[size] __attribute__((tis_address(address)));

The analysis reports that an unknown attribute was encountered (among other warnings). It then executes without emitting an alarm, because the addresses accessed are no longer invalid, but refer to specific variables. However, since tis_address was not recognized, the specific addresses remain unconstrained, causing tis_show_each to display the underlying values of &port_contents<32768, 4> and &port_contents<32776, 4> (equivalent to &port_contents<0x8000, 4> and &port_contents<0x8008, 4>) as any possible value that can be represented by the pointer, [1..0xFFFFFFFB], instead of the expected addresses 0x8000 and 0x8008.

$ tis-analyzer++ mmio_addr.cpp -val -big-ints-hex 0x7fff
tests/tis-user-guide/mmio_addr.cpp:18:[cxx] warning: variable templates are a C++14 extension
tests/tis-user-guide/mmio_addr.cpp:18:[cxx] warning: unknown attribute 'tis_address' ignored
tests/tis-user-guide/mmio_addr.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8000, 4> ignored
tests/tis-user-guide/mmio_addr.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8008, 4> ignored
[value] Called tis_show_each({{ "port_a" }},
                             {{ &port_contents<0x8000, 4> }},
                             [1..0xFFFFFFFB])
[value] Called tis_show_each({{ "port_b" }},
                             {{ &port_contents<0x8008, 4> }},
                             [1..0xFFFFFFFB])

In order to pin port_a and port_b to specific addresses, the tis_address attribute must be attached to variables declared in C and not C++.

In order to accomplish this, the code is further modified so that port_contents variables for each specialization are declared as extern "C". Since templated variables cannot be declared within extern "C" contexts, the code declares separate variables for specific specializations, called port_a_contents and port_b_contents.

#ifndef __TRUSTINSOFT_ANALYZER__
  template <std::uintptr_t address, std::size_t size>
  typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
    reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#else
  extern "C" uint8_t port_a_contents[4];
  extern "C" uint8_t port_b_contents[4];

  template<>
  typename MMIO<0x8000, 4>::Contents * const MMIO<0x8000, 4>::pointer =
    &port_a_contents;
  template<>
  typename MMIO<0x8008, 4>::Contents * const MMIO<0x8008, 4>::pointer =
    &port_b_contents;
#endif

Then, the example is extended with another file. This file consists of C code that redeclares port_a_contents and port_b_contents with the tis_address attribute:

#include <stdint.h>
#include <tis_builtin.h>

extern uint8_t port_a_contents[4] __attribute__((tis_address(0x8000)));
extern uint8_t port_b_contents[4] __attribute__((tis_address(0x8008)));

At this point, the analysis can be executed (with both files added to the analysis configuration). This time, the variables are pinned to the appropriate addresses, as desired:

$ tis-analyzer++ mmio_c_addr.cpp mmio_ports.c -val -big-ints-hex 0x7fff
[value] Called tis_show_each({{ "port_a" }}, {{ &port_a_contents }}, {0x8000})
[value] Called tis_show_each({{ "port_b" }}, {{ &port_b_contents }}, {0x8008})

Example Instead of attaching the tis_address attribute to variables via a separate C file, variables can be pinned to specific addresses within tis-analyzer++ via the absolute-address configuration option. To do this, consider the example again with a templated variable port_contents introduced as an equivalent to absolute memory addresses, but without a tis_address attribute attached to it in any way:

#ifdef __TRUSTINSOFT_ANALYZER__
  template<std::uintptr_t address, std::size_t size>
  volatile uint8_t port_contents[size];

  template <std::uintptr_t address, std::size_t size>
  typename MMIO<address, size>::Contents * MMIO<address, size>::pointer =
    &port_contents<address, size>;
#else
  template <std::uintptr_t address, std::size_t size>
  typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
    reinterpret_cast<typename MMIO<address, size>::Contents*>(address);
#endif

This variable can be constrained to a specific address (or range) via the absolute-address option. The difficulty lies in figuring out the mangled name of the variable and accounting for the template arguments. This can be done by finding the mangled variable names in the list of all variables:

$ tis-analyzer++ mmio_var.cpp -info-csv-variables vars.csv; head -1 vars.csv; grep port_contents <vars.csv
Name(s), File, Line, Type, Function, Kind, Storage, Initialized, Volatile, Const, Temporary, Is libc
_Z13port_contentsILj32768ELj4EE, tests/tis-user-guide/mmio_var.cpp, 22, <array>, NA, global variable, defined, yes, yes, no, no, libc:no
_Z13port_contentsILj32776ELj4EE, tests/tis-user-guide/mmio_var.cpp, 22, <array>, NA, global variable, defined, yes, yes, no, no, libc:no

Alternatively, the mangled name can also be discovered via the tis-analyzer++ GUI by right-clicking the variable definition and selecting “Copy mangled name” from the drop-down menu:

A screenshot of the interactive code pane of the GUI showing a definition of the variable port_contents<32776, 4>. The definition is highlighted and a drop-down menu is open below it showing the options: Occurrence, Copy, Copy mangled name, and Create a link at this location. The cursor is hovering over the option Copy mangled name.

A screenshot of the interactive code pane of the GUI showing a definition of the variable port_contents<32776, 4>. Next to it, there is a dialog box labeled Clipboard showing the mangled name of the variable as _Z13port_contentsILj32776ELj4EE.

The mangled variable names can then be plugged directly into absolute-address with address constraints. The analysis should be run with the C++14 (or more recent) standard specified, since the code contains templated variables, which were introduced in that version of C++. Executing the analysis with this option leads to the addresses of the first bytes of port_a and port_b being rendered according to expectations as 0x8000 and 0x8008, respectively.

$ tis-analyzer++ -val mmio_var.cpp -big-ints-hex 0x7fff -cxx-std=c++14 -absolute-address _Z13port_contentsILj32768ELj4EE:0x8000,_Z13port_contentsILj32776ELj4EE:0x8008
tests/tis-user-guide/mmio_var.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8000, 4> ignored
tests/tis-user-guide/mmio_var.cpp:18:[value] warning: global initialization of volatile variable port_contents<0x8008, 4> ignored
[value] Called tis_show_each({{ "port_a" }},
                             {{ &port_contents<0x8000, 4> }},
                             {0x8000})
[value] Called tis_show_each({{ "port_b" }},
                             {{ &port_contents<0x8008, 4> }},
                             {0x8008})

(The analyzer also emits a warning that a volatile variable is initialized. This is due to it being declared as global, since global variables are always initialized. The warning can be safely ignored.)

Example The original example from the start of the section can also be made to work without any code modification, by instead marking the area of memory accessed via concrete absolute addresses as valid, using the absolute-valid-range configuration option. Thus, consider again the original program:

#include <cstdint>
#include <functional>
#include <tis_builtin.h>

template <std::uintptr_t address, std::size_t size>
struct MMIO {
  using Contents = volatile std::uint8_t[size];
  static Contents * const pointer;

  template <std::size_t offset>
  static const volatile uint8_t& byte() {
    return *(reinterpret_cast<const volatile uint8_t*>(&pointer[offset]));
  }
};

template <std::uintptr_t address, std::size_t size>
typename MMIO<address, size>::Contents * const MMIO<address, size>::pointer =
  reinterpret_cast<typename MMIO<address, size>::Contents*>(address);

using PortA = MMIO<0x8000, 4>;
using PortB = MMIO<0x8008, 4>;

PortA port_a;
PortB port_b;

int main () {
  tis_show_each("port_a", &port_a.byte<0>(), tis_force_ival_representation((uintptr_t) &port_a.byte<0>()));
  tis_show_each("port_b", &port_b.byte<0>(), tis_force_ival_representation((uintptr_t) &port_b.byte<0>()));
}

Given that the program accesses addresses spanning from 0x8000 up to and including 0x800b (the last byte of port_b), the analysis can be conducted by setting that range as valid. This causes the analysis to produce the expected results.

$ tis-analyzer++ -val mmio.cpp -big-ints-hex 0x7fff -absolute-valid-range 0x8000-0x800b
[value] Called tis_show_each({{ "port_a" }}, {0x8000}, {0x8000})
[value] Called tis_show_each({{ "port_b" }}, {0x8008}, {0x8008})
Global alignment

While the tis_address attribute and the absolute-address option both allow setting alignment per variable, using either approach would be tedious if the analyzed code requires that all variables conform to a specific alignment. If that is the case, the user can configure the analyzer to assume a given alignment for all variables.

The user can specify the alignment for all variables either by setting the -address-alignment command-line option or its equivalent JSON configuration option. The command-line option accepts an integer specifying the alignment in bytes. For example, this sets the alignment of all variables to 4 bytes:

$ tis-analyzer -address-alignment 4

The JSON configuration file works analogously:

{
  "address-alignment": 4
}

Example Reprise the pointer tagging example program from the introduction. It relies on the assumption that all addresses are 4-byte aligned to smuggle 2 bits of information within pointers. Specifically, the function tag_ptr checks whether a pointer is 4-byte aligned by checking whether its last two bits are empty. If they are, the program writes a tag to those bits. Otherwise, it returns 0x0 to indicate an error. The program calls tag_ptr in main on a pointer to a variable called memory and inspects the result using tis_show_each.

#include <stdint.h>
#include <tis_builtin.h>

unsigned char memory[256];

#define PTR_TAG 2
#define TAG_MASK ((uintptr_t) 3)
#define VAL_MASK (-TAG_MASK - 1)

uintptr_t tag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != 0) return 0x0;
  return ptr | PTR_TAG;
}

uintptr_t untag_ptr(uintptr_t ptr) {
  if (ptr & TAG_MASK != PTR_TAG) return 0x0;
  return ptr & VAL_MASK;
}

void main(void) {
  uintptr_t tagged_ptr = tag_ptr(&memory[32]);
  uintptr_t untagged_ptr = untag_ptr(tagged_ptr);

  tis_show_each("tagged",   tagged_ptr);
  tis_show_each("untagged", untagged_ptr);
}

By default, the analyzer does not assume an alignment beyond the one implied by the type of each variable, so it cannot categorically determine whether the address passed into tag_ptr would pass the alignment check or not. Thus, it raises an alarm:

$ tis-analyzer pointer_tag.c -val -slevel 10 -print -print-filter tag_ptr
uintptr_t tag_ptr(uintptr_t ptr)
{
  uintptr_t __retres;
  /*@ assert
      Value: unclassified:
        \warning("Conditional branch depends on garbled mix value that depends on the memory layout .");
  */
  if ((ptr & (uintptr_t)3) != (uintptr_t)0) {
    __retres = (uintptr_t)0x0;
    goto return_label;
  }
  __retres = ptr | (unsigned int)2;
  return_label: return __retres;
}

In order to proceed with the analysis, the user sets the address-alignment option to 4. The analyzer can then determine whether the last two bits of the pointer are zero, and proceed. The analysis finishes and shows the contents of the tagged pointer.

$ tis-analyzer pointer_tag.c -val -slevel 10 -address-alignment 4
[value] Called tis_show_each({{ "tagged" }}, {{ &memory + {34} }})
[value] Called tis_show_each({{ "untagged" }}, {{ &memory + {32} }})

Since the tag added to the pointer was 2, the original address &memory + {32} becomes &memory + {34} in the tagged pointer.

Dynamic loading

To analyze an application that uses dynamic loading features from <dlfcn.h> such as dlopen, dlsym, etc., some specific information has to be provided by the user.

For instance, let’s consider the following program to be analyzed:

File use-dlopen.c to analyze:
#include <stdio.h>
#include <dlfcn.h>

int main (void) {
  void * handle = dlopen("some_lib.so", RTLD_LAZY);
  if ( ! handle) {
    fprintf(stderr, "%s\n", dlerror());
    return 1;
  }

  void (*f_process) (void);

  f_process = (void (*)(void)) dlsym(handle, "process");

  char * error = dlerror();
  if (error != NULL) {
    fprintf(stderr, "%s\n", error);
    return 1;
  }

  f_process();

  dlclose(handle);
  return 0;
}

Since the program is deterministic, the interpreter mode is used here, but the approach would be similar for a larger analysis using the analyzer mode. The program can be analyzed with the following command:

$ tis-analyzer --interpreter use-dlopen.c

The trace shows that, in order to be able to load the function with dlsym, a stub_dlsym function has to be provided:

Results when no stubs are provided:
[TIS LIBC STUBS]: stub_dlsym error: For a more accurate analysis, override this function "stub_dlsym" with your own function
dlsym error: unable to load process symbol

Such a stub may for instance look like:

File use-dlopen-stubs.c to provide stub_dlsym:
#include <string.h>
#include <dlfcn.h>

void process (void);

void *stub_dlsym(const char *filename, const char * fname) {
  void * pf = NULL;
  if (0 == strcmp (filename, "some_lib.so")) {
    if (0 == strcmp (fname, "process")) {
      pf = &process;
    }
  }
  return pf;
}

Now, the command becomes:

$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c

There is a warning about the process function:

Results when a stub_dlsym is provided, but not the library:
tests/val_examples/use-dlopen-stubs.c:8:[kernel] warning: Neither code nor specification for function process, generating default assigns from the prototype

Of course, this is because the source code of the loaded library, which is supposed to hold the process function, has not been provided to the analyzer.

Let’s add a use-dlopen-plugin.c dummy file holding a process function:

File use-dlopen-plugin.c to provide the process function:
#include <stdio.h>

void process (void) {
  printf ("Hello from the 'process' function.");
}

Warning

Limitation: since the source files of the loaded library are analyzed together with the main source files, the constructor functions of dynamic libraries are called during the main program startup whereas they should be called when dlopen is called.

Now, the command becomes:

$ tis-analyzer --interpreter use-dlopen.c use-dlopen-stubs.c use-dlopen-plugin.c

The application is now analyzed as expected:

Results when the library source code is included in the analysis:
Hello from the 'process' function.

Warning

If some function names are used in both the application and the loaded library, some renaming may be needed.

Compiler Specific Keywords

Some compilers have extensions with non-standard keywords. One easy way to remove these keywords from the source code is to define them as empty macros. For instance:

-cpp-extra-args="-Dinterrupt="

Beware that it is up to you to check whether removing these keywords changes the relevance of the verification.

Caution

The Mthread plug-in is only available in the commercial version of TrustInSoft Analyzer.

Multi-Threading

The Mthread plug-in makes it possible to verify multi-thread programs.

Because it also uses the same value analysis, it provides the same alarm detection, but it takes into account all the possible concurrent behaviors of the program by analyzing all the possible interleavings between threads. As before, this represents an over-approximation of the possible behaviors of the program.

Moreover, Mthread can provide an over-approximation of the memory zones that are accessed concurrently by more than one thread. For each zone and thread, Mthread also returns the program points at which the zone is accessed, whether the zone is read or written, and the callstack that leads to the statement.

Using the plug-in requires adding a stubbed version of the concurrency library in use to the analyzed source files. For some concurrency libraries, this file is provided with the tool (currently pthread, VxWorks, and Win32).

Please ask for more information if needed.

Caution

The Strict Aliasing plug-in is only available in the commercial version of TrustInSoft Analyzer.

Strict Aliasing

The Strict Aliasing plug-in detects the violation of the strict aliasing rule as defined in the C99 and the C11 standards.

The references taken from the C11 standard for the strict aliasing rule are:

  • section 6.5p6 defining what is an effective type
  • section 6.5p7 defining how an object can be accessed
  • note 87 defining a rule specific to allocated memory

The strict aliasing analysis is currently in beta.

The analysis

The strict aliasing analysis is available using the parameter -sa when starting an analysis with TrustInSoft Analyzer. Using this option automatically launches the value analysis.

If a violation of the strict aliasing rule is detected during an analysis, a warning is displayed. However, this warning does not stop the analysis.

Example:

 1  int foo(int *p, float *q)
 2  {
 3      *p = 42;
 4      *q = 1.337;
 5      return *p;
 6  }
 7
 8  int main(void)
 9  {
10      int x;
11      return foo(&x, (float *)&x);
12  }

Given the previous C file foo.c, the strict aliasing analysis can be launched using the following command:

$ tis-analyzer -sa foo.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
foo.c:4:[sa] warning: The pointer q has type float *. It violates strict aliasing rules by
              accessing a cell with effective type int.
              Callstack: foo :: foo.c:11 <- main
[value] done for function main

A violation of the strict aliasing rule is detected by the analyzer, which provides details about the violation: the pointer has type float * while the accessed cell has effective type int, and these two types are incompatible.

Strict Aliasing Analysis Options

Several options exist to parametrize the strict aliasing analysis.

The option -sa-strict-enum
Default Value:not set by default
Opposite:-sa-no-strict-enum

By default, the strict aliasing analysis uses the integer representation of an enum type: a pointer to that integer representation may be used to access an enum cell. The -sa-strict-enum option restricts this default behavior: an enum cell may then only be accessed through a pointer to the same enum type. For example:

1  enum E { a, b };
2
3  int main(void)
4  {
5      enum E e = a, *p = &e;
6      *(int *)p = 42;
7      *p = b;
8      return e;
9  }

The access at line 6 is accepted by default by the strict aliasing analysis because the example uses a pointer to the correct integer representation of the enum, as shown by the following output:

$ tis-analyzer -sa enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main

When using the -sa-strict-enum option, the strict aliasing analysis detects a violation at line 6, because it does not accept the integer representation.

$ tis-analyzer -sa -sa-strict-enum enum.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
enum.c:6:[sa] warning: The pointer (int *)p has type int *. It violates strict aliasing rules by
              accessing a cell with effective type enum E.
              Callstack: main
[value] done for function main
The option -sa-strict-struct
Default Value:not set by default
Opposite:-sa-no-strict-struct

When taking the address of a structure member, the strict aliasing analysis keeps track of both the structure and the member, in order to check future pointer uses. By default, the analyzer allows a memory location whose effective type is that of a structure member to be accessed through a pointer of the same type as the member. For example:

1  struct s { int a; };
2
3  int main(void)
4  {
5      struct s s = { 0 };
6      int *p = &s.a;
7      *p = 42;
8      return s.a;
9  }

The access at line 7 is accepted by the analyzer because the pointer p has the same type as the member a of the structure s.

$ tis-analyzer -sa struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main

When using the -sa-strict-struct option, this access is reported as non-conformant because the member must be accessed with its full effective type (i.e. through a pointer to the whole structure only).

$ tis-analyzer -sa -sa-strict-struct struct.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
struct.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
                a cell with effective type (struct s).a[int].
                Callstack: main
[value] done for function main
The option -sa-strict-union
Default Value:set by default
Opposite:-sa-no-strict-union

When taking the address of a union member, the strict aliasing analysis keeps information about the whole union, not only the referenced member. The analyzer only allows a memory location of union type to be accessed through a pointer to the same union type.

1  union u { int a; };
2
3  int main(void)
4  {
5      union u u = { 0 };
6      int *p = &u.a;
7      *p = 42;
8      return u.a;
9  }

The access at line 7 is rejected by the analyzer because the pointer p does not have the expected type union u, even though the union includes a member of the same type as the one pointed to by p.

$ tis-analyzer -sa union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
union.c:7:[sa] warning: The pointer p has type int *. It violates strict aliasing rules by accessing
               a cell with effective type (union u)[int].
               Callstack: main
[value] done for function main

When using the opposite option -sa-no-strict-union, the access is accepted, because the union u includes a member of type int.

$ tis-analyzer -sa -sa-no-strict-union union.c
[...]
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] done for function main
Volatile variables

In C, variables can be declared with a volatile qualifier, which tells the compiler that the value of the variable may change without any action being taken by the nearby code. In particular, a volatile variable can be modified outside the program. This means that its value may change unpredictably, even if the program did not directly write to that variable, or even if it did.

The volatile keyword is commonly used for preserving values of variables across a longjmp and for memory-mapped I/O devices. The volatile keyword also has uses in concurrency on single-core systems, such as for variables that are updated by out-of-scope interrupt routines or for concurrently modified variables (a de facto practice until the ISO/IEC 9899:1999 standard, before more robust mechanisms were introduced).

The unpredictable nature of volatile variables requires special consideration from the analyzer, which must over-approximate their values. This guide shows how to identify volatile variables in the analyzed program, how to perform sound analyses with volatile variables present, and how to tailor the semantics of accesses to specific volatile variables to fit specific use cases.

Finding volatile variables

Given that volatile variables exhibit behavior that is distinct from how other variables behave, the analyzer keeps track of them and informs about their presence.

Volatile variables in the GUI

If a variable is volatile, the user is also informed about this via the GUI. Whenever a volatile variable is inspected, the Volatile icon appears in the Flags column of the Values tab in the bottom panel. Hovering over the icon shows that the contents of such variables are over-approximated.

Values tab showing a volatile variable icon in the Flags column for variable port_out. The mouse cursor is pointing at the volatile flag, showing hover text that says "Volatile variable: the contents are over-approximated."
Listing all volatile variables

Information about all volatile variables appearing in the code can be retrieved via the info-variables or info-csv-variables options, which print information about all variables in the program, either to the screen or to a file, and distinguish volatile variables.

Variable information can be extracted via the command-line options -info-variables or -info-csv-variables.

The same result can be obtained through a JSON configuration file, by setting the option info-variables to true, or by providing a path via the info-csv-variables option (see Configuration files for more).

{
  "info-variables": true,
  "info-csv-variables": "…"
}

Example For instance, the following function operates on both volatile and non-volatile variables:

#include <tis_builtin.h>

volatile unsigned char port_out;
const volatile unsigned char port_in;
int main(void) {
    unsigned char data[] = { 0, 1, 2, 3 };
    int cursor = 0;
    while (!port_in) {
      port_out = data[cursor++];
    }
    tis_show_each("cursor", cursor);
    tis_show_each("port_in port_out", port_in, port_out);
}

Information about these variables can be extracted by the command below. The Volatile column specifies whether the variable was declared as volatile or not. Here, port_in and port_out are listed as volatile.

$ tis-analyzer -info-variables volatile_example.c
Name(s), File, Line, Type, Function, Kind, Storage, Initialized, Volatile, Const, Temporary, Is libc
port_out, tests/tis-user-guide/volatile_example.c, 3, unsigned char, NA, global variable, defined, no, yes, no, no, libc:no
port_in, tests/tis-user-guide/volatile_example.c, 4, unsigned char, NA, global variable, defined, no, yes, yes, no, libc:no
cursor, tests/tis-user-guide/volatile_example.c, 7, int, main, local variable, defined, no, no, no, no, libc:no

(Ordinarily, the output will contain other variables, but was filtered for brevity.)

The Const column also notes whether a variable can be modified by the analyzed program, which is relevant for distinguishing read-only volatile variables. Here, port_in is marked as constant.

Handling volatile variables

The analyzer has to handle volatile variables conservatively to retain the soundness of the analysis, that is, its ability to prove that properties of the program hold at run time. Since volatile variables can be externally changed to any value at any point in the execution, the analyzer must over-approximate them.

In addition, while over-approximating the behavior of volatile variables makes the analysis sound, the loss of precision might make it less useful, especially if the user has specific knowledge about how these volatile variables behave in practice. In response, the analyzer provides tools to specify the behavior of volatile variables, or even to ignore the volatility of variables completely. The user can also add volatility to non-volatile variables.

When volatile variables are treated as non-volatile, or if they are assumed to follow specific semantics, the analysis becomes unsound in the general case. That is, the analyzer will follow the assumptions provided by the user, and if these assumptions are in any way incorrect, there may be an execution of the analyzed program that causes undefined behavior which the analyzer does not find. Therefore, the user must exercise utmost caution when specifying volatile behaviors.

The behavior of volatile variables during analysis with various parameters, and their impact on soundness are summarized in the table below. The remainder of this guide goes into the detail of these analyses and these options.

Analysis Behavior Soundness
Value (analyzer profile) approximate value to full range sound
Value (interpreter profile) halt analysis on volatile read sound
Value with remove-volatile-locals ignore volatile modifier on specific variables unsound
Value with remove-volatile ignore volatile modifier on all variables unsound
Value with volatile-globals add volatile modifier to specific variables sound
WP approximate value to full range sound
WP with wp-volatile approximate value to full range sound
WP without wp-volatile ignore volatile modifier on all variables unsound
Any analysis with volatile plugin replace volatile accesses with function calls unsound
Volatile variables in value analysis

Since volatile variables can be modified externally to the analyzed program, value analysis handles volatile variables by making the conservative assumption that they always contain an unknown value, irrespective of what the analyzed program does. This over-approximation preserves the soundness of the analysis.

Example The following program exhibits a common use case for the use of the volatile keyword, where a variable represents a hardware register or a sensor, so its value cannot be modified by the program but changes due to external factors. This particular program declares a global variable called sensor which is qualified with const and volatile keywords, and is initially undefined. The program then loops until the value of sensor is set.

#include <tis_builtin.h>
#include <unistd.h>    

const volatile unsigned char sensor;
int main(void) {
    while (!sensor) {
        sleep(1);            
    }
    tis_show_each("sensor", sensor);
    return 0;
}

Running the example with value analysis shows sensor is assumed to be initialized and its value is approximated as the entire range of type unsigned char.

$ tis-analyzer -val volatile.c
[value] Called tis_show_each({{ "sensor" }}, [0..255])

Note that the value of the volatile variable remains approximated to the full range of its type, even if it is assigned during the execution of the program.

Example For instance, the following program declares a volatile local variable inside the function main whose value is set initially to 0, and subsequently set to 1.

#include <tis_builtin.h>

int main(void) {
    volatile int x = 0;
    tis_show_each("before", x);
    x = 1;
    tis_show_each("after", x);
    return x;
}

However, after each assignment, the analyzer continues to approximate the value of the variable to any integer value, assuming that the value of x could be changed externally.

$ tis-analyzer -val volatile_local.c
[value] Called tis_show_each({{ "before" }}, [-2147483648..2147483647])
[value] Called tis_show_each({{ "after" }}, [-2147483648..2147483647])
Volatile variables in the interpreter

When the analyzer is run with the interpreter profile, it follows a single execution path and avoids over-approximation. The abstract interpreter specifically requires that variables be associated with a single value at each point during the execution of the program. Therefore, the interpreter cannot just approximate volatile variables as having any value within their range.

On the other hand, since values of volatile variables may be modified externally, the interpreter cannot assume the values of such variables to be known precisely without the loss of soundness. Therefore, the interpreter halts when encountering an access to a volatile variable in its execution path.

Example The following example is similar to the one from the previous section. It also shows a popular use of the volatile keyword for a sensor or hardware register, where the value of a variable sensor is set externally but cannot be modified within the program. This program assigns sensor the initial value of 255 for convenience.

#include <tis_builtin.h>
#include <unistd.h>

const volatile unsigned char sensor = 255;
int main(void) {
    while (!sensor) {
        sleep(1);            
    }
    tis_show_each("sensor", sensor);
    return 0;
}

When this program is interpreted, it produces two warnings and an error. The first warning informs that the initialization is ignored, since the value of volatile variables may change at any point. The second warning informs that the value of a volatile variable cannot be used to evaluate any computation in interpreter mode. The error informs that interpretation cannot proceed past the attempt to evaluate the volatile variable. The interpreter does not evaluate tis_show_each to show the value of sensor.

$ tis-analyzer -val --interpreter volatile_interpreter.c
tests/tis-user-guide/volatile_interpreter.c:4:[value] warning: global initialization of volatile variable sensor ignored
tests/tis-user-guide/volatile_interpreter.c:6:[value] warning: The following sub-expression cannot be evaluated
                 (due to volatile type, try option -remove-volatile):
                 sensor
                 
                 All sub-expressions with their values:
                 unsigned char  sensor ∈ [0..255]
                 
                 Stopping.
[value] user error: Degeneration occurred:
                    results are not correct for lines of code that can be reached from the degeneration point.
Ignoring volatile variables

It is possible to proceed with the interpretation of a program that reads from volatile variables by treating them as non-volatile. This is done by specifying the -remove-volatile or -remove-volatile-locals command-line flags. This can be especially useful when using value analysis with the interpreter profile to prevent it from halting on a read from a volatile variable (see above).

The remove-volatile option

Setting the remove-volatile option causes value analysis to be conducted as if all volatile variables were not volatile. Specifically, the analyzer assumes that variables cannot be modified outside the scope of the analyzed program, regardless of whether they are marked volatile. Since this assumption might not be borne out in practice, this mode of analysis is unsound and may not find all undefined behaviors.

Warning

Using remove-volatile on programs reading volatile variables is unsound in the general case.

This option can be set by using the -remove-volatile command-line flag:

$ tis-analyzer -val -remove-volatile …

The feature can also be turned on within a JSON analysis configuration file using the remove-volatile Boolean option (see Configuration files):

{
  "val": true,
  "val-profile": "interpreter",
  "remove-volatile": true
}

Example The following program is the same as in the previous section. It contains an example use of a volatile variable that is only set externally and cannot be modified within the program. The program defines a variable called sensor with constant and volatile qualifiers and assigns the initial value of 255 to it.

#include <tis_builtin.h>
#include <unistd.h>

const volatile unsigned char sensor = 255;
int main(void) {
    while (!sensor) {
        sleep(1);            
    }
    tis_show_each("sensor", sensor);
    return 0;
}

The previous section shows that interpreting this example using value analysis yields an error when sensor is read. However, when run with the -remove-volatile flag, the interpreter ignores the volatile modifier on sensor and proceeds to analyze the program as if all modifications to sensor could be tracked by interpreting the program’s execution.

$ tis-analyzer -val --interpreter -remove-volatile volatile_interpreter.c
[value] Called tis_show_each({{ "sensor" }}, {255})
The remove-volatile-locals option

The remove-volatile-locals option is a variant of remove-volatile that removes volatility only from volatile variables defined within specific functions. That is, this option causes value analysis to be conducted as if volatile variables within a given set of functions were not volatile. Specifically, the analyzer assumes that the volatile variables declared inside the given functions cannot be modified outside the scope of the analyzed program. As with remove-volatile, since this assumption might not be borne out in practice, the analysis is unsound in general.

Warning

Using remove-volatile-locals on programs reading volatile variables is unsound in the general case.

This feature can be used via the -remove-volatile-locals command-line option. Here, the option specifies that local variables declared within functions main and test should be treated as non-volatile:

$ tis-analyzer -val -remove-volatile-locals main,test …

The feature can also be turned on within a JSON analysis configuration file using the remove-volatile-locals option (see Configuration files). Here, the option specifies a list of functions by analogy to the command-line option above:

{
  "val": true,
  "val-profile": "interpreter",
  "remove-volatile-locals": ["main", "test"]
}

Example The following example mirrors that of the preceding section, except the variable sensor is declared locally within the main function rather than globally. The program again defines a variable called sensor with constant and volatile qualifiers and the initial value of 255.

#include <tis_builtin.h>
#include <unistd.h>

int main(void) {
    const volatile unsigned char sensor = 255;
    while (!sensor) {
        sleep(1);            
    }
    tis_show_each("sensor", sensor);
    return 0;
}

Again, interpreting this example using value analysis yields a warning that the volatile variable cannot be read, and the analysis halts.

$ tis-analyzer -val --interpreter -remove-volatile-locals volatile_locals.c
tests/tis-user-guide/volatile_locals.c:6:[value] warning: The following sub-expression cannot be evaluated
                 (due to volatile type, try option -remove-volatile):
                 sensor
                 
                 All sub-expressions with their values:
                 unsigned char  sensor ∈ [0..255]
                 
                 Stopping.
[value] user error: Degeneration occurred:
                    results are not correct for lines of code that can be reached from the degeneration point.

However, when executed with the option remove-volatile-locals set to main, the interpreter treats all local variables within main as non-volatile, even if they have the volatile qualifier. Thus, the analysis proceeds and returns the value of sensor as 255.

$ tis-analyzer -val --interpreter -remove-volatile-locals main volatile_locals.c
[value] Called tis_show_each({{ "sensor" }}, {255})

The remove-volatile-locals option only applies to variables declared within a given function, so if a global variable is used within such a function, the interpreter cannot handle it. For example:

$ tis-analyzer -val --interpreter -remove-volatile-locals main volatile_interpreter.c
tests/tis-user-guide/volatile_interpreter.c:4:[value] warning: global initialization of volatile variable sensor ignored
tests/tis-user-guide/volatile_interpreter.c:6:[value] warning: The following sub-expression cannot be evaluated
                 (due to volatile type, try option -remove-volatile):
                 sensor
                 
                 All sub-expressions with their values:
                 unsigned char  sensor ∈ [0..255]
                 
                 Stopping.
[value] user error: Degeneration occurred:
                    results are not correct for lines of code that can be reached from the degeneration point.

If a function uses a combination of globally-defined volatile variables and local ones, the user should use a combination of remove-volatile and remove-volatile-locals to achieve the desired effect.

Volatile variables in WP

By default, volatile variables behave the same in WP as they do in value analysis: the volatile value is assumed to be capable of being modified at any point in the execution of the analyzed program regardless of the code of the program. This preserves the soundness of the analysis.

Example Consider the following function and its ACSL contract. Here, decrement_counter checks whether the global variable counter has a value greater than 0, and decreases it by 1 if it does. If the function managed to successfully decrease the value of counter, it returns 0; otherwise it returns 1. The contract reflects this by specifying that counter and the result of the function both change depending on the value of counter, and by defining two separate behaviors: one for when counter is greater than 0 and one for when it is 0.

unsigned char counter;

/*@ assigns counter \from counter;
    assigns \result \from counter;
    
    behavior can_decrement:
        assumes counter > 0;
        ensures counter == \old(counter) - 1;
        ensures \result == 0;

    behavior cannot_decrement:
        assumes counter <= 0;
        ensures counter == \old(counter);
        ensures \result == 1;
*/
int decrement_counter() {
  if (counter > 0) {
    counter--;
    return 0;
  }
  return 1;
}

Running WP on this example shows that all properties can be successfully checked:

$ tis-analyzer -wp -wp-rte -no-tis-libc wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_can_decrement_post_2 : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post_2 : Valid
[wp] Proved goals:    8 / 8
     Qed:             6 
     Alt-Ergo:        2  (24)

(Timing information was removed.)

Then, imagine that the presented program is meant to communicate with a peripheral device via the counter variable, both potentially reading from and writing to it. This is represented by the following modification to the example:

volatile unsigned char counter;

When the example is modified so that counter is volatile, the WP analysis fails. This is because the analyzer cannot assume that the value of counter does not change spontaneously, and therefore cannot prove that the value of counter after executing the function would be either the same or less by exactly one than its value before the function call.

$ tis-analyzer -wp -wp-rte -no-tis-libc volatile_wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Timeout
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post_2 : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_cannot_decrement_post : Timeout
[wp] [Alt-Ergo] Goal typed_decrement_counter_cannot_decrement_post_2 : Timeout
[wp] Proved goals:    3 / 8
     Qed:             3 
     Alt-Ergo:        0  (interrupted: 5)

(Timing information was removed.)

An interested user can investigate the details of the failing conditions by inspecting the individual proof obligations.

Ignoring volatile variables in WP

The behavior of the WP analysis can be modified to ignore the volatile modifier on all variables. This is done by turning off the wp-volatile option (it is turned on by default). Since there is no guarantee that the user’s assumption about volatile variables remaining unchanged is true in the general case, the analysis becomes unsound.

Warning

Using wp-no-volatile on programs reading volatile variables makes the analysis unsound in the general case.

The option can be turned off via the command-line by setting the -wp-no-volatile flag.

$ tis-analyzer -wp -wp-rte -wp-no-volatile …

Alternatively, the option can be unset via a JSON configuration file with the Boolean option wp-volatile:

{
  "wp": true,
  "wp-rte": true,
  "wp-volatile": false
}

Example Running the example above with the wp-volatile option turned off causes the volatile keyword to be ignored, and all the properties are successfully proved. However, the analyzer also emits warnings whenever a property involves an access to a volatile variable, in effect informing that the analysis is unsound.

$ tis-analyzer -wp -wp-rte -no-tis-libc -wp-no-volatile volatile_wp.c
[wp] Running WP plugin...
[wp] Loading driver '../../tis-analyzer/wp/share/wp.driver'
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe write-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe read-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:25:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:24:[wp] warning: unsafe read-access to volatile l-value
tests/tis-user-guide/volatile_wp.c:15:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:15:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:14:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:20:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:20:[wp] warning: unsafe volatile access to (term) l-value
tests/tis-user-guide/volatile_wp.c:19:[wp] warning: unsafe volatile access to (term) l-value
[wp] 8 goals scheduled
[wp] [Alt-Ergo] Goal typed_decrement_counter_assert_rte_signed_overflow : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part1 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part2 : Valid
[wp] [Qed] Goal typed_decrement_counter_assign_part3 : Valid
[wp] [Alt-Ergo] Goal typed_decrement_counter_can_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_can_decrement_post_2 : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post : Valid
[wp] [Qed] Goal typed_decrement_counter_cannot_decrement_post_2 : Valid
[wp] Proved goals:    8 / 8
     Qed:             6 
     Alt-Ergo:        2  (24)
Treating non-volatile variables as volatile

By default, the analyzer treats variables as volatile only if they are explicitly declared with the volatile qualifier in the source code. However, in some cases it may be beneficial to promote other variables to be volatile, even if they are not declared as such. This can be used to simulate concurrency or to conform to a de facto usage without modifying the source code.

Treating global variables as volatile

The volatile-globals option lets the user indicate a set of global variables that the value analysis must consider volatile even though they are not declared as such in the program.

The feature can be turned on via the -volatile-globals command-line option, passing in a list of global variables. Here, the analyzer will treat the variables sensor and port as volatile, regardless of whether they are declared volatile in the source code:

$ tis-analyzer -val -volatile-globals sensor,port

Alternatively, the feature can be turned on within a JSON configuration using the volatile-globals option (see Configuration files). Here, the option specifies a list of variables by analogy to the command-line option above:

{
  "val": true,
  "volatile-globals": ["sensor", "port"]
}

Example Consider the following program containing two global variables: port_in and port_out, neither of which are declared with the volatile qualifier. The program writes 1 or 0 to port_out, depending on whether port_in is 0 or not.

#include <tis_builtin.h>

unsigned char port_in;
unsigned char port_out;

void main(void) {    
    if (port_in == 0) {
        port_out = 1;
    } else {
        port_out = 0;
    }

    tis_show_each("port_in port_out", port_in, port_out);
}

Upon analysis, since neither port_in nor port_out are volatile, the analyzer assumes they are both initialized to 0 and their value remains unchanged until the program modifies port_out. Therefore, the analysis shows the values of port_in and port_out as 0 and 1 at the end of function main.

$ tis-analyzer -val volatile_globals.c
[value] Called tis_show_each({{ "port_in port_out" }}, {0}, {1})

On the other hand, since the program can be deduced to be using those variables for communication, the analyzer can be instructed to treat them as volatile. In that case, the analyzer cannot predict the values of port_in and port_out even after port_out is modified by the program, because either variable can be modified externally at any point in the execution.

$ tis-analyzer -val -volatile-globals port_in,port_out volatile_globals.c
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
Treating memory ranges as volatile

The volatile-globals option also lets the user indicate a range of absolute memory addresses that the value analysis must consider volatile. This is done by providing the option with NULL as an argument, and defining a range of valid memory addresses via the absolute-valid-range option. This allows modeling MMIO with such memory ranges. See Physical addresses for details.

The absolute volatile address range can be specified via the command-line using the -volatile-globals and -absolute-valid-range options. Here, the analyzer is given a valid address range starting at 0x1000 and ending at 0x2000 (inclusive), specified to be treated as volatile.

$ tis-analyzer -val -volatile-globals NULL -absolute-valid-range 0x1000-0x2000 …

The same analysis parameters can be specified via a JSON configuration using the volatile-globals and absolute-valid-range options (see Configuration files):

{
  "val": true,
  "volatile-globals": [ "NULL" ],
  "absolute-valid-range": "0x1000-0x2000"
}

The absolute valid range is declared volatile in its entirety or not at all. The analyzer does not currently support treating some parts of the absolute valid range as volatile and others as non-volatile.

Tip

Representing absolute addresses as equivalent variables gives more flexibility in this regard. It also provides other advantages and is the recommended approach overall. See the guide on physical addresses for details on transforming a program from using absolute addresses to using variables pinned to specific address ranges.

Example Consider a program that communicates with hardware via an area of memory defined in terms of absolute addresses. The address space is a 1-byte area that starts at 0x1000, as defined by the constant PORT. Values are read from and written to this memory area via the macro VALUE, which interprets a byte at a specific address as an unsigned char. The program reads the value at PORT and replies with 0 or 1 depending on the received value.

#include <tis_builtin.h>
#define PORT 0x1000
#define VALUE(port) *((unsigned char *) port)

void main(void) {    
    tis_show_each("PORT", VALUE(PORT));
    if (VALUE(PORT) == 0) {
        VALUE(PORT) = 1;
    } else {
        VALUE(PORT) = 0;
    }    
    tis_show_each("PORT", VALUE(PORT));
}

Since the example refers to an absolute address, running the analyzer on this program yields an alarm informing that the read is out of bounds.

$ tis-analyzer -val volatile_range.c
tests/tis-user-guide/volatile_range.c:6:[kernel] warning: out of bounds read. assert \valid_read((unsigned char *)0x1000);

Hence, analyzing the example requires specifying that the absolute memory addresses used by the program are valid via the absolute-valid-range option. Here, the range of addresses starting at 0x1000 and ending at 0x1001 (inclusive) is specified as valid. The analysis then treats the range as valid and as containing any values, and reports the value read from PORT as anything in the range [0..255]. After the program writes to it, the value of PORT is either 0 or 1.

$ tis-analyzer -val -absolute-valid-range 0x1000-0x1001 volatile_range.c
[value] Called tis_show_each({{ "PORT" }}, [0..255])
[value] Called tis_show_each({{ "PORT" }}, {0; 1})

However, if the program uses PORT to communicate with an external entity, as is the intention, the values within PORT could be externally modified at any point in the execution of the program. Soundness requires that this be reflected in the analysis. Thus, the memory range used for communication should be treated as volatile. This is accomplished by using the volatile-globals option. When NULL is passed as its argument, the option treats the entire valid address range as volatile. Then, the value of PORT is also correctly reported as any value in the range [0..255], even after it was written to within the program.

$ tis-analyzer -val -absolute-valid-range 0x1000-0x1001 -volatile-globals NULL volatile_range.c
[value] Called tis_show_each({{ "PORT" }}, [0..255])
[value] Called tis_show_each({{ "PORT" }}, [0..255])
Modeling MMIO with the volatile plugin

While adding volatile behavior to global variables or address ranges makes it possible to indicate accesses to peripherals via MMIO, it does not model the behavior of the peripheral itself. Instead, the analyzer assumes that a volatile variable can change to any value at any time. The volatile plugin allows the user to simulate the exact behavior of hardware by replacing accesses to volatile variables with function calls. The semantics of those function calls can then be defined via ACSL properties. This allows the analyzed software to be studied in conditions that resemble final deployment conditions.

The analyzer assumes that the functions replacing accesses to volatile variables correctly and completely describe the operation of peripherals. The analysis is sound only under those conditions, and unsound if they are not borne out.

Warning

Using the volatile plugin on programs reading volatile variables is unsound in the general case.

When using the volatile plugin, the user prepares a specification of the behavior of a volatile variable. The specification takes the form of declarations of functions that replace reads and writes of the variable, together with contracts specifying their behavior. Given a volatile variable v of some type T, the read and write functions must have the following signatures.

T rd_v(volatile T *ptr);
T wr_v(volatile T *ptr, T value);

(The names used in function signatures are arbitrary and different names can be provided by the user.)

These functions can be defined in C in full, in which case their bodies describe their behavior to the analyzer. Defining these functions in this way can be especially useful when the source code programming the peripheral is already available as C source code.

T rd_v(volatile T *ptr) {
    // …
}
T wr_v(volatile T *ptr, T value) {
    // …
}

Alternatively, if the C code defining the behavior of the peripheral is not available or is too complex, the behavior of read and write accesses can be described declaratively using ACSL properties. This is the recommended way of defining the behavior of volatile variables and it is described in detail in following sections.

/*@ requires ptr == &v;
  @ ensures \result==…;
  @ assigns \result \from …;
  @ …
  @*/
T rd_v(volatile T *ptr);

/*@ requires ptr == &v;
  @ ensures \result==…;
  @ assigns \result \from …;
  @ …
  @*/
T wr_v(volatile T *ptr, T value);

Once the behavior specification is in place, the volatile plugin needs to be informed that a specific variable should be replaced by function calls. The user does this by adding an additional ACSL annotation to the source code. The annotation specifies the name of the volatile variable that will be replaced, the name of a read function (after the reads keyword), and the name of a write function (after the writes keyword).

//@ volatile v writes wr_v reads rd_v ;

While it is recommended to supply both a read and a write specification, the configuration can omit either. In those cases, the volatile variable is read from or written to directly, depending on which function is missing.

If this happens, the analyzer also generates a warning informing that either the read or write access function was not defined for a volatile value. This warning can be turned off via the -no-warning-on-lvalues-partially-volatile command-line flag.

The analyzer also warns if there are volatile variables for which no volatile annotation is given at all. If this is not an oversight, the warning can be turned off via the -no-warning-on-volatile-lvalues flag.

Tip

ACSL annotations can only use symbols that were already defined. If you are using a symbol in your annotation that is not defined or is defined in the source code after the annotation, the analyzer cannot proceed and you receive the following error:

[kernel] user error:: cannot find function '…' for volatile clause

Once replacement functions are specified and their behavior is defined, the plugin is ready for use. It is applied via the -volatile command-line flag.

$ tis-analyzer … -volatile

When the flag is set, the analyzer transforms the analyzed source code into a new project called Volatile, where volatile accesses are replaced with function calls according to the provided specification. The analyzer can then be configured to perform further analyses on the Volatile project via the -then-on sequencing option.

$ tis-analyzer … -volatile -then-on Volatile …

Tip

The modified source code of the Volatile project is not output by the analyzer. To view the modified source code (normalized), run an analysis on the Volatile project with -print:

$ tis-analyzer … -volatile -then-on Volatile -print

Or, to show only functions and variables of interest, with -print and -print-filter options:

$ tis-analyzer … -volatile -then-on Volatile -print -print-filter main,v,wr_v,rd_v

An equivalent JSON configuration is not currently supported (See Command line options).

Example Consider this program communicating with a peripheral device via a pair of volatile variables called port_in and port_out. The peripheral device acts as a buffer. The program writes bytes into port_out until the peripheral signals it to stop by writing 1 back to port_in, which it will do after it receives 4 bytes. Given that the actions of the peripheral device are external and obscured by the volatile variable, the example initially looks like this:

#include <tis_builtin.h>

volatile unsigned char port_out;
const volatile unsigned char port_in;
int main(void) {
    unsigned char data[] = { 0, 1, 2, 3 };
    int cursor = 0;
    while (!port_in) {
      port_out = data[cursor++];
    }
    tis_show_each("cursor", cursor);
    tis_show_each("port_in port_out", port_in, port_out);
}

Even though the programmer might have knowledge about how the peripheral device behaves, the analyzer treats port_in as potentially holding any value allowed by its type, and emits an alarm that cursor will index the data array out of bounds when it reaches 4:

$ tis-analyzer -val -slevel 100 volatile_example.c
tests/tis-user-guide/volatile_example.c:9:[kernel] warning: accessing out of bounds index {4}. assert tmp < 4;
                                                     (tmp from cursor++)
[value] Called tis_show_each({{ "cursor" }}, {0})
[value] Called tis_show_each({{ "cursor" }}, {1})
[value] Called tis_show_each({{ "cursor" }}, {2})
[value] Called tis_show_each({{ "cursor" }}, {3})
[value] Called tis_show_each({{ "cursor" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])
[value] Called tis_show_each({{ "port_in port_out" }}, [0..255], [0..255])

The following snippet extends the example above with a description of the behavior of the volatile variables in the form of C code.

The state of the port_out variable is simulated by two new variables: the byte last_value, representing the last written value, and the integer cursor, counting how many elements have been written into the buffer. The behavior of port_out is described by the functions wr_port_out and rd_port_out. When a value is written to port_out, it becomes last_value and cursor is incremented. If there is no more room in the buffer, the write is ignored. Reading from port_out returns the most recently written value from last_value.

The behavior of port_in is simulated by the function rd_port_in. Reading from port_in returns 0 if there is room in the buffer and 1 if it is full. Writing to port_in is not supported, so no write function is provided. In order not to produce a warning about this, the analyzer is run with the -no-warning-on-lvalues-partially-volatile flag set.

Both volatile variables are connected to their read and write functions by their respective volatile ACSL annotations.

#include <tis_builtin.h>

volatile unsigned char port_out;
const volatile unsigned char port_in;

#define BUFFER_LEN 4
unsigned char last_value;
int cursor = 0;

unsigned char rd_port_out(unsigned char volatile *ptr) {
  return last_value;
}

unsigned char wr_port_out(unsigned char volatile *ptr, unsigned char value) { 
  int buffer_full = cursor >= BUFFER_LEN;
  if (!buffer_full) {
    last_value = value;
    cursor++;
  }
  return last_value;
}

const unsigned char rd_port_in(const unsigned char volatile *ptr) {
  return cursor >= BUFFER_LEN;
}

//@ volatile port_out reads rd_port_out writes wr_port_out;
//@ volatile port_in reads rd_port_in;

int main(void) {
    unsigned char data[] = { 0, 1, 2, 3 };
    int i = 0;
    while (!port_in) {
      port_out = data[i++];
    }
    tis_show_each("i", i);
    tis_show_each("port_in port_out", port_in, port_out);
}

Applying the volatile module causes accesses to port_in and port_out to be replaced with calls to the relevant functions:

$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_c.c -then-on Volatile -print -print-filter main
int main(void)
{
  int __retres;
  unsigned char data[4];
  int i;
  data[0] = (unsigned char)0;
  data[1] = (unsigned char)1;
  data[2] = (unsigned char)2;
  data[3] = (unsigned char)3;
  i = 0;
  while (1) {
    {
      unsigned char __volatile_tmp;
      __volatile_tmp = rd_port_in(& port_in);
      if (! (! __volatile_tmp)) break;
    }
    {
      int tmp;
      {
        tmp = i;
        i ++;
        wr_port_out(& port_out, data[tmp]);
      }
    }
  }
  tis_show_each("i", i);
  {
    unsigned char __volatile_tmp_9;
    unsigned char __volatile_tmp_7;
    __volatile_tmp_7 = rd_port_in(& port_in);
    __volatile_tmp_9 = rd_port_out(& port_out);
    tis_show_each("port_in port_out", (int)__volatile_tmp_7,
                  (int)__volatile_tmp_9);
  }
  __retres = 0;
  __tis_globfini();
  return __retres;
}

Then, using the resulting Volatile project allows the analyzer to be more precise about the state of the volatile variables at each point in the program execution. Here, it correctly predicts that at the end of the execution, port_in will contain the specific value of 1, and port_out the last value written to it.

$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_c.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})
Defining volatile accesses with ACSL

While the example above defines the behavior by writing the read and write functions in C, the preferred alternative is to define that behavior declaratively by attaching an ACSL contract to the function declarations. This avoids the need to model the behavior of peripherals in detail and is easier for the analyzer to handle.

When using ACSL, at minimum each function's contract should establish a connection between the pointer passed as argument and the volatile variable in the program:

/*@ requires ptr == &v; */

The contracts can also provide the semantics of each function by constraining its return value:

/*@ ensures \result==…; */

Functions returning values and functions with side effects should also specify dependencies of the modified data:

/*@ assigns \result \from … */
/*@ assigns … \from …       */

For details on ACSL, see the ACSL properties section of the documentation.
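Putting these fragments together, a minimal contract for a hypothetical read function over a volatile flag variable might look as follows (all names are illustrative, not taken from the examples in this guide):

```c
/* Hypothetical volatile flag set by a peripheral. */
volatile unsigned char ready;

/*@ requires ptr == &ready;
    assigns \result \from \nothing;
    ensures \result == 0 || \result == 1;
*/
unsigned char rd_ready(volatile unsigned char *ptr);

//@ volatile ready reads rd_ready;
```

The requires clause ties the function to the variable, the assigns clause states that the result depends on nothing in the program's memory (it comes from the device), and the ensures clause constrains the values the device can produce.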

Example This example rewrites the previous one to use ACSL annotations to define the semantics of wr_port_out, rd_port_out, and rd_port_in. The principle of operation is the same, including using the same additional variables to simulate state. These variables are used within the ACSL annotations for each function to declare how calling the function impacts the state of the peripheral.

The ACSL contract for function rd_port_out specifies that it must be called on the volatile variable port_out and that it always returns the value of the variable last_value.

/*@ requires ptr == &port_out;
    assigns \result \from last_value;
    ensures \result == last_value;
*/
unsigned char rd_port_out(unsigned char volatile *ptr);

The contract for wr_port_out also specifies that it must be called on port_out and that, apart from returning a value, it has side effects on cursor and last_value. The function has two separate behaviors depending on whether the buffer is already full. If there is still room, the function increments cursor and sets last_value from the argument value. If the buffer is already full, the function does not change cursor. In either case the function returns the contents of last_value.

/*@ requires ptr == &port_out;        

    assigns \result \from last_value;
    assigns cursor \from cursor;
    assigns last_value \from value, last_value, cursor;

    ensures \result == last_value;

    behavior nonfull: 
      assumes cursor < BUFFER_LEN;     
      ensures cursor == \old(cursor) + 1;
      ensures last_value == value;

    behavior full: 
      assumes cursor == BUFFER_LEN;      
      ensures cursor == \old(cursor);

    complete behaviors;
    disjoint behaviors;
*/
unsigned char wr_port_out(unsigned char volatile *ptr, unsigned char value);

Finally, the contract for rd_port_in specifies that this function must be called on port_in. This function returns 1 if the buffer is full, or 0 otherwise.

/*@ requires ptr == &port_in;
    assigns \result \from \nothing;
    ensures (\result == 1 && cursor >= BUFFER_LEN) 
         || (\result == 0 && cursor < BUFFER_LEN);  
*/
const unsigned char rd_port_in(const unsigned char volatile *ptr);

The remainder of the example remains unchanged. When it is run via the volatile plugin, it returns the expected results:

$ tis-analyzer -volatile -no-warning-on-lvalues-partially-volatile volatile_example_acsl.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})
Auto-binding volatile variable accesses to functions

Instead of defining the behavior of volatile variables by declaring or defining read and write functions for each of them individually, it may be more convenient to define such functions once for an entire type of volatile variables. This is done via the binding-auto feature of the Volatile plugin. This feature causes the Volatile module of the analyzer to attempt to bind accesses to all volatile variables to appropriately named and typed functions, based on the type of the variables in question.

Warning

The volatile plugin is an experimental feature of TrustInSoft Analyzer.

Specifically, given a volatile variable of some type T, the binding-auto feature replaces reads and writes of that variable with calls to the following functions:

T c2fc2_Wr_T(T *ptr, T value);
T c2fc2_Rd_T(T *ptr);

(Note the capitalization of Wr and Rd.)

These functions have to be declared manually. The behavior of these functions can be defined by providing them with bodies written in plain C or it can be specified using ACSL.

Then, the feature is turned on within the Volatile plugin by setting the -binding-auto command-line flag alongside the -volatile flag:

$ tis-analyzer … -volatile -binding-auto

When both of these flags are set, the Volatile module replaces volatile accesses with function calls if it can find an appropriately named function for a specific volatile variable’s type. The replacement is put into effect only if the function’s signature matches the signature expected for volatile accesses. The replacement is also not performed if a different function is already specified for a given variable via a volatile ACSL annotation, in which case the functions specified in the annotation take priority.

The same result can be obtained through a JSON configuration file, by setting the option binding-auto to true (see Configuration files):

{
   "binding-auto": true
}

The functions used for volatile accesses start with a c2fc2_ prefix. This prefix can be changed via the -binding-prefix command-line option:

$ tis-analyzer -volatile -binding-auto -binding-prefix "auto_"

Alternatively, the prefix can also be set by providing a string to the binding-prefix option through a JSON configuration file (see Configuration files):

{
   "binding-auto": true,
   "binding-prefix": "auto_"
}

Given that the automatic binding feature associates variables with functions implicitly, relying on appropriately named and typed functions being defined, it is easy to make a mistake while using it. However, the plugin provides debug output that makes it clear which variables were bound to which functions. This output is enabled by asking for binding messages to be printed via the -volatile-msg-key option. The output of this option is only shown at debug level 2 or greater, so it should always be used in conjunction with the -debug command-line option. The output is available regardless of whether binding-auto is turned on or not.

$ tis-analyzer -volatile -binding-auto -debug=2 -volatile-msg-key=binding

Debug messages from the binding process of the Volatile plugin are displayed with the tag [volatile:binding], e.g.:

[volatile:binding] Looking for a function relative to write access to volatile left-value: var
[volatile:binding] Looking for a default binding from the type name: T volatile
[volatile:binding] Looking for function c2fc2_Wr_T


Example Consider again the example from the previous section, but modified to use the binding-auto feature to specify functions to use for accessing the variable port_out. Since the type of that variable is unsigned char, the access functions defined previously as wr_port_out and rd_port_out are now renamed to auto_Wr_unsigned_char and auto_Rd_unsigned_char (the example uses the custom prefix auto_ rather than the default c2fc2_):

/*@ requires ptr == &port_out;
    assigns \result \from last_value;
    ensures \result == last_value;
*/
unsigned char auto_Rd_unsigned_char(unsigned char volatile *ptr);

/*@ requires ptr == &port_out;        

    assigns \result \from last_value;
    assigns cursor \from cursor;
    assigns last_value \from value, last_value, cursor;

    ensures \result == last_value;

    behavior nonfull: 
      assumes cursor < BUFFER_LEN;     
      ensures cursor == \old(cursor) + 1;
      ensures last_value == value;

    behavior full: 
      assumes cursor == BUFFER_LEN;      
      ensures cursor == \old(cursor);

    complete behaviors;
    disjoint behaviors;
*/
unsigned char auto_Wr_unsigned_char(unsigned char volatile *ptr, unsigned char value);

The example also removes the ACSL annotation explicitly binding these functions with port_out. Meanwhile the replacement function for port_in remains unchanged, as does its volatile ACSL annotation:

/*@ requires ptr == &port_in;
    assigns \result \from \nothing;
    ensures (\result == 1 && cursor >= BUFFER_LEN) 
         || (\result == 0 && cursor < BUFFER_LEN);  
*/
const unsigned char rd_port_in(const unsigned char volatile *ptr);

When the code is analyzed with the Volatile plugin, the binding-auto flag, and the binding-prefix option set to auto_, the accesses to port_out are replaced with auto_Wr_unsigned_char and auto_Rd_unsigned_char, so the analyzer produces the expected results:

$ tis-analyzer -volatile -binding-auto -binding-prefix=auto_ -no-warning-on-lvalues-partially-volatile volatile_example_auto.c -then-on Volatile -val -slevel 100
[value] Called tis_show_each({{ "i" }}, {4})
[value] Called tis_show_each({{ "port_in port_out" }}, {1}, {3})

Running the code with debug information shows the process of binding functions to volatile variables. Variable port_in simply uses the function defined for it, while port_out is bound to functions whose names were inferred from the type name unsigned char:

$ tis-analyzer -volatile -debug 2 -volatile-msg-key=binding -binding-auto  -binding-prefix=auto_ -no-warning-on-lvalues-partially-volatile volatile_example_auto.c -then-on Volatile -val -slevel 100
[volatile] Running volatile plugin...
[volatile] Processing volatile clauses...
[volatile] Building volatile table...
[volatile] Building new project with volatile access transformed...
[volatile:binding] Normalizing port_in into port_in
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_in
[volatile:binding] Function found: rd_port_in
[volatile] tests/tis-user-guide/volatile_example_auto.c:50 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_in
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to write access to volatile left-value: port_out
[volatile] Building default binding table...
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Wr_unsigned_char
[volatile:binding] Verifying prototype of function auto_Wr_unsigned_char: unsigned char (
                   unsigned char volatile *, unsigned char)
[volatile:binding] Function found: auto_Wr_unsigned_char
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to write access to volatile left-value: port_out
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Wr_unsigned_char
[volatile:binding] Verifying prototype of function auto_Wr_unsigned_char: unsigned char (
                   unsigned char volatile *, unsigned char)
[volatile:binding] Function found: auto_Wr_unsigned_char
[volatile] tests/tis-user-guide/volatile_example_auto.c:51 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms write access to volatile left-value: port_out
[volatile:binding] Normalizing port_in into port_in
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_in
[volatile:binding] Function found: rd_port_in
[volatile] tests/tis-user-guide/volatile_example_auto.c:54 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_in
[volatile:binding] Normalizing port_out into port_out
[volatile:binding] Looking for a function relative to read access to volatile left-value: port_out
[volatile:binding] Looking for a default binding from the type name: unsigned char volatile
[volatile:binding] Looking for function auto_Rd_unsigned_char
[volatile:binding] Verifying prototype of function auto_Rd_unsigned_char: unsigned char (
                   unsigned char volatile *)
[volatile:binding] Function found: auto_Rd_unsigned_char
[volatile] tests/tis-user-guide/volatile_example_auto.c:54 (included from tests/tis-user-guide/volatile_example_auto.driver.c): main function: Transforms read access to volatile left-value: port_out
Functions With Variable Arguments
The C11 standard references

The relevant references from the C11 standard are:

  • section 7.16: Variable arguments <stdarg.h>,
  • and the appropriate parts, i.e. those concerning variadic macros, of section J.2 (Undefined behavior) of appendix J (Portability issues).
Limitations: what does not satisfy the standard?
The va_list type

The type va_list, declared in the <stdarg.h> header, should be, according to the C11 standard, a complete object type. TrustInSoft Analyzer has some limitations in handling this type and verifying its correct use.

Only in local variables

Objects of the va_list type are handled correctly only if they appear as local variables, formal arguments, and the like. We do not support global va_list variables, arrays of va_list objects, or va_list as the type of a field of a complex type (i.e. a union or structure). A fatal error occurs if va_list objects are used in such a way (however, if they are merely declared and never actually used, there is no error).

Assignments

It is not entirely clear (in the C11 standard) whether assignments between variables of va_list type (e.g. ap1 = ap2;) are permitted. However, as both the gcc and clang compilers refuse to compile such operations, we assume that they are not permitted and thus do not handle them.

Note, however, that TrustInSoft Analyzer does not check whether va_list assignments appear in the code: if they do, the program is not rejected, even though the behavior is undefined.

Casting
Casts to the va_list type

Casting from other data types to the va_list type is implicitly forbidden and this rule is enforced by TrustInSoft Analyzer: an appropriate error will occur if such a cast is encountered and the program will be rejected.

Casts from the va_list type

Casting from the va_list type to other data types is also forbidden, though TrustInSoft Analyzer does not enforce this rule: programs containing such casts are incorrect, but they are not rejected.

Passing va_list objects to other functions

The va_list objects can be passed as arguments to other functions. However, if the called function invokes the va_arg macro on such an object, its value in the calling function becomes indeterminate (i.e. it can no longer be used with va_arg in the calling function). This rule does not apply if the va_list is passed by pointer. See subsection 7.16.1 of the C11 standard.

We do not enforce this rule in TrustInSoft Analyzer. All va_list objects passed to other functions are treated as if they were passed by pointer. Each invocation of the va_arg macro on such a va_list object will simply return the subsequent argument, without considering where the previous va_arg invocations happened.

This approach is similar to what is implemented in gcc and clang.

Returning va_list objects from functions

It is not completely clear in the C11 standard, but it seems that functions returning va_list objects are allowed. However, as both the gcc and clang compilers refuse to compile such functions, we assume that they are not permitted. TrustInSoft Analyzer rejects programs that declare functions returning va_list objects.

Undefined behavior cases not completely verified

Some undefined behavior cases enumerated in the C11 standard, see section J.2 (Undefined behavior) of appendix J (Portability issues), are not verified by TrustInSoft Analyzer.

Invoking va_arg on va_list objects passed to other functions
The macro va_arg is invoked using the parameter ap that was passed to a function that invoked the macro va_arg with the same parameter (7.16).

Status: Not verified at all (as stated above).

Suppressing definitions of variadic macros
A macro definition of va_start, va_arg, va_copy, or va_end is suppressed in order to access an actual function, or the program defines an external identifier with the name va_copy or va_end (7.16.1).

Status: Not verified at all.

The type parameter of the va_arg macro