Prepare the Analysis

This chapter explains how to specify which part of the source code of an application will be studied and in which context. It also shows how the overall goal can be split into several separate analyses if needed. The main objective is to be able to run the value analysis, implemented by the Value plug-in, in order to obtain the alarms concerning the software weaknesses listed in the CWE-subset.

Define the Perimeter

The study perimeter could be the whole program, or only some functions of a library, or a single use case scenario. Explaining how to decide which part of the source code should be studied is very difficult, since it depends a lot on the particular application, the amount of time available, and mostly on how one looks at the problem… Adopt an incremental approach: begin with a small study, in order to understand how to apply the tools in the given situation, and then enlarge the perimeter later on.

Prepare the Entry Point

In order to run a value analysis, an entry point to the program has to be provided. The body of the entry point function defines the studied perimeter. It is usually the main function which establishes the context verified by the analysis, but other functions can be used to this end as well.

  • To analyze a library function, the entry point function should build values for the given library function’s input arguments, and then call the library function itself using the arguments prepared in this way (see Write an Entry Point).
  • To analyze a scenario, preparation of input values may be needed as well, followed by calling some functions in a sequence. This sequence of function calls defines the scenario (a sketch is given just after this list).
  • In some cases, the actual main function of an application can be used directly. However, if it takes options and arguments, it still has to be called from an entry point that builds values for them. The tis-mk-main utility can help in doing so (see tis-mk-main Manual). Be aware, though, that if main is a complex function that parses options and needs many string manipulations, it is probably a better idea to write a smaller entry point from scratch in order to define a more precise context of analysis.
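
For the scenario case, the entry point might look like the hedged sketch below. The device_* functions and their signatures are hypothetical stand-ins for the application’s API, and the input values are built with the builtins presented in Write an Entry Point:

#include <tis_builtin.h>

/* Hypothetical API functions exercised by the scenario. */
int device_open(int id);
int device_write(int handle, const char *data, unsigned len);
void device_close(int handle);

int main(void) {
  char data[64];
  tis_make_unknown(data, sizeof(data));          /* arbitrary buffer contents */
  unsigned len = tis_interval(0, sizeof(data));  /* any length from 0 to 64 */
  int handle = device_open(tis_interval(0, 3));  /* any device identifier in 0..3 */
  if (handle >= 0) {
    device_write(handle, data, len);
    device_close(handle);
  }
  return 0;
}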

It is important to mention here the difference between dynamic test execution and static value analysis. As the code is not executed in the latter, the inputs provided to the analyzed function do not need to be single values. It means that a function taking a single integer parameter x can, for instance, be analyzed for all the possible input values, or for all the values from a given set (e.g. 3 < x < 150). So when we mention “a value” here, we do not actually mean “a single concrete value”, but rather “a set of abstract values”.
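
For instance, using the tis_interval builtin described in the next section, the set 3 < x < 150 can be expressed as follows (a minimal sketch; the bounds are inclusive):

int x = tis_interval(4, 149);  /* any integer from 4 to 149 */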

Write an Entry Point

Basically, the entry point function has to call the functions to analyze, providing them with appropriate input values (i.e. function arguments) that correspond to the studied perimeter. Some built-in functions are available to build these input values:

  • for an integer interval:

    x = tis_interval(l, u);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from any value between l and u being returned. Several other functions are provided for other types, for instance tis_double_interval(l, u) for floating-point values and tis_unsigned_long_long_interval(l, u) for wide integers, which behave in the same way for the types double and unsigned long long.

  • to initialize addr[0 .. len-1]:

    tis_make_unknown (addr, len);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from having any arbitrary len bytes in memory starting from addr.

    The tis_make_unknown function is also useful to initialize a simple variable:

    tis_make_unknown (&x, sizeof (x));
    

    This is equivalent to x = tis_interval(l, u); when l is the minimum value of the type and u is the maximum value of the type.

  • for a non-deterministic choice between two integers:

    x = tis_nondet (a, b);
    

    It guarantees that the analyzer will produce warnings for any bad behavior that could result from the value of x being either a or b. These are only two cases, but they combine with the other possibilities resulting from the calls to the other builtin functions.

  • for a non-deterministic choice between two pointers:

    p = tis_nondet_ptr (&x, &y);
    

    This one is similar to the previous one, but for pointers.

Example: the main function below shows a valid entry point to test a compute function that takes a buffer, its length, and a pointer to store a result:

#include <stdio.h>
#include <tis_builtin.h>

int compute (char * buf, size_t len, char * result);

int main (void) {
  char buf[100];
  tis_make_unknown (buf, 100);
  size_t len = tis_interval (0, 100);
  char x;
  char * result = tis_nondet_ptr (NULL, &x);
  int r = compute (buf, len, result);
}

  • the builtin tis_init_type can be used to initialize a simple pointer, such as int * p, or a pointer to a recursive data structure, such as struct list * p. It takes five arguments:

    tis_init_type(str_type, ptr, depth, width, valid)
    
    • the first argument const char * str_type should be a string representing a valid type of the memory to initialize.
    • the second argument void * ptr should be a pointer to the memory area to initialize.
    • the third argument unsigned long depth should be an integer that exactly mirrors the behavior of the option -context-depth during the initialization.
    • the fourth argument unsigned long width should be an integer that exactly mirrors the behavior of the option -context-width during the initialization.
    • the last argument should be either 0 or 1, and it exactly mirrors the behavior of the option -context-valid-pointers during the initialization.

Example:

#include <tis_builtin.h>
struct list {
    int data;
    struct list * next;
};
int main(){
    int *p0, *p1, *p2;
    struct list * p3;
    tis_init_type("int *", &p0, 1, 1, 1);
    tis_init_type("int *", &p1, 1, 10, 1);
    tis_init_type("int *", &p2, 1, 1, 0);
    tis_init_type("struct list *", &p3, 3, 1, 1);
    tis_dump_each();
}

The code above calls tis_init_type to initialize pointers p0, p1, p2 and p3. More specifically:

  • The call tis_init_type("int *", &p0, 1, 1, 1) allocates an array of size 1 (given by the width argument), initializes the array element to any possible integer: S_p0[0] [--..--], and then assigns the array address to pointer p0: p0 {{ &S_p0[0] }}.
  • The call tis_init_type("int *", &p1, 1, 10, 1) allocates an array of size 10: S_p1[0..9] [--..--] and assigns its address to pointer p1: p1 {{ &S_p1[0] }}.
  • The call tis_init_type("int *", &p2, 1, 1, 0) sets the last argument to 0, which allows p2 to possibly be a NULL pointer: p2 {{ NULL ; &S_p2[0] }}.
  • The call tis_init_type("struct list *", &p3, 3, 1, 1) allocates a list of length 4 (the index of the last list element corresponds to the depth argument), and assigns the list head address to pointer p3.

The tis_dump_each call at the end of main then displays the resulting state:

p0 ∈ {{ &S_p0[0] }}
S_p0[0] ∈ [--..--]

p1 ∈ {{ &S_p1[0] }}
S_p1[0..9] ∈ [--..--]

p2 ∈ {{ NULL ; &S_p2[0] }}
S_p2[0] ∈ [--..--]

p3 ∈ {{ &S_p3[0] }}
S_p3[0].data ∈ [--..--]
    [0].next ∈ {{ &S_next_0_S_p3[0] }}
S_next_0_S_p3[0].data ∈ [--..--]
             [0].next ∈ {{ &S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
                      [0].next ∈ {{ &S_next_0_S_next_0_S_next_0_S_p3[0] }}
S_next_0_S_next_0_S_next_0_S_p3[0].data ∈ [--..--]
                               [0].next ∈ {0}

In order to obtain more details about the available functions which allow building imprecise values, refer to the Abstract values section, or browse the file:

more $(tis-analyzer -print-share-path)/tis_builtin.h

Generate an Entry Point

Some tools are also available and may help to build the entry point for specific situations (see tis-mk-main Manual).

Check the External Functions

Now that the main entry point is ready, it is time to run the value analysis for the first time using the -val option.

An important thing to check is the nature of external functions. More precisely, to look for this message in the log file:

$ grep Neither proj.log
[kernel] warning: Neither code nor specification for function ...

This message indicates that the given function is undefined. In order to progress with the value analysis, it MUST be defined by either:

  • adding missing source files,
  • or writing C stubs,
  • or specifying it with ACSL properties.

The libc library functions should not appear in these messages since most of them are already specified in provided library files (see About Libraries).

Writing C Stubs

Writing C stubs for functions for which no code is available is the recommended way to go. The standard functions and the builtins presented above (see Write an Entry Point) may be used to abstract the implementation details.

To illustrate how to write stubs using standard functions and analyzer builtins, suppose that the code we want to analyze for errors is the main function below, and that the code of the function mystrdup is not available.

char *mystrdup(char *s);

int main(void) {
  char c, *p;
  int x;
  p = mystrdup("abc");
  if (p)
    c = p[0];
  x = c - '0';
}

There is currently no good way to write a specification indicating that mystrdup allocates a new block and makes it contain a 0-terminated string. Instead, the recommended method is to abstract it with a stub that may look as follows:

#include <string.h>
#include <stdlib.h>
#include <tis_builtin.h>

char *mystrdup(char *s) {
  size_t l = strlen(s);
  char *p = malloc(l+1);
  if (p) {
    tis_make_unknown(p, l);
    p[l] = 0;
  }
  return p;
}

The files can be analyzed with:

$ tis-analyzer -val -slevel 10 main.c mystrdup.c

As shown in the trace, the analyzer correctly detects that the main function may use c uninitialized:

tests/val_examples/stub_main.c:13:[kernel] warning: accessing uninitialized left-value: assert \initialized(&c);
tests/val_examples/stub_main.c:13:[kernel] warning: completely indeterminate value in c.

Specifying External Functions

When specifying an external function with ACSL properties, only the assigns properties are mandatory: they give the tool an over-approximation of what can be modified. However, also providing the function’s post-conditions can help the analyzer and yield more precise results (see Write a Specification).
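
For example, a minimal specification for an external function could be limited to an assigns clause, optionally strengthened by a post-condition. This is only a hedged sketch: the function get_status and its contract are illustrative, not part of any provided library:

/*@ assigns *out \from in, indirect:out;
    ensures *out == in || *out == -1;
*/
void get_status(int in, int *out);

Here the assigns clause states that the function may only modify *out, computed from in, the pointer out being an indirect dependency; the optional ensures clause further constrains the possible post-states.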

Tune the Precision

Performing value analysis with no additional options (like in all the cases above) makes it run with a rather low precision. It should not take too long to get the results that indicate where the alarms were found. When using the GUI, the list of alarms can be selected using the Kind filter of the Properties panel, and a summary of the number of alarms can be found in the Dashboard panel.

The global precision can be changed using the -slevel n option. The greater n is, the more precise the analysis is (see About the Value Analysis for more details). The alarms which can be formally verified by increasing the precision in this way will disappear. Those which remain are the difficult part: they require further attention.

Value analysis takes longer and longer when the precision increases. Thus it can be profitable to fine tune the precision locally on certain functions in order to benefit from the higher precision level where it is advantageous (so that more alarms are formally verified) while keeping it lower where it matters less (so that the analysis runs faster).

For the same reason (fast analysis to find bugs earlier) it can also be useful to reduce (temporarily) the size of the arrays (when the source code is structured to allow this easily).

Using the command line option -tis, one can examine the evolution of the analysis. More precisely, this option provides an interactive view of what the analyzer is doing and, for readability reasons, it should always be used in a large terminal. The insight into the progress of the analysis gained in this way helps greatly in choosing appropriate global and local precision levels:

  • If it appears that a huge amount of time is spent in a low level function f, its precision can be decreased using the -slevel-function f:n option with n smaller than the global precision (see the example command after this list).
  • In the opposite case, if the global precision has to be kept low for performance reasons, the local precision can be increased in some functions only.
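
For instance, assuming a hypothetical low level function parse_input that consumes most of the analysis time, the global and local precision levels could be combined as follows (the values 100 and 10 are purely illustrative):

$ tis-analyzer -val -slevel 100 -slevel-function parse_input:10 $SRC main.c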

The same information is shown when starting the value analysis from the GUI and the final analysis information can be found in the Dashboard panel.

Note that at this point the goal is not to study the alarms precisely, but rather to get a rough idea of the amount of work needed in order to be able to decide which part to study.

To Split or Not to Split

The experience suggests that if the size of the analyzed source code is large and / or if there are many alarms, it is usually worthwhile to split the study into smaller, more manageable sub-components. The idea here is to write a precise specification for every sub-component and then analyze each of them independently toward its particular specification. Afterwards, the main component can be studied using those specifications instead of using directly the sub-components’ source code.

It is quite easy to decide which part should be split off if some main features are identifiable and clearly match a given function. Otherwise, a first overview of the number of alarms may help to isolate a part that seems difficult for the analyzer. However, as the separated function must be specified, it is much easier if it has a small and clear interface (in order to study a function, it must be called in the intended context, and this context might be difficult to build if it corresponds to a large and complex data structure).

To split the analysis one must write:

  • a main function for the main component,
  • a main function and an ACSL specification for each of the API functions which is supposed to be studied independently.

Then, when performing the analysis for the main component, the -val-use-spec option should be used in order to provide the list of the API specified functions. For each of the functions from this list the value analysis will use the function’s ACSL specifications instead of the function’s body.

For instance, the commands below can be used to split the study into the main analysis with two sub-components corresponding to the f1 and f2 functions:

$ tis-analyzer -val $SRC main.c -val-use-spec f1,f2 \
                                -acsl-import f1.acsl,f2.acsl \
                                -save project.state
$ tis-analyzer -val $SRC main_f1.c -acsl-import f1.acsl -save project_f1.state
$ tis-analyzer -val $SRC main_f2.c -acsl-import f2.acsl -save project_f2.state

In the commands above:

  • the files main.c, main_f1.c and main_f2.c should hold the entry points for the main component, and the f1 and f2 functions respectively;
  • the files f1.acsl and f2.acsl should hold the ACSL specifications of the, respectively, f1 and f2 functions (see Write a Specification to learn how to write a specification).

Multi-Analysis

There is another case where studying an entry point may require several separate analyses: when there is a parameter in the program that has to be attributed a value (e.g. using a macro) and when it is difficult to give it an arbitrary value beforehand (e.g. it is a parameter defining the size of an array). In such situations it is better to write a loop in an external script to attribute different values to that parameter and run as many analyses as necessary.

The following script runs an analysis for every N being a multiple of 8 from 16 to 128:

#!/bin/bash

for N in $(seq 16 8 128) ; do
  tis-analyzer -D N=$N -val $SRC main.c
done

Of course, it supposes that N is used somewhere in main.c.

Write a Specification

Writing a specification for a function is useful in two cases:

  • When some splitting is done.

    A certain function can be studied independently, as a separate sub-component of the analysis. The function is verified toward a certain specification and then that specification can be used (instead of using directly the function’s body when analyzing function calls) in the main component’s verification (To Split or Not to Split).

  • When using some unspecified external functions.

    If an external function that is not part of the subset of the libc library functions provided with the tool (those are already specified) is used in the program, then it needs an explicit specification. The provided specification has to indicate at least which of the concerned data may possibly be modified by the function. Pre- and postconditions are not mandatory in that case.

The specification is written in ACSL and is mainly composed of:

  • preconditions (requires properties),
  • modified data descriptions (left part of assigns properties),
  • dependencies (\from part of assigns properties),
  • postconditions (ensures properties).

The ACSL properties can be either inlined directly in the source files or written in separate files and loaded (as explained in ACSL Properties). An analysis will use the specification instead of the function’s body to process a function call when either the body is not provided or an appropriate -val-use-spec option has been set in the command line.

When analyzing a function call using the specification, the tool:

  • first verifies that the pre-state satisfies the preconditions,
  • then assumes that:
    • the post-state only differs from the pre-state for the specified modified data,
    • and that the post-state satisfies the postconditions.
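
To make this concrete, here is a hedged sketch of a specification combining the elements listed above for a hypothetical function (the name fill and its contract are illustrative only):

/*@ requires \valid(dst + (0 .. n-1));
    requires n > 0;
    assigns dst[0 .. n-1] \from v, indirect:dst, indirect:n;
    ensures \initialized(dst + (0 .. n-1));
*/
void fill(unsigned char *dst, unsigned n, unsigned char v);

When a call to fill is processed using this specification, the analyzer first checks that dst is valid and n is positive in the pre-state, then assumes that only dst[0 .. n-1] may have changed and that these bytes are initialized in the post-state.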

Pre-conditions

Pre-conditions of an External Function

In the specification of an external function the preconditions are not mandatory. If some are provided though, the analyzer checks whether they are satisfied at each call. Therefore adding preconditions in that case makes the verification stronger.

Pre-conditions of a Defined Function

When a defined function is analyzed separately from the main application, its preconditions define the context in which it is studied. This does not have to be the most general context imaginable, but it has to include at least all the possible usage contexts that can be found in the application.

For example, suppose that the function f has been selected to be studied independently, and that it takes a single parameter x. If f is always called from the application with positive values of x, it is possible to study it only in that context. This property must be then specified explicitly by the precondition:

requires r_x_pos: x > 0;

Also, the corresponding main_f function - i.e. the one that is written and used as an entry point in order to study f individually - must call f with all the positive values of x. If the preconditions specify a context smaller than the context defined implicitly by the main function, it will be detected by the analysis, since some of the preconditions will then be invalid. But the opposite case (i.e. if the specified context is larger than the studied input context) would not be detected automatically.
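
As a hedged sketch (assuming f simply takes an int and returns an int), a main_f entry point covering exactly the x > 0 context could be:

#include <limits.h>
#include <tis_builtin.h>

int f(int x);

int main_f(void) {
  int x = tis_interval(1, INT_MAX);  /* all positive values of x */
  return f(x);
}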

In other words, in the example above, if main_f calls f with (x >= 0), it will be detected since (x == 0) does not satisfy the precondition. However, if it calls f only with (0 < x < 10), the precondition will be formally satisfied, but the function behavior for (x >= 10) will not be studied. If, for instance, f is then called in the application with (x == 20), the problem will not be detected since the precondition is valid for this value.

Warning

When verifying a function toward a specification that is then used to verify another component, it is very important to make sure that the context defined by the specified preconditions and the studied input context represent the same thing.

Note that:

  • in most cases each of the function’s parameters should be specified in at least one precondition;
  • if a parameter can hold any value, but has to be initialized, the \initialized precondition should be included in the specification (see the sketch after this list);
  • in case of pointers, it is important to specify whether they have to be \valid (meaning that they point to an allocated memory zone).
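
Putting these notes together, the preconditions of a hypothetical function taking a buffer and its length might look as follows (a sketch with illustrative names and bounds, not taken from the provided libraries):

/*@ requires \valid(buf + (0 .. len-1));
    requires \initialized(buf + (0 .. len-1));
    requires 0 < len <= 4096;
*/
int checksum(unsigned char *buf, unsigned len);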

Modified Data and Data Dependencies

The assigns properties are composed of two parts, which specify the modified data and its dependencies:

//@ assigns <left part> \from <right part>;

Modified Data

Each assigns property specifies the modified data on the left side of the \from keyword.

The union of the left parts of all the assigns properties in a given function’s specification is an over-approximation of the data modified by this function. Hence the data that is not in this set (i.e. the set defined by the union of their left parts) is expected to have the same value in the pre-state and the post-state.

The information about the modified data is used:

  • by the WP plug-in, for any function call;
  • by the Value plug-in, when the specification of a called function is used instead of its body.

Data Dependencies

Each assigns property specifies the data dependencies on the right side of the \from keyword.

The output value of the modified data is expected to depend only on the value of its data dependencies. In other words, if the value of the dependencies is equal in two input states, then the value of the modified data should be equal in the two output states.

There are two kinds of dependencies:

  • the direct dependencies are those which are used to compute the value;
  • the indirect dependencies are those which are used to choose between the executed branches or the address of the modified data (through pointers, indexes, etc.).

The indirect dependencies have to be explicitly marked with an indirect: label. All the other dependencies are considered direct.

Here are some examples of correctly defined dependencies:

//@ assigns \result \from a, b, indirect:c;
int f (int a, int b, int c) { return c ? a : b; }

int t[10];
//@ requires 0 <= i < 10; assigns t[..] \from t[..], a, indirect:i;
void set_array_element (int i, int a) { t[i] = a; }

The dependency information is:

  • Not used by the WP plug-in.

  • Very important for many analysis techniques that require knowledge about the data dependencies (such as the Show Defs feature in the GUI, slicing, etc.), but only when the function body is not used, since if the body is available the dependencies can be computed by the From plug-in.

  • Employed in the value analysis of pointers: the output value of a modified pointer can only be based on the specified direct dependencies.

    Note that an intermediate pointer is needed when a pointer is assigned the address of a variable. This property is not valid:

    assigns p \from &x; // NOT valid.
    

    One must rather declare T * const px = &x; in the code and then write the correct property:

    assigns p \from px;
    

    It means exactly that the output value of p may be based on &x and on no other existing variables, as the complete sketch below shows.
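
For instance (a hedged sketch with illustrative names), the specification of an external function that makes a global pointer p point to a global variable x could be written as:

int x;
int * const px = &x;  /* intermediate const pointer standing for &x */
int *p;

/*@ assigns p \from px; */
void point_p_to_x(void);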

Remember that the assigns properties specify an over-approximation of the modified data. For instance, the following properties only say that nothing except B is modified by the function:

assigns B \from \nothing;
assigns \result \from B;

In order to specify that B is surely initialized after the function call, one has to add a post-condition:

ensures \initialized(&B);

When the function result is used to return an error status, it is often the case that the post-condition rather looks like:

ensures (\result == 0 && \initialized(&B)) || (\result < 0);

Post-conditions

It is not mandatory to specify ensures properties, either when splitting out a defined function or when specifying an external function. However, some information about the values in the returned state might be needed in the analysis of the caller function.

Post-conditions of an External Function

In the specification of an external function, the provided post-conditions cannot be checked since the source code is not available. Hence they are used as hypotheses by the analysis and cannot be formally verified themselves.

Warning

As the post-conditions of external functions cannot be verified by the tool, they must be checked with extra care!

Post-conditions of a Defined Function

If ensures properties are specified, it is usually good to keep them as simple as possible. They have to be verified during the function’s body analysis and over-specification only increases the amount of work necessary to achieve that.

Check the Coverage

Before going any further, it is often advantageous to check the code coverage in order to verify whether all the dead code present in the application (i.e. the parts of the code composed of functions and branches that are not reachable by the analysis) is indeed intended. To learn how to do that, see Information about the Coverage.

Dead code can be spotted in the GUI by looking for statements with a red background. If some dead code seems strange, it can be explored and investigated using the value analysis results. Clicking on variables and expressions makes it possible to inspect their computed values.

As long as the analysis did not degenerate, the code classified as dead by the tool is a conservative approximation of the actual dead code: it is guaranteed that, in the context defined by the input values, the concerned statements cannot be reached, whatever happens. When relying on this guarantee, one should keep in mind the two assumptions it depends on: the analysis must not have degenerated, and dead code is always relative to the context defined by the input values. Indeed, most of the time, if some code has been marked as dead when it should not have been, the reason is that the context of analysis was defined too restrictively (i.e. it does not include all the input values that can occur in a real execution). Another common reason is that the analysis has simply stopped computing a given branch:

  • either because the information was too imprecise to go further (in that case a degeneration warning message has been emitted),
  • or because a fatal alarm has led to an invalid state (e.g. accessing the content of a pointer which is NULL for sure renders the state invalid).

Summary

At this point, one should have some analyses (one or several) which cover the intended parts of the code and end without any degeneration. The results most likely include alarms. The chapter Study the Alarms explains how to deal with the alarms and, before that, the chapter Get the Information explains how to extract more information from the analysis results.

If, due to splitting, there are several analyses, there is no preferred order for the verifications. In any case, however, modifying the existing specifications invalidates the results obtained so far.