Analyzing C++ Programs

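TrustInSoft Analyzer++ lets you analyze C++ programs. This document describes the specificities of TrustInSoft Analyzer++, the builtins provided for C++ analysis, and GoogleTest support.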

A separate getting started tutorial on analyzing C++ code is also available in the Analyzing C++ code section of the manual.

TrustInSoft Analyzer++ Specificities

Identifiers

Mangling

The identifiers in a C++ program are mangled to match C identifiers. The mangling scheme used in TrustInSoft Analyzer is a variation of Itanium mangling. The differences are:

  • Class, union and enum names are also mangled, even if this is not required by Itanium. The grammar entry used for these types is _Z<name>. As such, the class:

    struct Foo {
      int x;
    };
    

    is translated as:

    struct _Z3Foo { int x; };
    
  • Local variables and formal parameter names are also mangled, to avoid shadowing extern "C" declarations. The grammar entry used for a local variable is _ZL<unqualified-name>. As such, the local variable bar in:

    int main() {
      int bar = 2;
    }
    

    is mangled as _ZL3bar. The keyword this is not mangled.

  • The virtual method table and the typeinfo structure for a class Foo are mangled as extra static fields named __tis_class_vmt and __tis_typeinfo in this class. As such, the class:

    struct Foo {
      virtual void f() {}
    };
    

    leads to the generation of two variables with mangled names _ZN3Foo15__tis_class_vmtE and _ZN3Foo14__tis_typeinfoE.
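
The length-prefixed encoding behind these names can be sketched in a few lines. This is a minimal illustration that only handles plain identifiers; the helper names (source_name, mangle_type, mangle_local, mangle_nested) are hypothetical and are not part of the analyzer.

```cpp
#include <initializer_list>
#include <string>

// Itanium-style source name: decimal length followed by the identifier.
std::string source_name(const std::string &id) {
    return std::to_string(id.size()) + id;
}

// Class, union or enum name, following the first bullet: _Z<name>.
std::string mangle_type(const std::string &id) {
    return "_Z" + source_name(id);
}

// Local variable, following the second bullet: _ZL<unqualified-name>.
std::string mangle_local(const std::string &id) {
    return "_ZL" + source_name(id);
}

// Nested name: _ZN, then each component, then E.
std::string mangle_nested(std::initializer_list<std::string> parts) {
    std::string out = "_ZN";
    for (const std::string &part : parts) {
        out += source_name(part);
    }
    return out + "E";
}
```

With these helpers, mangle_nested({"Foo", "__tis_class_vmt"}) yields _ZN3Foo15__tis_class_vmtE, because the field name is 15 characters long.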

Demangling

To make reading the identifiers easier, TrustInSoft Analyzer displays by default a demangled version of the identifier. In the GUI, the mangled name can be obtained by right-clicking on an identifier and selecting Copy mangled name.

Signatures are ignored when demangling function names. As such, the assignment in:

void func(int) {}

void
test()
{
    void (*ptr)(int) = &func;
}

is displayed as:

void (*ptr)(int);
ptr = & func;

even if the mangled name of func is _Z4funci. This can lead to ambiguity when there are multiple overloads of the named function. To resolve the ambiguity, look at the mangled name of the function.

Constructors and destructors are demangled as Ctor and Dtor. If the constructor or destructor is a constructor for a base class and is different from the constructor for the most derived object, the suffix Base is added. If the constructor is a copy constructor, the suffix C is added. If the constructor is a move constructor, the suffix M is added. Therefore, the demangled name Foo::CtorC stands for the copy constructor of the class Foo. If the destructor is virtual, it will be demangled as DeletingDtor.

The option -cxx-filt can be used to print the demangled version of an identifier, as demangled by the analyzer. If the identifier is a function name, its signature is also printed. For example, the command tis-analyzer++ -cxx-filt _Z3fooii displays {foo(int, int)}.
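
For comparison, standard Itanium names can also be demangled from C++ code with the abi::__cxa_demangle function provided by the GCC and Clang runtimes. The sketch below assumes one of these toolchains; the wrapper name demangle is hypothetical. Because the analyzer uses a variation of the Itanium scheme, names specific to that variation (such as _ZL3bar for a local variable) may not be handled by this function, and the output format differs from the one printed by -cxx-filt.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle a name with the compiler's Itanium ABI runtime.
// Returns an empty string when the name cannot be demangled.
std::string demangle(const char *mangled) {
    int status = 0;
    char *raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return std::string();
    }
    std::string result(raw);
    std::free(raw);  // the returned buffer is allocated with malloc
    return result;
}
```

Unlike the names displayed by the analyzer, the result includes the parameter types, so overloads such as _Z4funci can be told apart.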

When displayed, function return types are preceded by a -> symbol and are displayed after the formal parameter types. For example, the instance of the function show in the following code:

struct Foo {
    void f(int) {}
};

template <typename T>
void show(const T&) {}

int
main()
{
    show(&Foo::f);
}

is printed as show<{(int) -> void} Foo::*>.

Template parameter packs are printed enclosed by [ and ]. As such, the command tis-analyzer++ -cxx-filt _Z1fIJ3Foo3FooEEvDpRKT_ displays {f<[Foo, Foo]>(const [Foo, Foo]&) -> void}: f is a function templated by a parameter pack, which is instantiated with Foo, Foo. Note also that in this case the const and & are applied to the whole pack.

Names displayed in the GUI can be prefixed by an ellipsis (...). These names are shortened versions of qualified names. Clicking on this prefix displays the full mangled or demangled name, depending on the command line options.

Functions and methods

Argument passing

When calling a function, TrustInSoft Analyzer uses different transformations to initialize the function’s arguments depending on the type of the argument. These transformations match the Itanium calling convention.

Scalar types

Scalar types are kept as is.

Reference types

Reference types are translated as pointers to the referenced types. The initialization of an argument of reference type is translated as taking the address of the initializer. If this initialization requires the materialization of a temporary object, this step is done by the caller. For example, with the following original source code:

void f(int &&, int &);

void g() {
    int x;
    f(2, x);
}

the translated declaration for the function f is void f(int *a, int *b) and the call to f is translated as:

int x;
int __tis_temporary_0;
__tis_temporary_0 = 2;
f(& __tis_temporary_0,& x);
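
The same lowering can be written by hand as ordinary code. This is a sketch only: the body of f is an assumption added to make the effect observable, and the names f_lowered, g_lowered and temporary_0 are hypothetical stand-ins for the generated names.

```cpp
// Original form (assumed body): void f(int &&a, int &b) { b += a; }
// Lowered form: both references become pointers.
void f_lowered(int *a, int *b) {
    *b += *a;
}

// Lowered form of: int g() { int x = 40; f(2, x); return x; }
int g_lowered() {
    int x = 40;
    int temporary_0;   // plays the role of __tis_temporary_0
    temporary_0 = 2;   // the caller materializes the temporary
    f_lowered(&temporary_0, &x);
    return x;
}
```

The temporary lives in the caller, which is why its lifetime covers the whole call.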

Class types

The way a class type is passed depends on whether the class is non-trivial for the purposes of calls. A class type is non-trivial for the purposes of calls if:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.
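
This criterion can be approximated with standard type traits. The sketch below is an approximation, not the analyzer's implementation: the traits observe the result of overload resolution rather than the declared constructors, so corner cases may be classified differently. The function non_trivial_for_calls and the three example classes are hypothetical.

```cpp
#include <type_traits>

// Approximate test for "non-trivial for the purposes of calls".
template <typename T>
constexpr bool non_trivial_for_calls() {
    return !std::is_trivially_destructible<T>::value
        || (std::is_copy_constructible<T>::value
            && !std::is_trivially_copy_constructible<T>::value)
        || (std::is_move_constructible<T>::value
            && !std::is_trivially_move_constructible<T>::value)
        || (!std::is_copy_constructible<T>::value
            && !std::is_move_constructible<T>::value);
}

// User-provided copy constructor: non-trivial for the purposes of calls.
struct UserCopy {
    UserCopy();
    UserCopy(const UserCopy &);
};

// Only the default constructor is user-provided: trivial for calls.
struct DefaultOnly {
    DefaultOnly();
};

// All copy and move constructors are deleted: non-trivial for calls.
struct NoCopyNoMove {
    NoCopyNoMove(const NoCopyNoMove &) = delete;
    NoCopyNoMove(NoCopyNoMove &&) = delete;
};
```

A check such as static_assert(non_trivial_for_calls<UserCopy>(), "") can then document the expected passing style of a type next to its definition.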

If the type is non-trivial for the purposes of calls, a variable of the class type is defined in the caller and the function receives a pointer to this variable. Such variables are named __tis_arg_##. For example, in the following code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

void f(Obj x, Obj y);

void g() {
    f( {}, {} );
}

the translated function f has the signature:

void f(struct Obj *x, struct Obj *y);

and its call is translated as:

struct Obj __tis_arg;
struct Obj __tis_arg_0;
{
  Obj::Ctor(& __tis_arg_0);
  Obj::Ctor(& __tis_arg);
}
f(& __tis_arg,& __tis_arg_0);

If the function returns a class that is non-trivial for the purposes of calls, then it is translated as a function returning void but with an additional argument. This argument is a pointer to a variable in the caller that receives the return value. If the caller does not use the return value to initialize a variable, a variable named __tis_cxx_returnarg_## is created for this purpose.

For example, with the following original source code:

struct Obj {
    Obj();
    Obj(const Obj &);
};

Obj f();

void g() {
    Obj o = f();
    f();
}

the translated function f has the signature:

void f(struct Obj *__tis_cxx_return)

and the body of the function g is translated as:

struct Obj o;
f(& o);
{
  struct Obj __tis_cxx_returnarg;
  f(& __tis_cxx_returnarg);
}
return;
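
The return-slot convention can be reproduced by hand with a C-like view of the class. This is a sketch under stated assumptions: the field value, the body of the constructor and the body of f are invented for illustration, and Obj_Ctor, cxx_return and cxx_returnarg are hypothetical stand-ins for Obj::Ctor, __tis_cxx_return and __tis_cxx_returnarg.

```cpp
// C-like view of the class: constructors become plain functions.
struct Obj {
    int value;
};

void Obj_Ctor(Obj *self) {
    self->value = 7;
}

// Lowered form of `Obj f() { return Obj(); }`:
// the result is built in storage provided by the caller.
void f_lowered(Obj *cxx_return) {
    Obj_Ctor(cxx_return);
}

// Lowered form of `void g() { Obj o = f(); f(); }`, returning o.value.
int g_lowered() {
    Obj o;
    f_lowered(&o);
    {
        Obj cxx_returnarg;  // receives the discarded result
        f_lowered(&cxx_returnarg);
    }
    return o.value;
}
```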

If the type is trivial for the purposes of calls, no transformation is applied and the object is passed by copying its value. For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f(Obj o);

void g() {
    f( {} );
}

the signature of the translated function f is:

struct Obj f(struct Obj o)

Unknown passing style

Sometimes, TrustInSoft Analyzer cannot decide if a class is trivial for the purposes of calls in a translation unit. In such cases, it will assume that the type is non-trivial for the purposes of calls and emit a warning like:

[cxx] warning: Unknown passing style for type 'Foo'; assuming
non-trivial for the purpose of calls. Use the option
'-cxx-pass-by-value _Z3Foo' to force the opposite.

If the user knows that the type is trivial for the purposes of calls, they can use the option -cxx-pass-by-value to force this.

For example, with the following original source code:

struct Foo;

void f(Foo x);

  • with no particular option set, TrustInSoft Analyzer will produce the following warning and declaration for f:

    [cxx] warning: Unknown passing style for type 'Foo'; assuming
    non-trivial for the purpose of calls. Use the option
    '-cxx-pass-by-value _Z3Foo' to force the opposite.
    
    void f(struct Foo *x);
    
  • with the option -cxx-pass-by-value _Z3Foo, TrustInSoft Analyzer will produce the following declaration for f without warning:

    void f(struct Foo x);
    

Using an incorrect passing style can lead to errors like:

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct _Z3Foo * vs. struct Foo
                   First declaration was at file1.cpp:7
                   Current declaration is at file2.c:7

or

[kernel] user error: Incompatible declaration for f:
                   different type constructors: struct Foo vs. void
                   First declaration was at file.c:7
                   Current declaration is at file.cpp:7

Method transformations

Methods do not exist in C and are translated as functions by TrustInSoft Analyzer++. The following additional transformations are applied:

  • the name of the function is the qualified name of the method.
  • if the method is a non-static method, the function gets an additional this argument. Its type is a pointer to the class enclosing the method. There is an additional const qualifier if the method is const-qualified.
  • if the method is a non-static method, the this argument is initialized with the address of the calling object.

For example, with the following original source code:

struct Obj {
    Obj();
    static void bar(int x);
    void foo(int x) const;
};

void
f(void)
{
    Obj o;
    o.foo(1);
    Obj::bar(0);
}

two function declarations are produced:

void Obj::bar(int x);
void Obj::foo(const struct Obj *this, int x);

and the calls to foo and bar are translated as:

Obj::foo(& o,1);
Obj::bar(0);
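
Written by hand, the method transformation looks as follows. This is a sketch: the methods return int instead of void so that the effect is observable, the field base is invented, and the names Obj_bar and Obj_foo are hypothetical stand-ins for Obj::bar and Obj::foo.

```cpp
struct Obj {
    int base;
};

// static method Obj::bar(int): no this argument is added.
int Obj_bar(int x) {
    return x + 1;
}

// const method Obj::foo(int): explicit const-qualified this argument.
int Obj_foo(const Obj *self, int x) {
    return self->base + x;
}

int f_lowered() {
    Obj o = {10};
    int a = Obj_foo(&o, 1);  // o.foo(1)
    int b = Obj_bar(0);      // Obj::bar(0)
    return a + b;
}
```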

Constructor elision

By default, constructor elision is enabled and TrustInSoft Analyzer++ will omit some calls to copy or move constructors to temporary objects, as allowed by C++ standards from C++11 onwards.

Constructor elision can be disabled with the -no-cxx-elide-constructors option.

For example, with the following original source code:

struct Obj {
    Obj();
};

Obj f();

void g() {
    Obj y = f();
}

when constructor elision is enabled, the call to f is translated as:

f(& y);

However, when constructor elision is disabled with the option -no-cxx-elide-constructors, it is translated as:

struct Obj __tis_temporary_0;
f(& __tis_temporary_0);
Obj::CtorM(& y,& __tis_temporary_0);

In this case, the result of the call to f is written to the temporary object __tis_temporary_0 and this temporary object is then moved to the initialized variable y.
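
The difference between the two translations can be observed by counting calls to the move constructor. The sketch below emulates both translations by hand; the field value, the function bodies and the names Obj_Ctor, Obj_CtorM, temporary_0 and move_calls are assumptions introduced for illustration.

```cpp
// C-like view of the class: special member functions become plain functions.
struct Obj {
    int value;
};

int move_calls = 0;  // number of calls to the move constructor stand-in

void Obj_Ctor(Obj *self) {
    self->value = 7;
}

void Obj_CtorM(Obj *self, Obj *other) {
    self->value = other->value;
    ++move_calls;
}

// Lowered form of `Obj f();` with an assumed body `return Obj();`.
void f_lowered(Obj *cxx_return) {
    Obj_Ctor(cxx_return);
}

// `Obj y = f();` with elision: f writes directly into y.
int g_elided() {
    Obj y;
    f_lowered(&y);
    return y.value;
}

// Same statement without elision: a temporary first, then a move into y.
int g_not_elided() {
    Obj y;
    Obj temporary_0;
    f_lowered(&temporary_0);
    Obj_CtorM(&y, &temporary_0);
    return y.value;
}
```

Both functions return the same value; only the second one goes through the move constructor.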

Virtual method calls

The translation of a virtual method call is separated into four steps:

  • get the information required to call the method from the virtual method table of the object. The information is put in a variable named __virtual_tmp_XXX, where XXX is the unqualified name of the method.
  • adjust the value of the this pointer calling the method, and call the resolved function pointer using the previous information.
  • adjust the value of the this pointer to fetch the virtual base, if any (see the paragraph at the end of this section).
  • adjust the value returned by the call if the function might be a covariant override and the returned value is not nullptr.

As such, the function call_get in the following code:

struct Foo {
    virtual Bar *get() { return nullptr; }
};

Bar *call_get(Foo *f) {
    return f->get();
}

is translated as:

struct Bar *call_get(struct Foo *f)
{
  struct Bar *__retres;
  char *__virtual_return_get;
  struct __tis_vmt_entry const *__virtual_tmp_get;
  char *tmp_0;
  __virtual_tmp_get = f->__tis_pvmt + 1U;
  __virtual_return_get = (char *)(*((struct Bar *(*)(struct Foo *))__virtual_tmp_get->method_ptr))
  ((struct Foo *)((char *)f + __virtual_tmp_get->shift_this));
  if (__virtual_return_get) tmp_0 = __virtual_return_get + __virtual_tmp_get->shift_return;
  else tmp_0 = __virtual_return_get;
  __retres = (struct Bar *)tmp_0;
  return __retres;
}
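
The table-driven dispatch can be emulated in plain C++ to make the role of each field clearer. This is a simplified sketch, not the analyzer's data structure: VmtEntry stands in for struct __tis_vmt_entry, pvmt for __tis_pvmt, the first table entry is a placeholder whose real content is not described here, and Bar is reduced to a single invented field.

```cpp
struct Bar {
    int tag;
};

// Simplified stand-in for struct __tis_vmt_entry.
struct VmtEntry {
    void (*method_ptr)();  // generic function pointer, cast back before the call
    long shift_this;       // byte offset added to the object pointer
    long shift_return;     // byte offset added to a non-null result
};

struct Foo {
    const VmtEntry *pvmt;  // stand-in for __tis_pvmt
    Bar bar;
};

// Implementation of Foo::get as a plain function.
Bar *Foo_get(Foo *self) {
    return &self->bar;
}

// get sits at index 1, mirroring `f->__tis_pvmt + 1U` in the translation.
const VmtEntry Foo_vmt[2] = {
    {nullptr, 0, 0},
    {reinterpret_cast<void (*)()>(&Foo_get), 0, 0},
};

Bar *call_get(Foo *f) {
    typedef Bar *(*GetFn)(Foo *);
    const VmtEntry *entry = f->pvmt + 1;
    GetFn fn = reinterpret_cast<GetFn>(entry->method_ptr);
    Foo *self = reinterpret_cast<Foo *>(
        reinterpret_cast<char *>(f) + entry->shift_this);
    char *result = reinterpret_cast<char *>(fn(self));
    if (result) {
        result += entry->shift_return;  // the shift is skipped for nullptr
    }
    return reinterpret_cast<Bar *>(result);
}

int demo_call_get() {
    Foo foo = {Foo_vmt, {5}};
    return call_get(&foo)->tag;
}
```

Both shifts are zero here because Foo has no base class and get is not a covariant override.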

The special case of covariance on virtual bases: if the called virtual function is covariant and if its return type has a virtual base of the return type of the overridden function, we need to fetch this virtual base at the call site.

To do so we need to get the offset to apply to the returned object pointer. This offset is in an array, and there is a pointer to this array at the offset 0 of the returned object. So we cast the returned object as pointer to an array of offsets, and access this array at the vbase_index to get the offset.

In this case the code is translated as such:

if (__virtual_tmp_f->vbase_index != (long)(-1)) // do we have a virtual base?
  __virtual_return_f += *(
    *((long **)__virtual_return_f) // get the array of offsets
    + __virtual_tmp_f->vbase_index); // get the appropriate offset

Controlling the virtual method calls translation

The option -no-cxx-inline-virtual-calls can be used to replace this transformation by a call to a generated function named XXX::__tis_virtual_YYY, where:

  • XXX is the static type of the class containing the method that was called.
  • YYY is the unqualified name of the method.

With this option, the function call_get of the example above is translated as:

struct Bar *call_get(struct Foo *f)
{
  struct Bar *tmp;
  tmp = Foo::__tis_virtual_get(f);
  return tmp;
}

The generated __tis_virtual_ functions keep the states obtained by the virtual call separated.

Objects memory layout

TrustInSoft Analyzer uses its own memory layout to represent C++ objects. In order to preserve as much useful information as possible, the analyzer defines multiple well-typed data structures, and uses more than one extra pointer field in polymorphic classes. As a result of this choice, the numeric value of sizeof(Class) will differ between the compiled code and the analyzed code.

Class types, whether declared with class or struct, are translated as C structures. Union types are translated as C unions.

The inline declaration of a static field is translated as a declaration of a global variable with the same qualified name. The out-of-line definition of a static field is translated as a definition of a global variable with the same qualified name.

Non-static fields are translated as fields in the translated structure. The fields are emitted in the source code order.

Empty classes

Empty classes are translated as a structure with one field char __tis_empty;. This enforces that the size of an empty class is not zero.

Non-virtual inheritance

Non-virtual non-empty base classes are translated as fields in the derived class. Such fields are named __parent__ followed by the name of the base class.

For example, with the following original source code:

class Foo {
    int x;
};

struct Bar: Foo {
    int y;
    int z;
};

the structures produced for the classes Foo and Bar are:

struct Foo {
   int x ;
};

struct Bar {
   struct Foo __parent__Foo ;
   int y ;
   int z ;
};
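
In this model, converting a pointer to Bar into a pointer to Foo amounts to taking the address of the base field. A minimal sketch, where parent_Foo is a hypothetical stand-in for the generated __parent__Foo field (identifiers containing a double underscore are reserved in user code):

```cpp
#include <cstddef>

struct Foo {
    int x;
};

struct Bar {
    Foo parent_Foo;  // stand-in for the __parent__Foo field
    int y;
    int z;
};

// Conversion from Bar* to Foo* in this model: address of the base field.
Foo *to_base(Bar *b) {
    return &b->parent_Foo;
}

int read_x_through_base() {
    Bar b = {{1}, 2, 3};
    return to_base(&b)->x;
}

// The base field is the first member, so it lives at offset 0.
static_assert(offsetof(Bar, parent_Foo) == 0, "base subobject comes first");
```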

Non-virtual empty base classes do not appear in the translated C structure. For example, with the following original source code:

class Foo { };

struct Bar: Foo {
    int y;
    int z;
};

the structure produced for the class Bar is:

struct Bar {
   int y ;
   int z ;
};

In this case, a reference to the base Foo of an object of type Bar binds to the original object. In other words, the assertion in the following program is valid in the model used by TrustInSoft Analyzer:

class Foo {};

struct Bar: Foo {
    int y;
    int z;
};

int
main()
{
    Bar b;
    Foo &f = b;
    void *addr_b = static_cast<void *>(&b);
    void *addr_f = static_cast<void *>(&f);
    //@ assert addr_b == addr_f;
}

Polymorphic classes

If a C++ class is polymorphic, its corresponding C structure contains two additional fields:

  • struct __tis_typeinfo const *__tis_typeinfo; holding a pointer to the type_info of the most derived object of the current object.
  • struct __tis_vmt_entry const *__tis_pvmt; holding a pointer to the virtual method table of the current object.

As an example, the class:

struct Foo {
    int x;
    virtual void f() {}
};

is translated as:

struct Foo {
   struct __tis_typeinfo const *__tis_typeinfo ;
   struct __tis_vmt_entry const *__tis_pvmt ;
   int x ;
};

These additional fields are set by the constructors of the polymorphic class.

Virtual inheritance

If a class has a virtual base, its translation produces two different C structures: the regular C structure as well as a base version of the class.

The regular structure is used when the object is the most derived object. In this case:

  • the structure gets an additional field long const * __tis_vbases_ptr;. This is an array holding the offset of each virtual base of the object.
  • all virtual bases of the class are translated as fields in the C structure, but their names are prefixed by __vparent__ to distinguish them from non-virtual bases.

The base version of the structure has its name prefixed by __tis_base_ and is used when the object is used as a base for another object. In this case:

  • the structure gets an additional pointer __tis_vbases_ptr of type long const *. This is an array holding the offset of each virtual base of the object.
  • the structure does not contain fields related to the virtual base classes.

As an example the following class:

struct Baz: Bar, virtual Foo {
    int z;
};

produces the two structures:

struct Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
   struct Foo __vparent__Foo ;
};

struct __tis_base_Baz {
   long const *__tis_vbases_ptr ;
   struct __tis_base_Bar __parent__Bar ;
   int z ;
};

Accessing a virtual base is always done by shifting the address of the current object with the offset of the virtual base in the __tis_vbases_ptr array.

As an example, with the following code:

struct Foo {
    int x;
};

struct Baz: virtual Foo {
    int y;
};

int
main()
{
    Baz baz;
    Foo &foo = baz;
}

the body of the main function is translated as:

int __retres;
struct Baz baz;
struct Foo *foo;
Baz::Ctor(& baz);
foo = (struct Foo *)((char *)(& baz) + *(baz.__tis_vbases_ptr + 0));
__retres = 0;
return __retres;

The virtual base Foo of the class Baz has index 0, so the offset to use to go from Baz to Foo is *(baz.__tis_vbases_ptr + 0).
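
The offset-table lookup can be emulated in plain C++. This is a sketch under stated assumptions: vbases_ptr and vparent_Foo stand in for __tis_vbases_ptr and __vparent__Foo, Baz_Ctor stands in for Baz::Ctor, and the values stored by the constructor are invented.

```cpp
#include <cstddef>

struct Foo {
    int x;
};

// Stand-in for the most-derived structure of a class Baz with virtual base Foo.
struct Baz {
    const long *vbases_ptr;  // stand-in for __tis_vbases_ptr
    int y;
    Foo vparent_Foo;         // stand-in for __vparent__Foo
};

// One offset per virtual base; Foo has index 0.
const long Baz_vbases[1] = {static_cast<long>(offsetof(Baz, vparent_Foo))};

// Stand-in for Baz::Ctor: installs the offset table and sets the fields.
void Baz_Ctor(Baz *self) {
    self->vbases_ptr = Baz_vbases;
    self->y = 0;
    self->vparent_Foo.x = 42;
}

// Conversion to the virtual base: shift the object address by the offset.
Foo *to_virtual_base(Baz *baz) {
    return reinterpret_cast<Foo *>(
        reinterpret_cast<char *>(baz) + *(baz->vbases_ptr + 0));
}

int demo_virtual_base() {
    Baz baz;
    Baz_Ctor(&baz);
    return to_virtual_base(&baz)->x;
}
```

Going through the table rather than a fixed offset is what lets the same code work when Baz is itself a base of another class and the virtual base is placed elsewhere.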

Layout summary

The full layout for objects is the following, in increasing address order:

  • For most-derived classes and classes with no virtual base:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields
    • Virtual bases
  • For classes with virtual bases used as base classes:
    • Offsets of virtual bases
    • Non-virtual non-empty bases
    • Type information of the most derived object
    • Virtual methods table
    • Non-static fields

Member pointers

Pointers to member functions declaration

Pointers to a method X Foo::f(A1, A2, ..., An) are translated as a C structure with the following fields:

unsigned long vmt_index ;
X (* __attribute__((__tis_sound_cast__)) ptr)(struct Foo *, A1, A2, ..., An) ;
long shift ;
size_t vmt_shift ;

If Foo::f is a non-virtual method, then:

  • the field ptr is a pointer to the method called when resolving the symbol f in the scope of Foo. This can be the method Foo::f if f is declared in Foo or a method of one of the parent classes of Foo.
  • the field shift is the offset of the base containing the method f. If f is in Foo, then this is 0, otherwise it is the offset of the parent class declaring f.
  • the field vmt_index is 0.

If Foo::f is a virtual method, then:

  • the field ptr is the same as if Foo::f was a non-virtual method.
  • the field shift is the same as if Foo::f was a non-virtual method.
  • the field vmt_index is 1 + the index of the method in the virtual method table of the class containing the final override of f in Foo. This can be different from the index of f in the virtual method table of Foo if the final override of f is declared in a parent class of Foo.

Each pointer to member function type produces a different structure type. The structure type is named __tis_XXXX, where XXXX is the mangled name of the method pointer type.

For example, with the classes:

struct Pack {
    char c[1000];
};

struct Bar {
    int y;
    int f() { return 2; }
};

struct Foo: Pack, Bar {
    virtual void g() {}
};

the following statements:

int (Foo::*x)(void) = &Foo::f;
void (Foo::*y)(void) = &Foo::g;

are translated as:

struct __tis_M3FooFivE x;
struct __tis_M3FooFvvE y;
x.vmt_index = 0UL;
x.ptr = (int (*)(struct Foo *))(& Bar::f);
x.shift = 0L - (long)((struct Foo *)((unsigned long)0 - (unsigned long)(& ((struct Foo *)0)->__parent__Bar)));
x.vmt_shift = 0UL;
y.vmt_index = 2UL;
y.ptr = (void (*)(struct Foo *))(& Foo::g);
y.shift = 0L;
y.vmt_shift = (unsigned long)(& ((struct Foo *)0)->__tis_pvmt);
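
The non-virtual case can be emulated as follows. This is a simplified sketch: inheritance is modeled by composition, the target function is typed on the declaring base rather than on Foo, the vmt_shift field is omitted, f returns the field y instead of 2 so that the shift is observable, and the names MemFnPtr, make_ptr_to_f and call_member are hypothetical.

```cpp
#include <cstddef>

struct Pack {
    char c[1000];
};

struct Bar {
    int y;
};

// Bar::f as a plain function; it returns y to make the shift observable.
int Bar_f(Bar *self) {
    return self->y;
}

// Composition stands in for a class Foo deriving from Pack and Bar.
struct Foo {
    Pack parent_Pack;
    Bar parent_Bar;
};

// Simplified member function pointer record (non-virtual case only).
struct MemFnPtr {
    unsigned long vmt_index;  // 0 for a non-virtual method
    int (*ptr)(Bar *);        // function to call
    long shift;               // offset of the base declaring the method
};

// Equivalent of `&Foo::f` where f is inherited from Bar.
MemFnPtr make_ptr_to_f() {
    MemFnPtr p;
    p.vmt_index = 0UL;
    p.ptr = &Bar_f;
    p.shift = static_cast<long>(offsetof(Foo, parent_Bar));
    return p;
}

// Equivalent of `(obj->*p)()`: shift this to the declaring base, then call.
int call_member(Foo *obj, const MemFnPtr &p) {
    Bar *self = reinterpret_cast<Bar *>(
        reinterpret_cast<char *>(obj) + p.shift);
    return p.ptr(self);
}

int demo_member_pointer() {
    Foo foo;
    foo.parent_Bar.y = 5;
    return call_member(&foo, make_ptr_to_f());
}
```

The shift is the offset of the Bar subobject inside Foo, which here is the size of Pack.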

Program initialization

Dynamic initialization

If a variable v is initialized at dynamic initialization time, it is translated as:

  • a declaration for the variable v.
  • a function void __tis_init_v(). The content of this function is the translation of the initializer of v.

All __tis_init_XXX functions are called by a special function __tis_globinit. The __tis_globinit function is in turn called at the beginning of the main function.

As an example, the program:

int id(int x) { return x; }

int x = id(12);

int
main()
{
    return x;
}

is translated as:

int x;
void __tis_init_x(void)
{
  x = id(12);
  return;
}

 __attribute__((__tis_throw__)) int id(int x);
int id(int x)
{
  return x;
}

 __attribute__((__tis_throw__)) int main(void);
int main(void)
{
  __tis_globinit();
  return x;
}

void __tis_globinit(void)
{
  __tis_init_x();
  return;
}

Static initialization

A variable with a constant initializer is translated as a C variable with an initializer. The initializer is the value of the C++ constant initializer. As an example, with the following code:

constexpr
int
add_one(int x)
{
    return x + 1;
}

const int x = add_one(2);

the definition of the variable x is translated as:

static int const x = 3;

In some special circumstances, one may need to disable static initialization semantics described by the C++ standard. It can be done using the option -no-cxx-evaluate-constexpr. In this case, whenever a variable is initialized with a constant initializer that is not a constant initializer according to the C rules, the initialization of this variable is done at dynamic initialization time and uses the initializer as it was written by the user.

Using this option can lead to unsound results. As an example, with the following program:

constexpr int id(int x) { return x; }

extern const int x;

const int y = x;

const int x = id(1);

int
main()
{
    int a = y;
    int b = x;
    //@ assert a == b;
    return a == b;
}

  • the assertion is valid from the point of view of the C++ standard.
  • the assertion is valid when using the command line tis-analyzer++ --interpreter test.cpp.
  • the assertion is invalid when using the command line tis-analyzer++ --interpreter test.cpp -no-cxx-evaluate-constexpr.

Static local variables

A static local variable x is translated as a triple of:

  • a global variable, corresponding to the translated static local variable. The type of the variable is the translated type of the static variable, and the name of the variable is prefixed by the name of its enclosing function. This variable is 0-initialized.
  • a global variable __tis_guard_x.
  • a check on __tis_guard_x ensuring the initialization of the static variable is not recursive, followed by a conditional block doing the initialization of the variable once.

The variable __tis_guard_x can have the following values:

  • 0: the variable x has not been initialized yet.
  • 1: the variable x is being initialized.
  • 2: the variable x has been initialized.

As an example, the following function:

int
main()
{
    static Foo f;
    return 0;
}

is translated as:

int main::__tis_guard_f;
struct Foo main::f = {.x = 0};

int main(void)
{
  int __retres;
  tis_ub("Recursive initialization of the static local variable f.",
       main::__tis_guard_f != 1);
  if (! main::__tis_guard_f) {
    main::__tis_guard_f ++;
    Foo::Ctor(& main::f);
    main::__tis_guard_f ++;
  }
  __retres = 0;
  return __retres;
}
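
The guard logic of the translated code above can be emulated in plain C++. This is a sketch following that translated code: the guard starts at 0 and is incremented before and after the constructor call. The names guard_f, static_f, Foo_Ctor and enter_function are hypothetical, and an exception replaces the undefined behavior report made through tis_ub, on the assumption that tis_ub reports when the guard equals 1.

```cpp
#include <stdexcept>

struct Foo {
    int x;
};

int ctor_calls = 0;  // number of times the constructor stand-in ran

void Foo_Ctor(Foo *self) {
    self->x = 1;
    ++ctor_calls;
}

int guard_f = 0;     // stand-in for main::__tis_guard_f
Foo static_f = {0};  // stand-in for main::f, zero-initialized

// Body of a function containing `static Foo f;`.
void enter_function() {
    // A guard equal to 1 means the initialization is already in progress.
    if (guard_f == 1) {
        throw std::logic_error("recursive initialization of f");
    }
    if (!guard_f) {
        guard_f++;  // initialization in progress
        Foo_Ctor(&static_f);
        guard_f++;  // initialization done
    }
}

int demo_static_local() {
    enter_function();
    enter_function();
    return ctor_calls;
}
```

Calling the function twice runs the constructor once, since the guard is non-zero on the second call.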

Special variable names

TrustInSoft Analyzer++ introduces several special variables while translating code. This section summarizes the different name families used for these variables.

Global variables

  • __tis_ABI::exc_stack_depth: how many exceptions are currently raised.
  • __tis_ABI::exc_stack: all exceptions currently raised.
  • __tis_ABI::caught_stack_depth: how many exceptions are currently caught.
  • __tis_ABI::caught_stack: all exceptions currently caught.
  • __tis_unwinding: whether the program is currently unwinding its stack.
  • XXX::__tis_class_vmt: virtual method table for an object of type XXX used as most derived object.
  • XXX::__tis_class_typeinfo: typeinfo for an object with a most derived object being of type XXX.
  • XXX::__tis_class_inheritance: inheritance information for an object with most derived object being of type XXX.

Intermediate variables

  • __Ctor_guard: guard used to check if the lifetime of an object has started.
  • __tis_alloc: materialization of a space reserved by an allocation function.
  • __tis_arg: materialization of a function argument.
  • __tis_assign: temporary variable used to hold the right hand side of an assignment if it has potential side effects.
  • __tis_bind: temporary variable used to initialize non-reference structured bindings of arrays.
  • __tis_cast: result of a dynamic_cast.
  • __tis_const: materialization of a function argument that is non-trivial for the purposes of calls.
  • __tis_compound_literal: materialization of a C++ temporary used to initialize a compound literal.
  • __tis_constant_expression: materialization of a C++ constant expression.
  • __tis_cxx_return_arg: materialization of the discarded result of a call that is non-trivial for the purpose of calls.
  • virtual_dtor_tmp: virtual method table cell used when calling a virtual destructor.
  • __tis_deleted_value: address of a deleted value.
  • __tis_dereference: temporary variable used to dereference a member pointer.
  • __tis_dyncast: operand of a dynamic_cast.
  • __tis_exn: materialization of an object of type std::bad_XXX being thrown.
  • __tis_gnu_ternary: shared computation in a GNU ternary expression.
  • __tis_guard: guard controlling the initialization of a static local variable.
  • __tis_implicit_value: materialization of a C++ temporary used to perform an implicit value initialization.
  • __tis_index: index used when destroying an array when its lifetime finishes.
  • __tis_initializer_list: materialization of a C++ temporary used to build an initializer list.
  • __tis_init_size: loop variable used to initialize arrays.
  • __tis_lambda_temp: materialization of a C++ temporary variable in a lambda.
  • __tis_lvalue: materialization of a C++ lvalue that is not an lvalue in C.
  • __tis_mmp: method pointer being called.
  • __tis_mmp_init: temporary variable used to initialize a method pointer.
  • __tis_object_cast: temporary variable used to compute object inheritance casts.
  • __tis_offset: index used to destroy array elements as a consequence of calling delete[].
  • __tis_placement: address of an object initialized by a placement new.
  • __tis_relop: intermediate result when translating relational operators.
  • __tis_temp: materialization of a C++ temporary variable.
  • __tis_thrown_tmp: materialization of a C++ temporary variable in a throw statement.
  • __tis_typeid: address of an object with a polymorphic typeid being computed.
  • __virtual_return: the result of the call to a virtual method returning a pointer.
  • __virtual_this: this pointer computed when calling a virtual method.
  • __virtual_tmp: virtual method table cell used when calling a virtual method. The name of the called function is added to the end of the name of the temporary.

Generated contracts

Function contracts will be automatically generated for non-static class methods, in order to require validity of the this pointer as a precondition. Copy and move constructors will also grow separated annotations since they are expected to operate on separate objects.

These contracts will be added to user-provided contracts, if any.

Computation of these annotations can be disabled with the -no-cxx-generate-contracts option.

Builtins for C++ analysis

TrustInSoft Analyzer++ introduces additional builtins to support the analysis of C++ code bases. These are listed in the relevant section of the builtin reference and explained below.

C++-friendly tis_make_unknown builtin

The analyzer provides the function tis_make_unknown for use in C code. The function takes a pointer and size, and sets the contents of the so-described area of memory to be unknown. This can be used to abstract over the contents of variables and objects (see e.g., Prepare the Analysis).

TrustInSoft Analyzer++ overloads the function with its C++-friendly variant. This variant has the same semantics, but a different signature:

void tis_make_unknown(void *, unsigned long);

Here, the overloaded tis_make_unknown function takes void * as its first argument, in contrast to the C variant, which takes char *. Passing void pointers is more convenient in C++, because C++ implicitly converts any object pointer type to void * (whereas a conversion to char * requires an explicit cast).

Metadata-preserving tis_make_unknown builtin

When using tis_make_unknown on an object, the analyzer treats the entire indicated memory area as having unknown contents. This includes both the user-defined data and the metadata, such as virtual table pointers, which should typically be preserved. If this information is not preserved, the object’s virtual methods and base classes become imprecise.

For this reason, TrustInSoft Analyzer++ provides another C++-specific variant of the tis_make_unknown builtin:

template <typename T> void tis_make_unknown(T *);

The builtin is defined as a function template that takes a pointer to an object of any type T. The builtin abstracts all the user-defined contents of the provided object, but does not interfere with the metadata required by the analyzer to model polymorphism and inheritance.
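
The difference between the two variants can be pictured with a plain C++ emulation that does not use the analyzer's builtins. In this sketch, writing zero models "unknown", pvmt stands in for the table pointer added by the analyzer, and the functions havoc_bytes and havoc_fields are hypothetical illustrations of the two-argument and one-argument styles.

```cpp
#include <cstring>

struct VmtEntry {
    int unused;
};

const VmtEntry Obj_vmt[1] = {{0}};

struct Obj {
    const VmtEntry *pvmt;  // metadata: pointer to the method table
    int x;                 // user-defined data
};

// Two-argument style: every byte is overwritten, metadata included.
void havoc_bytes(void *p, unsigned long n) {
    std::memset(p, 0, n);
}

// One-argument style: only the user-defined fields are overwritten.
void havoc_fields(Obj *o) {
    o->x = 0;
}

bool table_pointer_survives_bytes() {
    Obj o = {Obj_vmt, 1};
    havoc_bytes(&o, sizeof o);
    return o.pvmt == Obj_vmt;
}

bool table_pointer_survives_fields() {
    Obj o = {Obj_vmt, 1};
    havoc_fields(&o);
    return o.pvmt == Obj_vmt;
}
```

Only the field-wise version keeps the table pointer intact, which is the property a later virtual call depends on.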

Example. Consider the following program. Here, we define a class named Obj whose two members are a field x and a virtual method f, which returns the value of x. Then, in main, we instantiate an object of this class and use tis_make_unknown to set the value of the object’s field to be unknown. We then use tis_show_each to print out the value of x and the result of the call to method f:

#include <tis_builtin.h>

struct Obj {
  int x;
  virtual int f() { 
    return x; 
  }
};

int main() {
  Obj obj = {};
  tis_make_unknown(&obj, sizeof obj);
  
  tis_show_each("obj.x", obj.x);
  tis_show_each("obj.f()", obj.f());
  
  return 0;
}

When we analyze this program, the analyzer shows the value of x, as expected, but subsequently it also raises an alarm indicating that the program tried to dereference an invalid pointer. The analyzer emits the alert because we used the variant of tis_make_unknown that also sets virtual method table pointers to unknown.

$ tis-analyzer++ -val poly.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
tests/tis-user-guide/man/tis-analyzer-plusplus/poly.cpp:15:[kernel] warning: pointer arithmetic: assert \inside_object_or_null((void *)obj.__tis_pvmt);
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])

Instead of using the variant of tis_make_unknown (with two arguments) that overwrites the object’s metadata, we should modify the program to use the template function variant of tis_make_unknown (with just one argument) to preserve object metadata while setting the object’s field to unknown:

  tis_make_unknown(&obj);

When we analyze this program now, it shows the expected (unknown) values of x and result of calling f, but does not emit an alert, meaning tis_make_unknown did not clear the object’s virtual method table pointer.

$ tis-analyzer++ -val poly2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])
[value] Called tis_show_each({{ "obj.f()" }}, [-2147483648..2147483647])

Example. The following program presents a situation where class Obj has no members of its own but inherits field x from class Base via virtual inheritance. In main, the object obj of class Obj is instantiated and its members’ values are set to unknown via tis_make_unknown. We then show the value of the field x that obj inherits from Base:

#include <tis_builtin.h>

struct Base {
  int x;
};

struct Obj: virtual Base {};

int main() {
  Obj obj = {};
  tis_make_unknown(&obj, sizeof obj);
  
  tis_show_each("obj.x", obj.x);
  
  return 0;
}

As in the example above, the analyzer raises alarms when run on this program. The cause is the same: this variant of tis_make_unknown sets the virtual method table pointers to unknown, and the virtual method table is also used to implement virtual inheritance.

$ tis-analyzer++ -val virt.cpp
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: pointer arithmetic:
                  assert \inside_object_or_null((void *)obj.__tis_vbases_ptr);
tests/tis-user-guide/man/tis-analyzer-plusplus/virt.cpp:13:[kernel] warning: out of bounds read. assert \valid_read(obj.__tis_vbases_ptr+0);
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])

Hence, we should modify this program to use the variant of tis_make_unknown that preserves the pointer within obj to its associated virtual method table:

  tis_make_unknown(&obj);

This removes the alarms and produces the expected (unknown) value of obj.x:

$ tis-analyzer++ -val virt2.cpp
[value] Called tis_show_each({{ "obj.x" }}, [-2147483648..2147483647])

Tip

In cases where a class does not depend on a virtual method table (i.e., it has no virtual inheritance and no virtual methods), both variants of tis_make_unknown are equivalent and can be used interchangeably.

GoogleTest support

You can run tests that use the GoogleTest framework with TrustInSoft Analyzer++ in a few steps.

GoogleTest User’s Guide: https://google.github.io/googletest/

Options

TrustInSoft Analyzer++ provides two options to activate GoogleTest support:

  • Choose the -gtest option if you wrote the entry point of your tests yourself.
  • Choose the -gtest-main option (as opposed to the -gtest option) if your tests use the default GoogleTest entry point (from the gtest_main library).

More details about gtest_main can be found at https://google.github.io/googletest/primer.html#writing-the-main-function

Configuration

Specify your own sources, headers and preprocessing options as you would do for any other analysis (see Prepare the Sources).

When you provide the -gtest option (or the -gtest-main option) to TrustInSoft Analyzer++, the analyzer pulls in all GoogleTest source files and headers for you, so you do not have to list them in your analysis configuration files.

As an example, let us assume that you are testing a software module called module1, and that you have gathered your tests that use the GoogleTest framework in a tests subdirectory.

|-- module1
|   |-- include
|   |   ...
|   |-- src
|   |   |-- component1.cc
|   |   ...
|   |-- tests
|   |   |-- component1_unittest.cc
|   |   ...
|-- my_analysis
|   |-- mod1_comp1_unittest.json
|   ...

For instance, mod1_comp1_unittest.json would look like this:

{
  "name": "mod1_comp1_unittest",
  "prefix_path":"../module1/",
  "files": [
    "tests/component1_unittest.cc",
    "src/component1.cc"
  ]
}

Note that you do not need to add the gtest *.cc files to the "files" list.

Running the analysis

Next, run the analysis with both the --interpreter option (see Getting Started) and the -gtest option (or the -gtest-main option).

tis-analyzer++ --interpreter -gtest -tis-config-load path/to/<my_unit_test>.json

Note: We provided the -gtest option directly on the command line to highlight it, but you can also move it to the configuration file:

"gtest": true

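As a sketch, and assuming the key sits at the top level of the configuration object next to "name" and "files" (the placement is an assumption, not stated above), the mod1_comp1_unittest.json file from the Configuration section would become:

```json
{
  "name": "mod1_comp1_unittest",
  "prefix_path": "../module1/",
  "files": [
    "tests/component1_unittest.cc",
    "src/component1.cc"
  ],
  "gtest": true
}
```

With the option set in the file, -gtest no longer needs to appear on the command line.
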
Limitations

  • Catching GoogleTest assertions is not supported (macros EXPECT_FATAL_FAILURE and EXPECT_NONFATAL_FAILURE).
  • Death tests are not supported (e.g. macro EXPECT_EXIT).

Make sure that your tests do not use these features, as they are untested at the moment. If you do use them, the analyzer will probably not behave as you expect, and it will not specifically warn you that it does not support them.

For more details about the assertion macros provided by GoogleTest, visit https://google.github.io/googletest/reference/assertions.html