Session 3 Summary

C++ Programming Course, Summer Term 2018, April 27, 2018

Code discussed in this session

Using Valid Expressions to Define Semantics

As already discussed several times in the previous sessions, a system of classes and concrete types is not an ideal mental model for defining robust and elegant programming abstractions.
Of course, we eventually need classes and concrete types to implement an abstraction. But for modeling, a class-based approach shifts your mindset to the viewpoint of the implementer, not the user of the abstraction.

Instead, we think in terms of concepts:

  • A concept specifies valid expressions and operation semantics
  • Any type that satisfies the definition of a concept, disregarding how this compliance is achieved, is a model of this concept
  • Concept semantics must be well-defined

In less civilized (Java) terms, a concept relates to a concrete type (a model of the concept) like an interface to a class (an implementation of the interface).

Valid Expressions
    The syntax accepted by the compiler
Semantics
    How an expression affects the state of objects (invariants, preconditions, postconditions)

In principle, validating an expression happens at compile time. Semantics cannot be checked by the compiler and only take effect at run time (apart from some expressions whose semantics are part of the C++ language itself).

For example, semantics of a stack s can be described like this:

Expression    Precondition    Postcondition
s.push(v)     none            s.pop() == v
s.pop()       !s.empty()      s.size() is decreased by one

Or, using functional notation, like this:

\[ \begin{align} & stack() \rightarrow Stack \\ & \text{with } s : Stack,\; v : Value \\ pop(push(s, v)) &= (s, v) \\ size(push(s, v)) &= size(s) + 1 \\ size(stack()) &= 0 \\ pop(stack()) &= \text{undefined} \end{align} \]

Like any formal definition, this is only useful (to humans) in combination with thoughtfully written explanations.

So, what about the well-defined property in the definition of Concept?
There are additional formalese implications of well-defined, but simply put, it means unambiguous:

For a valid expression in a well-defined concept, there is exactly one possible interpretation.

I use the non-standard (!) umbrella term Interpretation for any compile time mechanism that involves deducing or substituting types, selecting implementation variants such as function overloads, and the like.

We will encounter type theory in nearly every topic in this lab course.

Trivia: The Most Vexing Parse

Speaking of the compiler interpreting your types, there is a notoriously wicked type interpretation specialty in C++: The most vexing parse

Originally described and named by Scott Meyers in Item 6 of his book Effective STL, the phenomenon is dissected ruthlessly in this article on FluentCpp.com.

In summary, this is something we call a bloody hell of a mess.
Fortunately, you can easily eliminate it in modern C++: Use uniform initialization.
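A minimal sketch of the pitfall and the fix; the type Widget is a hypothetical stand-in:

struct Widget {};

// Most vexing parse: this does NOT create an object. It declares a
// function `w1` that takes a (pointer to a) function returning Widget,
// and itself returns Widget:
Widget w1(Widget());

// Uniform initialization is unambiguous:
Widget w2{};          // a default-constructed object
Widget w3{Widget{}};  // an object initialized from a temporary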

Function Overloads

Functions overload each other if they have the same name and different parameters.

In the first session, we briefly discussed why functions with the same name may not differ only in their return type.
However, function overloads are well-defined: if more than one overload variant is eligible for an expression, compilation fails with an ambiguous function call error. In effect, overloaded functions cannot conflict and therefore may have different return types.

A simple example:

double square(double);
int    square(int);

It is important to understand that overloads are selected based on the argument types, not the return type:

int  i_sq_i = square(3);   // Literal integers have implicit type `int`,
                           // calls variant `int square(int)`

auto a_sq_i = square(4);   // Calls variant `int square(int)`, return
                           // type is deduced as `int`
int  i_sq_d = square(3.4); // Calls variant `double square(double)`,
                           // then implicitly converts the result to `int`

// Of course, an explicit cast can be used to select an overload:
auto a_sq_d = square(static_cast<double>(3));
// In our example, same as:
auto b_sq_d = square(3.0);
// Deduced type of `a_sq_d` and `b_sq_d` is `double`.

Q: Explain if/how overload deduction would work here:

class Base {};
class A : public Base {};
class B : public Base {};

void foo(Base);
void foo(A);

foo(B()); // no overload `foo(B)` exists

Introduction to C++ Templates

Trivia: Class Templates in C

In the last assignment, the implementation of class Vector in C used int as a fixed type of the vector elements.

To add a definition of a vector for elements of type double, the vector class implementation has to be duplicated with all occurrences of int replaced by double.

In C99, the only way to avoid duplicate code is to use a macro like this:

/**
 * Usage:
 *    DECLARE_GENERIC__VECTOR(ulong, unsigned long);
 * -> declares Vector_ulong
 */
#define DECLARE_GENERIC__VECTOR(QUAL_TYPE, VECTOR_VALUE_TYPE) \
  typedef VECTOR_VALUE_TYPE \
          Vector_##QUAL_TYPE##_Value;  \
  \
  typedef Vector_##QUAL_TYPE##_Value * \
          Vector_##QUAL_TYPE##_Iterator; \
  \
  typedef struct { \
    int                          size; \
    Vector_##QUAL_TYPE##_Value * data; \
  } Vector_##QUAL_TYPE;

// ... and so on ...

Templates in C++ realize, in principle, the same mechanism. For some compiler errors caused by templates, it helps to keep this code generation aspect in mind.

The overall procedure of template instantiation in C++ involves several stages of symbolic (rather than lexical) operations, but you can use the macro-based variant as a mental model.

But this is just mechanics.

Conceptually, templates are well-defined formal constructs and an essential part of the C++ type system.

Templates should be considered as functions that return types.
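A minimal sketch of this view: a class template that maps an input type to a result type, written here as a hand-rolled analogue of the standard trait std::add_pointer.

template <typename T>
struct add_pointer {  // a "function" taking the type `T` ...
  using type = T *;   // ... and "returning" the type `T *`
};

add_pointer<int>::type p = nullptr; // `p` has type `int *`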

Class Templates in C++

In C++, the “code generation” of fully qualified types like Vector_int and Vector_float from the C examples above is a built-in compiler function that is accessible with the template syntax.

Example:

template <typename T>
class Foo
{

public:

  void add(const T & value) {
    _vector.push_back(value);
  }

  // ...

private:

  Vector<T> _vector;

};

In the macro-variant above, Vector is not a usable type, as it is not fully qualified.

It is, however, usable as a concept: all types created from the Vector template have identical methods with identical semantics defined, independent of the concrete element type.

The concrete definitions Vector_int and Vector_float are actual type names that a programmer can use in code. To the compiler, a type Vector simply does not exist.

For the same reason, Foo is not a type in the C++ example above, but Foo<int> and Foo<double> are. You can think of the symbols < and > as part of the type name, as if they were “regular” characters like _.

If two classes are named Foo_string and Foo_double, you would expect them to have separate implementations.

And in fact, the implementations of Foo<int> and Foo<double> may be totally different as well, just like the overloaded functions Foo(int) and Foo(double) may have specific implementations.
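A minimal sketch of this, using explicit template specialization; the name Printer is hypothetical:

#include <iostream>

template <typename T>
struct Printer {
  void print(const T & value) { std::cout << value << '\n'; }
};

// Explicit specialization: `Printer<bool>` has its own implementation.
template <>
struct Printer<bool> {
  void print(bool value) { std::cout << (value ? "true" : "false") << '\n'; }
};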

Function Templates in C++

From Pointer to Iterator

In the implementation of vector in assignment 2, pointers work nicely to iterate over vector elements. Dereferencing a pointer to a vector element resolves the element’s value.

In brief, vector iterators allow the following expressions. Note that iterator difference and iterator arithmetic require random-access iterators; they are not valid for every sequential container:

std::vector<int> v;

std::vector<int>::iterator b = v.begin();
std::vector<int>::iterator e = v.end();

int v_size = e - b;

std::vector<int>::iterator v_second = b + 1;

STL Iterator Concepts

Tag Dispatching

Iterator tags were introduced in ancient times, long before C++11 introduced type deduction.

They allow a technique called Tag Dispatching, which is based on plain old function overloading.
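A minimal sketch of tag dispatching; the name my_advance is hypothetical, modeled after the way std::advance is typically implemented:

#include <iterator>

// Fallback: step one element at a time (valid for any input iterator).
template <typename Iter, typename Dist>
void my_advance_impl(Iter & it, Dist n, std::input_iterator_tag) {
  while (n-- > 0) { ++it; }
}

// Fast variant, selected if the iterator supports random access.
template <typename Iter, typename Dist>
void my_advance_impl(Iter & it, Dist n, std::random_access_iterator_tag) {
  it += n;
}

template <typename Iter, typename Dist>
void my_advance(Iter & it, Dist n) {
  // The type of the tag object selects the overload at compile time:
  my_advance_impl(
      it, n, typename std::iterator_traits<Iter>::iterator_category());
}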

Data Structure vs. Container Concept

The term Data Structure refers to the (internal) arrangement of data in memory and the mechanisms that maintain this arrangement.

The term Container refers to the (public) operations and their semantics visible to the user.

Example:

  • A Tree is a Data Structure as it defines the arrangement and connections of data in memory
  • A Set is a Container that could be based on a Data Structure like a tree

But a Set could also be based on a dynamic array or a list.

Changing the underlying data structure does not affect semantics of the container. Of course, the underlying data structure typically affects the complexity of the container’s methods, but computational complexity has no effect on semantics.

This conceptual separation can be observed in the STL in many places:

  • the class template std::stack is not a container, it is a container wrapper that provides adapter methods (push and pop, obviously);
    the underlying container is specified as a template parameter, as shown in the sketch below
  • the C++ standard does not specify which data structure must be used to implement STL containers; it only gives recommendations and specifies constraints on the computational complexity of operations on containers
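A short sketch of choosing the underlying container of std::stack:

#include <deque>
#include <stack>
#include <vector>

std::stack<int> s1;                   // default underlying container:
                                      // std::deque<int>
std::stack<int, std::vector<int>> s2; // identical container semantics,
                                      // different underlying data structure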

The Importance of new and delete

We discussed the definitions of Allocation, Initialization and Instantiation in the last session.

In C, you used malloc and free to allocate memory, and called an initialization function to fill the allocated memory range with meaningful values (see vector__new in the last assignment).
In C++, these functions should be considered non-existent. There is no valid reason to use them, and various reasons not to.

In the last session, we soon came to the conclusion that allocation and initialization should not be available as separate operations. To ensure that objects can only be created in a well-defined state, they should only be available in conjunction as an Instantiation method.

The new operator in C++ allocates an object of a specified type on the heap and initializes the object by calling a constructor.

Likewise, the delete operator first calls the object’s destructor before deallocating its memory.
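To illustrate the mechanics only (the next section explains why this must not appear in application code); Widget is a hypothetical type:

Widget * w = new Widget(); // allocates heap memory, then calls the
                           // constructor of `Widget`
// ... use *w ...
delete w;                  // calls the destructor, then deallocates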

The Importance of Avoiding new and delete

The new and delete operations should never appear in application code.

There are many good and widely accepted reasons for this guideline that we will discuss in upcoming sessions. For now, you only have to be aware of one reason: Whenever a resource is acquired in an application, it must be released at some point.

Now an armada of additional, non-trivial problems arises: ownership of the resource must be well-defined (exactly one owner at any time), the resource must not be released while it is still referenced by some part of the program, and so on.

Luckily, the alternatives to new and delete are straightforward and conveniently apply to resource management in general, including file handles and mutex locks.

Rule of Zero and RAII

See: http://en.cppreference.com/w/cpp/language/rule_of_three

If a class follows the Rule of Three (or Rule of Five, see upcoming session), its underlying resource management is encapsulated such that instances of the class can be used like POD (plain old data):

if (condition) {
  // vector instance on stack, vector elements are internally
  // allocated on heap
  std::vector<int> v{ 100, 200, 300 };
  v.push_back(400);
  std::cout << "vector accumulate: "
            << std::accumulate(v.begin(), v.end(), 0)
            << std::endl;
  // vector instance removed from stack, destructor of std::vector
  // deallocates elements from heap
}

Purists (which you should aspire to be) demand that any class which needs to implement the Rule of Three should exclusively deal with ownership and not serve any other purpose.

The Rule of Zero states that, ideally, a class does not specify any assignment, copy constructor or destructor.

This is only possible if any resource ownership is encapsulated in members that follow Resource Acquisition Is Initialization (RAII).

In the example below, class Foo is a dynamic container and for this purpose needs to allocate data on the heap.

As the ownership of allocated memory is encapsulated in an instance of std::vector (which itself follows the Rule of Three), no assignment, copy or destruction has to be specified.

Even the default constructor can be omitted in this case.

class Foo {
public:
  void add(int value) {
    _vector.push_back(value);
  }

private:
  std::vector<int> _vector;
};

However, I personally agree with Scott Meyers’s reasoning that this should rather be understood as the Rule of Five Defaults: declare all five special member functions, but define them as = default.

Safe Heap-Allocation of Single Objects

Use std::make_shared or std::make_unique

These are helpers for the smart-pointer templates std::shared_ptr and std::unique_ptr.

Not only do these employ RAII to ensure that heap-allocated memory will be released at the end of the owner’s lifetime, they also specify robust ownership semantics.

A std::unique_ptr is move-only and cannot be copied. Memory allocated using std::make_unique has exactly one owner at any time and will be released depending on the owner’s lifetime.

If the allocated memory is shared, std::shared_ptr uses reference counting to ensure that it will only be released when the last referencing object is destroyed.
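A minimal sketch of both; the type Widget is a hypothetical stand-in (note that std::make_unique requires C++14):

#include <memory>

struct Widget { int x = 0; };

void example() {
  // Exactly one owner at any time; move-only, copying does not compile:
  std::unique_ptr<Widget> u = std::make_unique<Widget>();
  u->x = 1;

  // Shared ownership via reference counting:
  std::shared_ptr<Widget> s1 = std::make_shared<Widget>();
  std::shared_ptr<Widget> s2 = s1; // use count is now 2
  // The memory is released when the last owner is destroyed,
  // no explicit delete needed.
}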

Safe Heap-Allocation of Object Arrays

Use std::vector

It is worth mentioning that malloc/free and new/delete are incompatible mechanisms: memory allocated with malloc must not be deallocated using delete, and calling free on memory reserved with new yields undefined behavior.

Luckily you don’t have to care as you won’t use either of them.

But what about realloc? What is the equivalent operation in C++?

It’s simple: Use std::vector.
It provides a highly optimized, all-purpose RAII encapsulation of dynamic memory management, and it is nearly impossible to break.

For the same reasons, you should also avoid the array forms new[] and delete[] and use std::vector instead whenever you can.

Consider this example:

if (foo != 0) {
  std::vector<double> buf(50); // ~> malloc(50 * sizeof(double));
  buf.resize(60);              // ~> realloc(buf, 60 * sizeof(double));
  buf.push_back(12.23);        // ~> buf[buf_write_index] = 12.23;
                               //    buf_write_index++;
                               //    (reallocating first if the capacity
                               //    is exhausted)
  throw std::runtime_error("What now?");
  // Vector `buf` is removed from the stack and its destructor is
  // called: no memory leak!
  // An explicit `free(buf)` or `delete` at the end of the block would
  // never be reached, as the exception is thrown before.
}

Type Deduction, Substitution and SFINAE

Let’s recall the well-defined property of concepts:

For a valid expression in a well-defined concept, there is exactly one possible interpretation.

Again, I use the non-standard term interpretation for any compile time mechanism that involves deducing a type, choosing an implementation variant such as a function overload, and the like.

In C++ - and this is essential - there is also the inverse rule:

There must exist exactly one interpretation that resolves to a valid expression.

Why do we need the inverse? Well, it implies that there may be multiple interpretations of an expression, but only exactly one of them makes the expression valid.

The others are simply ignored: any interpretation that would result in a compiler error is discarded as if it never existed.

So there is no restriction on interpretations that would result in a compiler error, and this makes it possible to define multiple, even seemingly conflicting interpretations of an expression.

This prime directive in modern C++ is called

SFINAE: Substitution Failure Is Not An Error

Without SFINAE, every interpretation option would be required to resolve to a valid expression.
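A minimal sketch of SFINAE in action, using std::enable_if; the function name describe is hypothetical. Both overloads are candidate interpretations for every call, but for each argument type exactly one substitution succeeds, and the failing one is silently discarded:

#include <iostream>
#include <type_traits>

// Eligible only if `T` is an integral type:
template <typename T>
typename std::enable_if<std::is_integral<T>::value>::type
describe(T) { std::cout << "integral\n"; }

// Eligible only if `T` is a floating point type:
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value>::type
describe(T) { std::cout << "floating point\n"; }

int main() {
  describe(3);   // substitution fails for the floating point variant:
                 // not an error, that overload is just ignored
  describe(3.5); // substitution fails for the integral variant
}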

Lecturer’s Opinionated Note

Type deduction and SFINAE are the reason why C++ is the most relevant programming language in the real world today. The release of the C++11 standard made a multitude of established research projects irrelevant (see: X10, UPC).

I hear Gloria in excelsis Deo from Beethoven’s Missa Solemnis in my head when I think about it, that’s how awesome it is.