The C++ Programming Language – Part 9

This post will be mostly for personal reference as I go through Bjarne Stroustrup’s “The C++ Programming Language” 4th edition textbook. Some of the notes will appear random.

Chapter 7 – Pointers, Arrays, and References (continued)

Why even use pointers in the first place if they are fraught with danger when used incorrectly? They allow us to pass large amounts of data at low cost: instead of copying the data, we pass its address in memory. Like a pointer, a reference is an alias for an object and is usually implemented to hold the machine (memory) address of that object. How is this different from a pointer, you might wonder? First, a reference is accessed with exactly the same syntax as the name of an object, while a pointer needs different syntax (* or ->). Second, a reference always refers to the object with which it was initialized, while a raw pointer can be changed to point elsewhere at any time. Lastly, there is no concept of a null reference, so we can assume that a reference actually refers to an object. “Think of a reference as an alternative name for an object, an alias.”

// g++ -std=c++11 -o ptr_vs_ref ptr_vs_ref.cpp
#include <iostream>

using namespace std;

struct TestContainer {
    int x;
};

int main()
{   
    /* Pointer to a TestContainer allocated on the free store */
    TestContainer *ptr = new TestContainer {7};

    /* Create a reference to that same TestContainer */
    TestContainer& ref {*ptr};

    /* Accessing data via pointer uses different syntax than the struct object itself */
    cout << "using the pointer: "<< ptr->x << endl;

    /* Accessing data via reference uses the same syntax as the struct object itself. */
    cout << "using the reference: "<< ref.x << endl;

    TestContainer tc {5};
    /* Accessing struct data directly, same as using a reference */
    cout << "direct access: " << tc.x << endl;

    /* Release the object allocated with new; ref must not be used after this */
    delete ptr;

    return 0;
}

Output:

user@ubuntu:~/cpp/part_2/chapter_7$ ./ptr_vs_ref 
using the pointer: 7
using the reference: 7
direct access: 5

The first version of Fortran used pass-by-reference; it’s a time-tested programming tradition. A reference must be initialized when it is declared. Operators do not operate on references; they operate on the referred object. For example:

// g++ -std=c++11 -o ref_ops ref_ops.cpp
#include <iostream>

using namespace std;

int main()
{
    /* int on the stack */
    int x = 0;

    /* x_ref now refers to x on the stack */
    int& x_ref {x};

    cout << x_ref << endl;

    /* ++ is applied to x, the referred object, not to the reference itself */
    ++x_ref;

    cout << x_ref << endl;

    /* get a pointer to the object referred to by x_ref 
     * this gives us the "address of x" on the stack */
    int *x_ptr = &x_ref;

    cout << *x_ptr << endl;

    return 0;
}

Output:

user@ubuntu:~/cpp/part_2/chapter_7$ ./ref_ops 
0
1
1

One more time: it does little harm to think of a reference as a constant pointer that is dereferenced each time it is used, as long as you remember that a reference is not an object that can be operated on like a pointer. There are three kinds of references, each supporting a different use of objects. First: a non-const lvalue reference refers to an object that the user of the reference can modify. An example of this would be passing an array by reference to a function, having that function modify the array, then return the status of the operation. This way, the array is never copied or moved, but can still be operated on by some function. Second: a const lvalue reference refers to an object that is immutable from the point of view of the reference user. Third: an rvalue reference refers to a temporary object, which the user of the reference can modify (typically by moving from it) on the assumption that the object will never be used again. The last kind is definitely hard to understand, and I don’t pretend to. It was introduced in C++11 and you can read more about it here.

Some cherry-picked concluding advice from Bjarne: keep the use of pointers straightforward; avoid multidimensional arrays and define suitable containers instead; use nullptr instead of NULL; use std::string in favor of null-terminated char arrays; prefer const references to plain reference arguments; generally avoid void*; and lastly, prefer references to pointers as arguments except when a function can take “no object.”

Chapter 8 – Structures, Unions, and Enumerations

The whole purpose of using C++ is the definition and efficient usage of user-defined types. The three simplest ways to create a user-defined type are to use a struct, a union, or an enum. A struct is a sequence of elements of arbitrary type. The struct elements are typically referred to as members, and as such a struct is essentially a simple form of a class. A union is a struct that holds the value of just one of its members at any time. An enum is a type with a set of named constants. The named constants are called enumerators. A popular variant of the enum is the enum class. Also known as a scoped enumeration, it’s an enum where the enumerators are within the scope of the enumeration, rather than in the enclosing scope.

Part 1 – Struct

#include <iostream>


using namespace std;


/* User-defined type describing a blog post. */
struct BlogPost {

    /* post name */
    const char* name;

    /* post author */
    const char* author;

    /* post tags i.e. "development" is "d" */
    char tags[13];

    /* word count */
    int length;
};


int main()
{
    BlogPost bp = {
        "badbytes.io",
        "realbadbytes",
        {'d', 'c', 'n'},
        542
    };

    cout << "BlogPost bp size according to sizeof: " << sizeof(bp) << endl;

    return 0;
}

Output:

user@ubuntu:~/cpp/part_2/chapter_8$ ./struct_sample1 
BlogPost bp size according to sizeof: 40

Struct members are laid out in memory in the order they are declared. This way, holding a pointer to the start of the struct, you can efficiently access members at fixed offsets. You’ll commonly see this in assembly language where a register holds the struct pointer: dword ptr [rax+0x8]. This is not to say that the sum of all struct member sizes equals the struct size in memory. The exact struct size is architecture dependent and may vary based on alignment requirements, because the compiler may insert padding between members and after the last one. In the example above the members add up to 8 + 8 + 13 + 4 = 33 bytes, yet sizeof reports 40. GCC’s __attribute__ ((packed)) extension asks the compiler to place all struct members consecutively in memory, with no padding. Let’s see if my Ubuntu 16.04 g++ (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010 compiler supports struct packing:

#include <iostream>


using namespace std;


/* User-defined type describing a blog post. */
struct BlogPost {

    /* post name */
    const char* name;

    /* post author */
    const char* author;

    /* post tags i.e. "development" is "d" */
    char tags[13];
    
    /* word count */
    int length;
} __attribute__ ((packed));


int main()
{
    BlogPost bp = {
        "badbytes.io",
        "realbadbytes",
        {'d', 'c', 'n'},
        542
    };

    cout << "BlogPost bp size according to sizeof: " << sizeof(bp) << endl;

    return 0;
}

Output:

user@ubuntu:~/cpp/part_2/chapter_8$ ./struct_sample2 
BlogPost bp size according to sizeof: 33

Doing this sort of thing can have unintended consequences. On the one hand, you’ve saved a bit of memory. This can be a great achievement on embedded devices where space is limited. However, for every gain there is a loss; tradeoffs are a fact of life. This code would likely be far less efficient on a large scale due to misaligned struct members. It may even fail to run on some processors. To quote a Stack Overflow discussion on misalignment:

The behaviour of unaligned access, no matter how you view it, is undefined or at best implementation defined. This means that the result from such an operation ranges from "it works just like you expect it to" at one end of the spectrum to "the program crashes". In the middle of that range are perhaps the worse options of "it doesn't crash, but it also doesn't quite behave as you expect" - for example, you may find that you get the value corresponding to the aligned address just before [lower memory address] the data you expected to fetch, or the fetch is indeed performed correctly, but it takes 10 or 100 times longer than the aligned variant, due to the processor trapping and performing the operation in several steps, then returning from the trap. If you are also running multiple threads, you may also find that updates to the variable isn't done in an atomic way, so you get values that are "half of one, half of another", which can clearly lead to very strange effects. This is NOT a conclusive list of the potential scenarios of "things going a bit wrong, but not in an immediately obvious way".

In conclusion, here is a nice example of a struct in NetBSD’s ip.h, real world stuff:

/*
 * Structure of an internet header, naked of options.
 */
struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN
    unsigned int ip_hl:4,       /* header length */
             ip_v:4;        /* version */
#endif
#if BYTE_ORDER == BIG_ENDIAN
    unsigned int ip_v:4,        /* version */
             ip_hl:4;       /* header length */
#endif
    u_int8_t  ip_tos;       /* type of service */
    u_int16_t ip_len;       /* total length */
    u_int16_t ip_id;        /* identification */
    u_int16_t ip_off;       /* fragment offset field */
    u_int8_t  ip_ttl;       /* time to live */
    u_int8_t  ip_p;         /* protocol */
    u_int16_t ip_sum;       /* checksum */
    struct    in_addr ip_src, ip_dst; /* source and dest address */
} __packed;