The C++ Programming Language – Part 4

This post will be mostly for personal reference as I go through Bjarne Stroustrup’s “The C++ Programming Language” 4th edition textbook. Some of the notes will appear random.

Chapter 4 – Containers and Algorithms (continued)

A standard library map is a search tree data structure: a sequence of (key, value) pairs implemented as a balanced binary tree. It is also known as a dictionary and is optimized for lookup, providing O(log n) performance. This fast lookup is possible because searching a balanced binary tree halves the search space each time a node is traversed; this continued halving is essentially the reverse of exponential growth, which is the logarithm. The unordered_map is a hashed-lookup version of a map. A hash table maps keys to values using a hashing function, allowing the search space to be narrowed down very quickly, even compared to a binary tree. “C++ standard library hashed containers are referred to as “unordered” because they don’t require an ordering function.” A graph from Michael Kazakov’s Quiet Corner illustrates the performance difference.
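
A minimal sketch of the two lookup styles (the keys and values here are just made-up sample data):

#include <iostream>
#include <map>
#include <unordered_map>
#include <string>

using namespace std;

int main()
{
    // Tree-based map: keys kept in sorted order, O(log n) lookup.
    map<string, int> ordered {{"alice", 3}, {"bob", 1}, {"carol", 2}};

    // Hash-based map: no ordering requirement, average O(1) lookup.
    unordered_map<string, int> hashed {{"alice", 3}, {"bob", 1}, {"carol", 2}};

    cout << "ordered lookup: bob -> " << ordered["bob"] << endl;
    cout << "hashed lookup:  bob -> " << hashed["bob"] << endl;

    // Iterating the map visits keys in sorted order;
    // the unordered_map makes no such guarantee.
    for (const auto& kv : ordered)
        cout << kv.first << " = " << kv.second << endl;
}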

There are many useful containers in the standard library. The unordered ones are designed for fast key lookup.

vector<T> - contiguous, variable-sized sequence of elements
list<T> - doubly linked list
forward_list<T> - singly linked list
deque<T> - double-ended queue
queue<T> - first-in, first-out data structure
priority_queue<T> - queue in which the first element is always the greatest according to some criterion; implemented as an adaptor on top of another container
stack<T> - last-in, first-out data structure
set<T> - stores unique elements in a specific order; elements are immutable
multiset<T> - set allowing duplicate values
map<K,V> - associative array (key/value pairs, aka dictionary)
multimap<K,V> - same as map, with the added ability to associate one key with multiple values
unordered_map<K,V> - map using hashed lookup instead of a binary tree
unordered_multimap<K,V> - hashed multimap
unordered_set<T> - hashed set
unordered_multiset<T> - hashed multiset
bitset<N> - fixed-size sequence of N bits
array<T, N> - fixed-size contiguous sequence of N elements
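
A quick sketch exercising a few of these (the values are arbitrary sample data):

#include <iostream>
#include <queue>
#include <set>

using namespace std;

int main()
{
    // priority_queue: top() is always the greatest element.
    priority_queue<int> pq;
    pq.push(3); pq.push(10); pq.push(7);
    cout << "priority_queue top: " << pq.top() << endl;     // 10

    // multiset: ordered, duplicates allowed.
    multiset<int> ms {5, 1, 5, 3};
    cout << "multiset count of 5: " << ms.count(5) << endl; // 2

    // set: ordered, duplicates silently dropped.
    set<int> s {5, 1, 5, 3};
    cout << "set size: " << s.size() << endl;               // 3
}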

Reinforcing the concept of iterators, we can use them to write to stdout! An ostream_iterator is constructed with a reference to an ostream (here, cout); whenever an assignment is made through the ostream_iterator, it inserts the value into that stream. Of course, the stream in this case is stdout:

#include <iostream>
#include <iterator>
#include <string> // std::string, used as the ostream_iterator's value type

using namespace std;

int main()
{
    ostream_iterator<string> oo {cout}; // assignments through *oo write to cout
    *oo = "Hello, ";                    // equivalent to cout << "Hello, "
    ++oo;                               // a no-op for ostream_iterator, kept for generic iterator style
    *oo = "world.\n";
}

Another way to use iterators for input and output (thanks to a buddy for this code segment):

#include <iostream>
#include <algorithm>
#include <vector>
#include <iterator>

using namespace std;

int main()
{
    istream_iterator<int> ii {cin};
    ostream_iterator<int> oo {cout};

    /* Write a vector to cout via iterator */
    vector<int> v {1, 3, 5, 7, 9};
    copy(v.begin(), v.end(), oo);

    /* Read 5 elements into the vector via iterator */
    copy_n(ii, 5, back_inserter(v));

    /* Write it back out to cout */
    copy(v.begin(), v.end(), oo);
}

Output (note that the istream_iterator may read its first value as soon as it is constructed, which is why you must provide input before it prints anything):

user@ubuntu:~/cpp/part_1/chapter_4$ ./ostream_iterator 
1
135792
3
4
5
1357912345

Chapter 5 – A Tour of C++ Concurrency and Utilities

The standard library aims to serve the intersection of needs rather than the union. The result of this decision is a generally useful set of well-designed tools. C++ resource management covers the allocation and deallocation of things that are acquired at runtime. In the standard library, resources follow the “Resource Acquisition Is Initialization” idiom, also called RAII. Defined so eloquently on cppreference.com, RAII “binds the life cycle of a resource that must be acquired before use to the lifetime of the object.” This is fundamental to the design of C++. It guarantees that having access to an object means that the necessary associated resources are also ready to be used. Additionally, when the object is released (either manually or by some other mechanism, such as going out of function scope), all associated resources are released with it. Objects, in this sense, come in an “all or nothing” package.
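
Here is a minimal sketch of the idiom, using a hypothetical FileHandle wrapper around a C FILE* (the class name and file path are made up for illustration, not from the book):

#include <cstdio>
#include <stdexcept>

// RAII: the FILE* is acquired in the constructor and released in the
// destructor, even if an exception propagates out of the owning scope.
class FileHandle {
public:
    explicit FileHandle(const char* path)
        : f{std::fopen(path, "r")}
    {
        if (!f) throw std::runtime_error("could not open file");
    }
    ~FileHandle() { std::fclose(f); }

    // Non-copyable: exactly one owner of the underlying resource.
    FileHandle(const FileHandle&) = delete;
    FileHandle& operator=(const FileHandle&) = delete;

    std::FILE* get() const { return f; }

private:
    std::FILE* f;
};

int main()
{
    try {
        FileHandle fh{"/etc/hostname"}; // resource acquired here
        // ... use fh.get() ...
    }                                   // fh goes out of scope: resource released
    catch (const std::runtime_error&) {
        // acquisition failed; nothing to clean up
    }
}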

unique_ptr and shared_ptr are smart pointers that help manage objects on the heap. When a careless programmer forgets about a unique_ptr or shared_ptr, C++ covers his butt and frees the managed object once the smart pointer goes out of scope. This is essentially programming training wheels. A unique_ptr can also be returned from a function, since it is a handle to an individual object. unique_ptrs are moved rather than copied, so ownership is transferred cheaply and there is only ever one owner. This is in contrast to shared_ptrs, which are copied; the object pointed to by multiple shared_ptrs will only be deallocated once the reference count hits zero. For example:

// g++ -std=c++11 -o cpp_ptrs cpp_ptrs.cpp

#include <iostream>
#include <memory> // unique_ptr and shared_ptr

using namespace std;

int main(int argc, char **argv)
{
    shared_ptr<float> sample_a (new float(5.4551));

    cout << "shared_ptr sample_a ref count " << sample_a.use_count() << endl;

    shared_ptr<float> sample_b(sample_a);

    cout << "shared_ptr sample_a ref count " << sample_a.use_count() << endl;
    cout << "shared_ptr sample_b ref count " << sample_b.use_count() << endl;
    // The pointers now share the same raw pointer.
    // Thus each has ref count of 2 since they point to same place.

    // Release sample_b's ownership of the managed object.
    sample_b.reset();
    // The raw pointer will not yet be deallocated, since there is still a ref to it by sample_a
    cout << "shared_ptr sample_a ref count " << sample_a.use_count() << endl;
    cout << "shared_ptr sample_b ref count " << sample_b.use_count() << endl;
    // Release sample_a's ownership as well.
    sample_a.reset();

    // All memory is freed since ref count is zero.
    cout << "shared_ptr sample_a ref count " << sample_a.use_count() << endl;
    cout << "shared_ptr sample_b ref count " << sample_b.use_count() << endl;

    return 0;
}

Output:

user@ubuntu:~/cpp/part_1/chapter_5$ ./cpp_ptrs 
shared_ptr sample_a ref count 1
shared_ptr sample_a ref count 2
shared_ptr sample_b ref count 2
shared_ptr sample_a ref count 1
shared_ptr sample_b ref count 0
shared_ptr sample_a ref count 0
shared_ptr sample_b ref count 0

Smart pointers are a form of destructor-based garbage collection and relieve the programmer of many memory leaks.
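
For contrast with the copied shared_ptrs above, a minimal unique_ptr sketch (the make_value factory function is a made-up example):

#include <iostream>
#include <memory>

using namespace std;

// A unique_ptr can be returned from a factory function:
// ownership moves out to the caller.
unique_ptr<float> make_value()
{
    return unique_ptr<float>(new float(5.4551f));
}

int main()
{
    unique_ptr<float> a = make_value();  // a owns the float
    unique_ptr<float> b = std::move(a);  // ownership transferred to b
    // A unique_ptr cannot be copied; it can only be moved.

    cout << "a is " << (a ? "non-null" : "null") << endl; // null
    cout << "b points to " << *b << endl;                 // 5.4551

    return 0;
}   // b goes out of scope; the float is freed exactly once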

Concurrency is the execution of several tasks simultaneously. This is a bit of an illusion on uniprocessor systems (and in Python, due to the GIL, aka Global Interpreter Lock): instructions are interleaved on a single processor, not actually executed in parallel. Multiprocessor systems are a different game and do offer true parallelism. If those concurrently executing threads operate on the same data, beware. The C++ standard library directly supports concurrent execution of multiple threads in a single address space. These threads have their own stacks but share a heap. Threads are the most commonly used concurrency tool. For example:

#include <thread>
#include <iostream>

using namespace std;

void func(int x)
{
    for (int i = 0; i < 50; i++) {
        // There is a bad mistake here. 
        // Access to the stdout ostream is not controlled.
        // Output will be erratic.
        cout << "Thread " << x << endl;
        // Take a thread nap.
        this_thread::sleep_for(chrono::milliseconds(50));
    }
}

int main(int argc, char *argv[])
{
    // Create four threads.
    // Thread constructor takes the callback function, followed by args to that function.
    thread t1( func, 1 );
    thread t2( func, 2 );
    thread t3( func, 3 );
    thread t4( func, 4 );

    // Wait for threads to finish execution before leaving the main() thread.
    t1.join();
    t2.join();
    t3.join();
    t4.join();

    return 0;
}

Output:

...
Thread 2
Thread 3
Thread Thread 4
Thread 1
2
Thread 3
Thread 1
Thread 4
Thread 3
Thread 2
Thread 4
...

Can you spot the bug? This code contains a synchronization issue on the cout ostream. Nugget fact: the “c” in “cout” stands for “character” (it is the character output stream), not “console”. In this case, if the goal is to write cleanly to the console, each thread should acquire a shared lock on the output stream before writing.
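
One way to fix it, as a sketch: guard cout with a mutex and a lock_guard (the cout_mutex name is arbitrary). Each line then comes out intact, though the interleaving of threads is still nondeterministic.

#include <thread>
#include <mutex>
#include <iostream>

using namespace std;

mutex cout_mutex; // shared by all threads; guards access to cout

void func(int x)
{
    for (int i = 0; i < 50; i++) {
        {
            // lock_guard is itself RAII: the mutex is released
            // when the guard goes out of scope.
            lock_guard<mutex> guard{cout_mutex};
            cout << "Thread " << x << endl;
        }
        // Take a thread nap.
        this_thread::sleep_for(chrono::milliseconds(50));
    }
}

int main()
{
    thread t1(func, 1), t2(func, 2), t3(func, 3), t4(func, 4);

    t1.join();
    t2.join();
    t3.join();
    t4.join();

    return 0;
}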