The C++ Programming Language – Part 3
This post will be mostly for personal reference as I go through Bjarne Stroustrop’s “The C++ Programming Language” 4th edition textbook. Some of the notes will appear random. This post will cover mostly the vector container. More containers and some algorithms will be featured in part 4.
Chapter 4 – Containers and Algorithms
The C++ standard library provides standard types and takes up 2/3rds of the ISO C++ standard. It provides facilities for runtime support like allocations, the C standard library, strings, io streams, containers, algorithms, numerical computation, regex matching, concurrency, template programming, smart pointers, and more. In short, the C++ stdlib contains common fundamental data structures along with algorithms used on said data structures. The primary purpose of the C++ stdlib is to provide you with a well-designed and well-test tool set to solve problems in an efficient way. Each tool in the tool set utilizes C++ language features in near optimal/optimal ways.
Strings in C++ are mutable, as opposed to other languages like Python. You can utilize the [] operator to change a C++ String.
/* g++ -std=c++11 -o string string.cpp */ #include <stdio.h> #include <iostream> #include <string> using namespace std; string compose(const string& name, const string& domain) { return name + '@' + domain; } int main() { auto addr = compose("yee", "badbytes.io"); /* printf doesnt know what a C++ string is, so must pass C string */ printf("%s\n", addr.c_str()); /* cout ostream knows how to handle C++ string type */ cout << addr[2] << endl; }
Output:
user@ubuntu:~/cpp/part_1/chapter_4$ ./string yee@badbytes.io e
Input/output (io) of built-in types is straight forward. cout and cin from the iostream library can intake all built-in types. iostream also allows developers to define IO operations for their own user-defined types. For example:
/* g++ -std=c++11 -o entry entry.cpp */ #include <iostream> #include <string> using namespace std; struct Entry { string name; int number; }; /* user-defined type output */ ostream& operator<< (ostream& os, const Entry& e) { return os << "{\"" << e.name << "\"," << e.number << "}" << endl; } /* user-defined type input */ istream& operator>> (istream& is, Entry& e) { char c; char c2; if (is >> c && c == '{' && is >> c2 && c2 == '"') { string name; while (is.get(c) && c != '"') { name += c; } if (is >> c && c == ',') { int number = 0; if (is >> number >> c && c == '}') { e = {name,number}; return is; } } } is.setstate(ios_base::failbit); return is; } int main() { /* for (Entry ee; cin >> ee;) { cout << ee << endl; } */ Entry ee; cin >> ee; cout << ee; return 0; }
Output:
user@ubuntu:~/cpp/part_1/chapter_4$ ./entry {"badbytes", 1} {"badbytes",1}
A container is a class with the main purpose of holding objects. The entry class shown earlier is a container. Vector is the most useful standard library container. It holds a sequence of elements of a given type and provides efficient ways to operate on the held objects. Vector stores elements contiguously in memory, giving it O(1) access time since you hold a starting pointer, then access elements by offset. Iterators are provided for every standard library container. They expose begin() and end() functions. Begin returns a pointer to the first element, end() points to one-past-the-end of the container. This can be a source of bugs, so be mindful when using end() while using a container iterator. Container iterators can also use unary operators like ++ and — to traverse the objects. While looping through a container with an iterator named iter, *iter is the current container element. If this pointer de-reference returns an object with members, then iter->member is valid. Using this syntax semantically the same as using (*iter)->member. Example vector initialization and usage:
/* g++ -std=c++11 -o vector_init vector_init.cpp */ #include <vector> #include <iostream> using namespace std; int main() { vector<int> v1 = {1, 2, 3, 4}; vector<string> v2; vector<char> v3(23); vector<double> v4(8, 9.9); cout << "v1 size: " << v1.size() << endl; cout << "v1[2]: " << v1[2] << endl; cout << "v2 size: " << v2.size() << endl; cout << "v3 size: " << v3.size() << endl; cout << "v3 vector contents: " << endl; for (int i = 0; i != v3.size(); ++i) { cout << v3[i]; } cout << "v4 size: " << v4.size() << endl; cout << "v4 vector contents: " << endl; for (int i = 0; i != v4.size(); ++i) { cout << v4[i] << " "; } cout << endl; return 0; }
Output:
user@ubuntu:~/cpp/part_1/chapter_4$ ./vector_init v1 size: 4 v1[2]: 3 v2 size: 0 v3 size: 23 v3 vector contents: v4 size: 8 v4 vector contents: 9.9 9.9 9.9 9.9 9.9 9.9 9.9 9.9
A second example using the vector container:
/* g++ -std=c++11 -ggdb -o vector_usage vector_usage.cpp */ #include <vector> #include <string> #include <iostream> using namespace std; void print_names(const vector<string>& names) { for (int i = 0; i < names.size(); ++i) { cout << names[i] << endl; } } int main() { vector<string> names = {"bad", "bytes", "yee"}; print_names(names); return 0; }
Output:
user@ubuntu:~/cpp/part_1/chapter_4$ ./vector_usage bad bytes yee
I stated earlier that stdlib vector stores elements contiguously in memory, giving it O(1) access time since you hold a starting pointer, then access elements by offset. Lets head into a debugger and make sure that the vector class is actually storing elements contiguously. If you want to replicate this, I suggest adding set print asm-demangle on in .gdbinit. This makes gdb show cleaner function names. For example,
<_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev@plt>
becomes
<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@plt>.
Which lets be honest, still looks confusing, but its better. Disassembling main()
pwndbg> disas main ... 0x0000000000400e9c <+41>: call 0x400cb0 <std::allocator<char>::allocator()@plt> 0x0000000000400ea1 <+46>: lea rdx,[rbp-0xb4] 0x0000000000400ea8 <+53>: lea rax,[rbp-0x90] 0x0000000000400eaf <+60>: mov esi,0x401945 0x0000000000400eb4 <+65>: mov rdi,rax 0x0000000000400eb7 <+68>: call 0x400ca0 <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)@plt> 0x0000000000400ebc <+73>: lea rax,[rbp-0xb3] 0x0000000000400ec3 <+80>: mov rdi,rax 0x0000000000400ec6 <+83>: call 0x400cb0 <std::allocator<char>::allocator()@plt> 0x0000000000400ecb <+88>: lea rax,[rbp-0xb3] 0x0000000000400ed2 <+95>: lea rdx,[rbp-0x90] 0x0000000000400ed9 <+102>: lea rcx,[rdx+0x20] 0x0000000000400edd <+106>: mov rdx,rax 0x0000000000400ee0 <+109>: mov esi,0x401949 0x0000000000400ee5 <+114>: mov rdi,rcx 0x0000000000400ee8 <+117>: call 0x400ca0 <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)@plt> 0x0000000000400eed <+122>: lea rax,[rbp-0xb2] 0x0000000000400ef4 <+129>: mov rdi,rax 0x0000000000400ef7 <+132>: call 0x400cb0 <std::allocator<char>::allocator()@plt> 0x0000000000400efc <+137>: lea rax,[rbp-0xb2] 0x0000000000400f03 <+144>: lea rdx,[rbp-0x90] 0x0000000000400f0a <+151>: lea rcx,[rdx+0x40] 0x0000000000400f0e <+155>: mov rdx,rax 0x0000000000400f11 <+158>: mov esi,0x40194f 0x0000000000400f16 <+163>: mov rdi,rcx 0x0000000000400f19 <+166>: call 0x400ca0 <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)@plt> 0x0000000000400f1e <+171>: lea rax,[rbp-0x90] 0x0000000000400f25 <+178>: mov r12,rax 0x0000000000400f28 <+181>: mov r13d,0x3 0x0000000000400f2e <+187>: lea rax,[rbp-0xb1] 0x0000000000400f35 <+194>: mov rdi,rax 0x0000000000400f38 <+197>: call 0x401174 <std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::allocator()> 0x0000000000400f3d <+202>: lea rdx,[rbp-0xb1] pwndbg> x/1s 0x401945 0x401945: "bad" pwndbg> x/1s 0x401949 0x401949: "bytes" pwndbg> x/1s 0x40194f 0x40194f: "yee"
It looks like the 3 strings that I put into the vector container are initially in .rodata based on objdump.
user@ubuntu:~/cpp/part_1/chapter_4$ objdump -h vector_usage ... 15 .rodata 00000013 0000000000401940 0000000000401940 00001940 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
All 3 strings are copied from read-only data onto main’s stack frame.
pwndbg> x/1s 0x7fffffffdc00 0x7fffffffdc00: "bad" pwndbg> x/1s 0x7fffffffdc20 0x7fffffffdc20: "bytes" pwndbg> x/1s 0x7fffffffdc40 0x7fffffffdc40: "yee" pwndbg> info proc mapping ... 0x604000 0x636000 0x32000 0x0 [heap] ... 0x7ffffffde000 0x7ffffffff000 0x21000 0x0 [stack]
I expected these strings to be placed on the heap by the container, but in this example they are on the stack. Next, the values in the iterator are looped over and printed, then the container is destructed. This confirms that the elements are actually stored consecutively allowing for O(1) access based on offset, but this use case is very simple and has tiny strings.
pwndbg> x/40s 0x7fffffffdc00 0x7fffffffdc00: "bad" 0x7fffffffdc04: "" 0x7fffffffdc05: "" 0x7fffffffdc06: "" 0x7fffffffdc07: "" 0x7fffffffdc08: "0\334\377\377\377\177" 0x7fffffffdc0f: "" 0x7fffffffdc10: " \334\377\377\377\177" 0x7fffffffdc17: "" 0x7fffffffdc18: "\005" 0x7fffffffdc1a: "" 0x7fffffffdc1b: "" 0x7fffffffdc1c: "" 0x7fffffffdc1d: "" 0x7fffffffdc1e: "" 0x7fffffffdc1f: "" 0x7fffffffdc20: "bytes" 0x7fffffffdc26: "" 0x7fffffffdc27: "" 0x7fffffffdc28: "\377\377" 0x7fffffffdc2b: "" 0x7fffffffdc2c: "\001" 0x7fffffffdc2e: "" 0x7fffffffdc2f: "" 0x7fffffffdc30: "@\334\377\377\377\177" 0x7fffffffdc37: "" 0x7fffffffdc38: "\003" 0x7fffffffdc3a: "" 0x7fffffffdc3b: "" 0x7fffffffdc3c: "" 0x7fffffffdc3d: "" 0x7fffffffdc3e: "" 0x7fffffffdc3f: "" 0x7fffffffdc40: "yee" 0x7fffffffdc44: "" 0x7fffffffdc45: "" 0x7fffffffdc46: "" 0x7fffffffdc47: "" 0x7fffffffdc48: "\r\031@"
Let’s see what happens with strings 100,000 chars long:
/* g++ -std=c++11 -ggdb -o vector_usage vector_usage.cpp */ #include <vector> #include <string> #include <iostream> using namespace std; int main() { string a(100000, 'A'); string b(100000, 'B'); string c(100000, 'C'); vector<string> names = {a, b, c}; return 0; }
In this case, heap space is allocated by the container.
pwndbg> 0x0000000000400c87 12 string a(100000, 'A'); LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA ───────────────────────────────────────────────────────────────────────────────[ REGISTERS ]─────────────────────────────────────────────────────────────────────────────── RAX 0x615c20 ◂— 0x4141414141414141 ('AAAAAAAA') ... pwndbg> malloc_chunk a 0x615c20 PREV_INUSE { prev_size = 4702111234474983745, size = 4702111234474983745, fd = 0x4141414141414141, bk = 0x4141414141414141, fd_nextsize = 0x4141414141414141, bk_nextsize = 0x4141414141414141 }
The same procedure occurs for variables b and c. Now to make sure they are contiguous:
pwndbg> malloc_chunk a 0x615c20 PREV_INUSE { prev_size = 4702111234474983745, size = 4702111234474983745, fd = 0x4141414141414141, bk = 0x4141414141414141, fd_nextsize = 0x4141414141414141, bk_nextsize = 0x4141414141414141 } pwndbg> malloc_chunk b 0x62e2d0 IS_MMAPED { prev_size = 4774451407313060418, size = 4774451407313060418, fd = 0x4242424242424242, bk = 0x4242424242424242, fd_nextsize = 0x4242424242424242, bk_nextsize = 0x4242424242424242 } pwndbg> malloc_chunk c 0x646980 PREV_INUSE IS_MMAPED { prev_size = 4846791580151137091, size = 4846791580151137091, fd = 0x4343434343434343, bk = 0x4343434343434343, fd_nextsize = 0x4343434343434343, bk_nextsize = 0x4343434343434343 } ... user@ubuntu:~/cpp/part_1/chapter_4$ python >>> hex(0x615c20 + 100000) '0x62e2c0' >>> hex(0x62e2d0 + 100000) '0x646970' ... pwndbg> x/100xg 0x62e2c0 0x62e2c0: 0x0000000000000000 0x00000000000186b1 0x62e2d0: 0x4242424242424242 0x4242424242424242 0x62e2e0: 0x4242424242424242 0x4242424242424242 0x62e2f0: 0x4242424242424242 0x4242424242424242 0x62e300: 0x4242424242424242 0x4242424242424242 pwndbg> x/100xg 0x646970 0x646970: 0x0000000000000000 0x00000000000186b1 0x646980: 0x4343434343434343 0x4343434343434343 0x646990: 0x4343434343434343 0x4343434343434343 0x6469a0: 0x4343434343434343 0x4343434343434343 0x6469b0: 0x4343434343434343 0x4343434343434343 0x6469c0: 0x4343434343434343 0x4343434343434343
And there they are, nice and contiguous in memory. A good next experiment may be to create wildly different sized vector members and see how it handles that. Since the heap tries to separate allocations based on size, I wonder how vector maintains contiguous members. It probably requests a block large enough to hold all its members regardless of their comparable size, so that a member of size 100 can still be contiguous to a member of size 100000. Otherwise, the OS memory allocator will most likely place those in different heap bins. I took a look at the vector source code but its too complex for the time I have. If you want to read it, see stl_vector.h. This was a bit of an aside, stay tuned for part 4.