TCP/IP Protocols – Part 1
This series is mostly for personal reference as I work through W. Richard Stevens’ “TCP/IP Illustrated, Volume 1: The Protocols” textbook. Some of the notes will appear random, and I will most likely skip large portions that are either very simple or uninteresting.
Seeing how protocols operate in varying circumstances provides a greater understanding of how they work and why certain design decisions were made. The book covers Ping, Telnet, Rlogin, FTP, SMTP, X, Traceroute, DNS, TFTP, BOOTP, SNMP, NFS, and RPC. The majority of those protocols are implemented via TCP and/or UDP on top of IP. The only exception is Ping, which uses ICMP and does not use TCP or UDP. This is a common trick interview question: “In which layer of the OSI model does ICMP reside?” Many will say Transport, but ICMP is really part of the Network layer. ICMP and IGMP messages are encapsulated in IP datagrams. A similar trick question could involve ARP and RARP, which are actually part of the Link layer, below IP.
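To make the encapsulation point concrete, here is a minimal Python sketch of how the 8-bit Protocol field in the IPv4 header tells the receiver what the datagram carries. The protocol numbers (1 for ICMP, 2 for IGMP, 6 for TCP, 17 for UDP) are the standard assigned values; the helper names and the example addresses are my own, and the header checksum is left at zero to keep the sketch short.

```python
import struct

# Values of the IPv4 header's 8-bit Protocol field (standard assigned numbers).
IP_PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}


def build_ipv4_header(protocol: int, payload_len: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (hypothetical helper, checksum not computed)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length of five 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,    # version/IHL, type of service, total length
        0, 0,                            # identification, flags + fragment offset
        64, protocol, 0,                 # TTL, protocol, checksum (left at 0 here)
        bytes([192, 168, 1, 198]),       # source address (example value)
        bytes([192, 168, 1, 1]),         # destination address (example value)
    )


def payload_protocol(header: bytes) -> str:
    """Name the protocol that this IP datagram encapsulates."""
    protocol = header[9]                 # the Protocol field sits at byte offset 9
    return IP_PROTOCOLS.get(protocol, f"unknown ({protocol})")


if __name__ == "__main__":
    print(payload_protocol(build_ipv4_header(1, 8)))    # an ICMP message inside IP
    print(payload_protocol(build_ipv4_header(17, 8)))   # a UDP datagram inside IP
```

ICMP sits next to TCP and UDP in this table only because all three ride inside an IP datagram; it has no port numbers and provides no transport service of its own.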
Chapter 1 – Introduction
The TCP/IP protocol suite has grown far beyond what was originally expected of it. It started as a government-funded research project. There are 4 major layers in the suite: Link, Network, Transport, and Application. The OSI (“Open Systems Interconnection”) model further expands these layers to offer more granularity. The Link layer contains the device driver and network interface card. The Network layer handles routing. The Transport layer provides data flow between hosts, either reliable or unreliable. The Application layer handles the details of the particular application being used. The layers below Application are the supporting framework; they are application-agnostic. The unit name used in networking is the octet. While today a byte is 8 bits and therefore the same size as an octet, that was not always the case. The early development of TCP/IP was done on a DEC-10 machine (aka PDP-10), which didn’t use 8-bit bytes.
Encapsulation
A physical property of an Ethernet frame is that the size of its data must be between 46 and 1500 bytes. Some internet routers allow jumbo frames, but not all. Sending a jumbo frame to a router that cannot handle it could result in fragmentation, or the packet may just be dropped (need to confirm).
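As a rough illustration of the 46 to 1500 byte rule, the sketch below pads a short payload up to the minimum and rejects one that exceeds the maximum. The function name and the choice of zero bytes for padding are my own assumptions; real drivers and NICs handle this for you.

```python
ETH_MIN_PAYLOAD = 46    # bytes of data required in an Ethernet frame
ETH_MAX_PAYLOAD = 1500  # bytes of data allowed in a standard (non-jumbo) frame


def pad_payload(payload: bytes) -> bytes:
    """Pad a payload to the Ethernet minimum; refuse one that is too large.

    Hypothetical helper for illustration only.
    """
    if len(payload) > ETH_MAX_PAYLOAD:
        raise ValueError(
            f"{len(payload)} bytes exceeds the {ETH_MAX_PAYLOAD}-byte limit"
        )
    # ljust() leaves the payload untouched when it is already long enough.
    return payload.ljust(ETH_MIN_PAYLOAD, b"\x00")


if __name__ == "__main__":
    short = bytes(28)                    # e.g. a 28-byte payload
    padded = pad_payload(short)
    print(len(padded), "bytes on the wire,", len(padded) - len(short), "of them padding")
```

A 28-byte payload therefore goes out with 18 bytes of padding, while anything from 46 to 1500 bytes is sent unchanged.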
RFC
RFCs (“Requests for Comments”) are the official standards documents of the internet community. They are living design documents. The Assigned Numbers RFC specifies all the magic numbers and constants that are used in internet protocols. The Router Requirements RFC specifies the unique requirements of routers. There are some interesting sections in that particular RFC, for example the “robustness principle.” This particular RFC was last updated in 1995.
1.3.2 Robustness Principle
At every layer of the protocols, there is a general rule (from [TRANS:2] by Jon Postel) whose application can lead to enormous benefits in robustness and interoperability:
Be conservative in what you do, be liberal in what you accept from others.
Software should be written to deal with every conceivable error, no matter how unlikely. Eventually a packet will come in with that particular combination of errors and attributes, and unless the software is prepared, chaos can ensue. It is best to assume that the network is filled with malevolent entities that will send packets designed to have the worst possible effect. This assumption will lead to suitably protective design. The most serious problems in the Internet have been caused by unforeseen mechanisms triggered by low probability events; mere human malice would never have taken so devious a course!
Adaptability to change must be designed into all levels of router software. As a simple example, consider a protocol specification that contains an enumeration of values for a particular header field - e.g., a type field, a port number, or an error code; this enumeration must be assumed to be incomplete. If the protocol specification defines four possible error codes, the software must not break when a fifth code is defined. An undefined code might be logged, but it must not cause a failure.
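The “fifth error code” example from the quote can be sketched in a few lines of Python. The enumeration and its four values are invented for illustration and do not correspond to any real protocol; the point is only that an undefined code gets logged and passed along instead of raising.

```python
import logging
from enum import IntEnum
from typing import Union

log = logging.getLogger("robust_parser")


class ErrorCode(IntEnum):
    """Four error codes of a hypothetical protocol (made up for this sketch)."""
    OK = 0
    TIMEOUT = 1
    REFUSED = 2
    UNREACHABLE = 3


def decode_error_code(raw: int) -> Union[ErrorCode, int]:
    """Decode a wire value, tolerating codes defined after this code was written."""
    try:
        return ErrorCode(raw)
    except ValueError:
        # Liberal in what we accept: log the unknown value, do not fail.
        log.warning("unrecognized error code %d; passing it through", raw)
        return raw


if __name__ == "__main__":
    print(decode_error_code(1))   # a known code
    print(decode_error_code(4))   # a "fifth" code that this parser predates
```

The caller receives either a named member or the raw integer, so a newer peer can introduce codes without breaking older software.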
Another interesting RFC to check out is RFC 1000, the “Request For Comments Reference Guide,” which provides a historical account by categorizing and summarizing RFCs 1 through 999, issued between 1969 and 1987. See it here.
Chapter 2 – Link Layer
The link layer uses 48-bit hardware addresses, as opposed to the 32-bit addresses used at the IPv4 layer. PPP (Point-to-Point Protocol) is still used today; it fixes the shortcomings of the older serial protocol SLIP (Serial Line IP). Each PPP frame begins and ends with a flag byte, 0x7e. The opening flag is followed by an address byte of 0xff, then a control byte of 0x03. Most of these network protocols are just tag-length-value structures describing the size of chunks, and/or they have fixed offsets for protocol fields. This design makes it simple and fast to parse the bytes on the wire. Thinking about it, I’m not sure it could have been designed any other way.
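Here is a small Python sketch of what “fixed offsets” means in practice for a PPP frame. The flag, address, and control bytes come from the notes above; the 2-byte protocol field (0x0021 for an IP datagram) and the 2-byte frame check sequence before the closing flag are taken from the PPP specification rather than from these notes. The sketch assumes an uncompressed header and ignores byte stuffing and checksum verification, so treat it as an illustration and not a working PPP implementation.

```python
import struct
from typing import NamedTuple

PPP_FLAG = 0x7E
PPP_ADDRESS = 0xFF
PPP_CONTROL = 0x03
PPP_PROTO_IP = 0x0021   # protocol value for an IP datagram (from the PPP spec)


class PPPFrame(NamedTuple):
    protocol: int
    information: bytes
    fcs: bytes           # frame check sequence, kept as raw bytes and not verified


def parse_ppp_frame(frame: bytes) -> PPPFrame:
    """Split a PPP frame into its fields using fixed offsets (illustrative only)."""
    # flag + address + control + protocol(2) + fcs(2) + flag = 8 bytes minimum
    if len(frame) < 8:
        raise ValueError("frame too short")
    if frame[0] != PPP_FLAG or frame[-1] != PPP_FLAG:
        raise ValueError("missing 0x7e flag byte")
    address, control, protocol = struct.unpack("!BBH", frame[1:5])
    if address != PPP_ADDRESS or control != PPP_CONTROL:
        raise ValueError("unexpected address or control byte")
    return PPPFrame(protocol, frame[5:-3], frame[-3:-1])


if __name__ == "__main__":
    raw = bytes([0x7E, 0xFF, 0x03, 0x00, 0x21]) + b"datagram" + b"\x00\x00" + bytes([0x7E])
    parsed = parse_ppp_frame(raw)
    print(hex(parsed.protocol), parsed.information)
```

Every field is found by position alone, which is why parsers for this kind of framing tend to be only a handful of lines.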
MTU
The maximum transmission unit (“MTU”) limits the number of bytes that can be carried in a single frame. For Ethernet this number is 1500 bytes. IP will fragment datagrams larger than the MTU. Note: this book is old, so networks today probably handle jumbo frames differently. Use netstat to see the MTU of a specific interface:
user@ubuntu:~/tcpip_journey$ netstat -in
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens33  1500   0  318426      0      0      0    67091      0      0      0 BMRU
lo    65536   0   14785      0      0      0    14785      0      0      0 LRU
You can see here that my Ethernet interface has an MTU of 1500 bytes, while the loopback interface allows 65536 bytes. This is probably because loopback just memory-maps the “sent” data and passes a pointer around (just a guess).
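To get a feel for what “IP will fragment” means in numbers, here is a sketch that splits an IP payload across fragments for a given MTU. It assumes a 20-byte IP header with no options, and relies on the fact that fragment offsets are expressed in 8-byte units, so every fragment except the last must carry a multiple of 8 data bytes. The function name and the return shape are my own.

```python
from typing import List, Tuple


def fragment_sizes(payload_len: int, mtu: int = 1500,
                   header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Return (offset in 8-byte units, data bytes, more-fragments flag) per fragment.

    payload_len is everything that follows the IP header. Illustrative sketch only.
    """
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while True:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
        if not more:
            return fragments


if __name__ == "__main__":
    for frag in fragment_sizes(4000):
        print(frag)
```

With a 1500-byte MTU each fragment carries at most 1480 data bytes, so a 4000-byte payload becomes three fragments of 1480, 1480, and 1040 bytes at offsets 0, 185, and 370.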
Chapter 3 – IP Routing
The options field in an IP datagram is a variable length list of optional information. The options defined in 1995 were:
security and handling restrictions - for military applications
record route - each traversed router records its IP address into the datagram
timestamp - each router records its IP address and a timestamp into the datagram
loose source routing - specify a list of IP addresses that must be traversed by the datagram
strict source routing - specify a list of IP addresses that the datagram can traverse; all other addresses are not allowed
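Since the options field is a variable-length list, a receiver has to walk it one entry at a time. The sketch below does that in Python. The option type numbers (7 for record route, 68 for timestamp, 130 for security, 131 and 137 for loose and strict source routing, plus the single-byte end-of-list and no-operation markers 0 and 1) are the standard IPv4 values and are not listed in these notes; the function name and error handling are my own.

```python
from typing import List, Tuple

OPTION_NAMES = {
    7: "record route",
    68: "timestamp",
    130: "security",
    131: "loose source routing",
    137: "strict source routing",
}


def walk_options(options: bytes) -> List[Tuple[int, str, bytes]]:
    """Walk the IPv4 options area and return (type, name, data) per option."""
    found = []
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:          # end of option list: stop
            break
        if opt_type == 1:          # no operation: one byte of padding
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option truncated before its length byte")
        length = options[i + 1]    # length covers the type and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        name = OPTION_NAMES.get(opt_type, "unknown")
        found.append((opt_type, name, options[i + 2:i + length]))
        i += length
    return found


if __name__ == "__main__":
    # A record route option with room for one address, padded with a no-op byte.
    area = bytes([7, 7, 4, 0, 0, 0, 0, 1])
    print(walk_options(area))
```

Unknown option types are reported as "unknown" and skipped by their length, in keeping with the robustness principle quoted earlier.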
Record Route looks very interesting to me. Some researchers at Princeton explored the RR option in 2017 and published real measurements. They found that a solid percentage of routers will honor a Record Route request. Note that Record Route has a strict 9 hop limit. The paper’s abstract:
The IPv4 Record Route (RR) Option instructs routers to record their IP addresses in a packet. RR is subject to a nine hop limit and, traditionally, inconsistent support from routers. Recent changes in interdomain connectivity—the so-called “flattening Internet”—and new best practices for how routers should handle RR packets suggest that now is a good time to reassess the potential of the RR Option. We quantify the current utility of RR by issuing RR measurements from PlanetLab and M-Lab to every advertised BGP prefix. We find that 75% of addresses that respond to ping without RR also respond to ping with RR, and 66% of these RR-responsive addresses are within the nine hop limit of at least one vantage point. These numbers suggest the RR Option is a useful measurement primitive on today’s Internet.
Their results:
Finally, to test this yourself, check out the man page for iputils ping, then use the -R switch to enable record-route.
-R ping only. Record route. Includes the RECORD_ROUTE option in the ECHO_REQUEST packet and displays the route buffer on returned packets. Note that the IP header is only large enough for nine such routes. Many hosts ignore or discard this option.
My LAN router happened to acknowledge the flag with the following results. I’m performing the scan from a VM:
user@ubuntu:~/tcpip_journey$ ping -R 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(124) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=24.7 ms
RR:     192.168.1.198
        192.168.1.1
        192.168.1.1
        192.168.1.198
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.594 ms      (same route)
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.547 ms      (same route)
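The nine hop limit falls straight out of the header layout, and a bit of arithmetic shows why. The IP header length field is 4 bits wide and counts 32-bit words, so a header can be at most 60 bytes; 20 of those are the fixed header, which leaves 40 bytes for options. The Record Route option spends 3 bytes on its type, length, and pointer fields, and each recorded address takes 4 bytes. The sketch below works that out and builds an empty option; the function names are my own, while the option type 7 and initial pointer value 4 are the standard IPv4 values.

```python
MAX_IP_HEADER = 15 * 4   # 4-bit header length field, counted in 32-bit words
BASE_IP_HEADER = 20      # fixed part of the IPv4 header
RR_OVERHEAD = 3          # option type, option length, pointer
ADDR_LEN = 4             # one IPv4 address


def max_record_route_hops() -> int:
    """How many addresses fit in a Record Route option."""
    option_space = MAX_IP_HEADER - BASE_IP_HEADER
    return (option_space - RR_OVERHEAD) // ADDR_LEN


def build_record_route_option(slots: int = 9) -> bytes:
    """Build an empty Record Route option with room for `slots` addresses."""
    if not 1 <= slots <= max_record_route_hops():
        raise ValueError("slot count does not fit in the IP header")
    length = RR_OVERHEAD + slots * ADDR_LEN
    # Type 7, total length, pointer 4 (the first free slot), then zeroed slots.
    return bytes([7, length, 4]) + bytes(slots * ADDR_LEN)


if __name__ == "__main__":
    print(max_record_route_hops())
    print(len(build_record_route_option()))
```

The 39-byte option is rounded up to 40 bytes of option space, which is consistent with the `56(124)` in the ping output above: 56 data bytes, 8 bytes of ICMP header, and a full 60-byte IP header.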