I'll try to get down some of the main points of TCP/IP here.
Transmission Control Protocol / Internet Protocol is a stack of protocols defined for networking and the exchange of data between computers. While the suite isn't limited to TCP and IP, those two are so common that the whole suite is known by their names.
The reason I referred to the suite as a stack is that the protocols have a layered structure, with each layer looking after certain aspects of data transfer. The entire stack can be broken down into:
- Link Layer (Hardware / Ethernet)
Protocols Include:
- Network Layer (The Invisible layer)
Protocols Include:
- Transport Layer
Protocols Include:
- Application Layer (The Visible Layer)
Protocols Include:
- Actual running applications. FTP client, browser etc
- Physical Layer (Not part of TCP/IP)
Protocols Include:
The stack has an almost cyclic attribute about it. At the source, data travels down through the layers to the Link Layer and out over the Physical Layer; at the destination it gets picked up at the Link Layer. Once there, each layer can hand off to the next until the data reaches the application.
There is obviously a lot more going on here, and to understand the next point we need to delve into issues that can occur during data transfer using this model. The main problem is that the following two scenarios are likely to happen.
- Data Corruption
- Data Loss
Data Corruption
So data corruption is where the data arrived at the destination but it isn't what we sent. This can happen for a number of reasons but, as an example, let's say the telephone line the message is sent over isn't very well shielded from interference.
Data Loss
Our data didn't even arrive - we don't need to know where it went, just that we didn't get it.
Overcoming these issues
So the solutions to these problems are actually quite simple. For the first issue of corruption, a simple checksum can be used to validate the content. For the second issue of data loss, sequence numbers are used to order the packets. This makes it easy to see when we are missing data.
Checksum
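So what is the checksum? This is (typically) a 16-bit value: the data is treated as a series of 16-bit words, the words are added together using one's complement arithmetic, and the one's complement of that sum is sent along with the datagram. The receiver repeats the calculation, and if the result doesn't match, the data was corrupted on the way.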
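To make the checksum concrete, here is a minimal sketch of the 16-bit Internet checksum in the style described by RFC 1071. The function name `internet_checksum` is my own, and a real TCP checksum also covers a pseudo-header that is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]  # two octets form one 16-bit word
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF


# Example words taken from RFC 1071
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d

# A single changed octet gives a different checksum, so the receiver notices
print(internet_checksum(b"hello") == internet_checksum(b"hellp"))  # False
```

The receiver runs the same calculation over what arrived and compares it with the value in the header.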
Sequencing
Because both of our issues are common, data is split into small packets and each is sent individually. This allows for a lot of flexibility in our error handling. These packets of data are not guaranteed to arrive in the order they were sent. To help spot when piece 4 has arrived after piece 5, a sequence number is attached to each packet. This allows the receiver to put the packets back into the order they were intended to be in.
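The reordering and gap detection can be sketched in a few lines. This is a simplification: it numbers whole packets 0, 1, 2 and so on, whereas real TCP numbers individual octets, and the `reassemble` helper is a name I made up for illustration.

```python
def reassemble(packets, expected_count):
    """Order packets by sequence number and report any that are missing.

    packets is a list of (sequence_number, payload) pairs in arrival order.
    """
    by_seq = dict(packets)
    missing = [n for n in range(expected_count) if n not in by_seq]
    data = b"".join(by_seq[n] for n in sorted(by_seq))
    return data, missing


# Packet 2 arrives before packet 1, and packet 3 never arrives
arrived = [(0, b"The "), (2, b"brown "), (1, b"quick "), (4, b"jumps")]
print(reassemble(arrived, expected_count=5))
# (b'The quick brown jumps', [3])
```

Once the receiver knows packet 3 is missing, it knows exactly what still has to be sent again.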
There is another mechanism that helps here, this time for detecting data loss:
Handshaking
Handshaking is a series of messages that are sent back and forth. A TCP connection is opened with SYN, SYN-ACK and ACK messages, which are synchronise, synchronise-acknowledgement and acknowledgement respectively. After that, data is acknowledged in the same spirit. To put this down to basics, message 2 won't be sent until message 1 has been acknowledged. If the acknowledgement isn't received then a time-out occurs and message 1 is resent. (Real TCP allows several messages to be in flight at once, but the principle is the same.)
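The acknowledge-or-resend idea can be sketched as a small simulation. It assumes the basic one-message-at-a-time scheme described above, not TCP's real sliding window, and `drop_plan` is a hypothetical stand-in for an unreliable network.

```python
def send_reliably(messages, drop_plan, max_attempts=5):
    """Send messages one at a time, resending after a time-out.

    drop_plan is a set of (message_index, attempt) pairs for which the
    simulated network loses the message or its acknowledgement.
    Returns a log of (message_index, attempt, acknowledged) tuples.
    """
    log = []
    for index, _message in enumerate(messages):
        for attempt in range(1, max_attempts + 1):
            acked = (index, attempt) not in drop_plan
            log.append((index, attempt, acked))
            if acked:
                break  # acknowledgement received, move on to the next message
            # no acknowledgement before the time-out: go round and resend
        else:
            raise TimeoutError(f"message {index} was never acknowledged")
    return log


# The first attempt at message 0 is lost, so it is sent a second time
print(send_reliably(["message 1", "message 2"], drop_plan={(0, 1)}))
# [(0, 1, False), (0, 2, True), (1, 1, True)]
```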
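So, what if message 1 was received but it was corrupted and didn't pass the checksum? In TCP the packet is simply discarded and never acknowledged, so the sender's time-out expires and the data is resent. That can be inefficient, and some other protocols send a NACK (negative acknowledgement) message instead of an ACK message to ask for a resend straight away, but TCP itself has no NACK.
TCP acknowledgements are cumulative. An ACK message of 1000 means that every octet before octet 1000 has been received thus far, and that octet 1000 is the next one expected.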
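As a rough illustration of how a receiver could work out the number to put in an ACK, the sketch below assumes TCP's convention that the acknowledgement number names the next octet expected. The `cumulative_ack` helper is hypothetical.

```python
def cumulative_ack(initial_seq, segments):
    """Return the next octet expected, given the segments received so far.

    segments is a list of (sequence_number, length) pairs in any order.
    """
    next_expected = initial_seq
    for seq, length in sorted(segments):
        if seq > next_expected:
            break  # there is a gap, so nothing beyond it can be acknowledged
        next_expected = max(next_expected, seq + length)
    return next_expected


print(cumulative_ack(0, [(0, 500), (500, 500)]))   # 1000
print(cumulative_ack(0, [(0, 500), (1000, 500)]))  # 500, octets 500-999 are missing
```

In the second call the later segment did arrive, but it can't be acknowledged until the gap before it has been filled.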
So let's take an example of sending an email and use the knowledge we've just learned. Emails are most commonly sent over a protocol known as SMTP, or Simple Mail Transfer Protocol. This protocol lives on the application layer. Say we have an SMTP server sitting somewhere that we need to pass our email on to. For the application to send this message it needs to pass its data to TCP, which belongs to the transport layer and provides the checksum and sequencing features. TCP then passes the data over to IP on the network layer, which takes care of addressing and routing it to the destination.
Three-way handshake
As part of the three-way handshake, the two ends exchange the information required to transmit with a minimum of problems. The IP addresses, port numbers and maximum datagram size are all established at this point. Once the maximum datagram size is known the transmission can be split up. Each datagram has its own TCP header, which includes the source and destination ports, sequence number and checksum. TCP itself is full duplex, which means that data can go both upstream and downstream at the same time.
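Here is a rough sketch of splitting a transmission once the maximum size is known, and of packing the basic 20-octet TCP header. The helper names, port numbers and sizes are made up for illustration, and the checksum field is left at zero because the real one covers a pseudo-header as well as the data.

```python
import struct

# Source port, destination port, sequence number, acknowledgement number,
# data offset, flags, window, checksum, urgent pointer (no options)
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def segment(data: bytes, max_size: int, initial_seq: int = 0):
    """Split data into (sequence_number, chunk) pairs of at most max_size octets."""
    return [(initial_seq + offset, data[offset:offset + max_size])
            for offset in range(0, len(data), max_size)]


def build_header(src_port, dst_port, seq, ack=0, flags=0, window=65535, checksum=0):
    """Pack a minimal TCP header; checksum is a placeholder in this sketch."""
    data_offset = 5 << 4  # header length of five 32-bit words, in the top four bits
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flags, window, checksum, 0)


email = b"x" * 2500
for seq, chunk in segment(email, max_size=1000, initial_seq=100):
    header = build_header(50000, 25, seq)  # port 25 is SMTP
    print(seq, len(header), len(chunk))
# 100 20 1000
# 1100 20 1000
# 2100 20 500
```

Note that the sequence number counts octets, so each datagram's number is the previous one plus the amount of data already sent.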
TCP supports a number of flags. We've already seen SYN and ACK (a SYN-ACK is simply a message with both flags set). There is also:
- RST - Resets the connection
- PSH - Tells the receiver to pass all queued data on to the application
- FIN - Closes the connection
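In the TCP header each flag is a single bit, so several can be set in the same message. A small sketch of decoding them follows; the `decode_flags` helper is my own, and URG is a further standard flag not covered above.

```python
# Bit values of the flags within the TCP header's flags octet
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def decode_flags(value):
    """Return the names of the flags set in a flags octet."""
    return [name for name, bit in FLAGS.items() if value & bit]


print(decode_flags(0x02))  # ['SYN']
print(decode_flags(0x12))  # ['SYN', 'ACK'], the second step of the handshake
print(decode_flags(0x11))  # ['FIN', 'ACK']
```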
User Datagram Protocol
UDP is a member of the transport layer mentioned at the start. It is quite a different beast to TCP, however, and serves very different uses. Sometimes the data doesn't actually need to be split up and a single datagram will do just fine. UDP doesn't follow the same pattern as TCP: there's no synchronisation involved, there are no sequence numbers, and there doesn't even need to be any acknowledgement from the destination. Broadcasting is also possible with UDP.
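To show how little ceremony UDP needs, here is a minimal sketch using Python's standard `socket` module on the local machine. The `udp_roundtrip` function is made up for illustration, and it assumes the loopback address 127.0.0.1 is usable.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to ourselves over loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2.0)         # don't wait forever if the datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake and no connection: the datagram is simply sent
        sender.sendto(message, receiver.getsockname())
        data, _address = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"ping"))  # b'ping'
```

Nothing here tells the sender whether the datagram arrived; if that matters, the application has to handle it itself.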