TIME_WAIT is the most complicated state in the TCP state transition diagram. At first glance it looks unnecessary, and some optimization techniques simply eliminate it. But it has been part of a protocol in use for decades, so there must be a reason for it. At the very least, we should understand it in detail before eliminating it. As Richard Stevens put it in his book, “Instead of trying to avoid the state, we should understand it”. So what is a TCP endpoint in TIME_WAIT waiting for? Why does an endpoint enter this state? What problems does it cause? And how can we avoid those problems?

How long does the TIME_WAIT state last?

According to the TCP specification, an endpoint that enters TIME_WAIT should stay there for 2MSL, so that its final ACK has the best possible chance of reaching the remote endpoint. If that ACK is lost, the remote endpoint retransmits its FIN and the waiting endpoint can acknowledge it again. While an endpoint is in TIME_WAIT, the socket pair that defines the connection (the 4-tuple of local IP, local port, remote IP and remote port) cannot be reused.

2MSL is twice the Maximum Segment Lifetime (MSL). Linux effectively uses an MSL of 30 seconds, so 2MSL is 1 minute. As we know, the lifetime of an IP packet in the network is bounded by its TTL, but TTL counts the maximum number of hops rather than time, so there is no direct relationship between MSL and TTL. In most versions of the Linux kernel the TIME_WAIT duration is hard coded to 1 minute (the TCP_TIMEWAIT_LEN constant). Some other operating systems provide a configuration interface for this value.

Transition to the TIME_WAIT state

According to the TCP state transition diagram, the endpoint that sends the first FIN is the one that ends up in TIME_WAIT. It does not matter whether that endpoint is the client or the server: whoever sends FIN first goes through TIME_WAIT. On the other hand, the client and the server play very different roles during communication, so it matters which of them enters TIME_WAIT first. Here are some details. In cases 1~3 the client initiates the FIN. In cases 4~5 the server asks to disconnect first.
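
The rule "whoever sends FIN first goes through TIME_WAIT" can be sketched as a small transition table. This is a simplified illustration, not a full TCP state machine: simultaneous close and the combined FIN+ACK shortcut are left out, and the names TRANSITIONS and walk are made up for this example.

```python
# Simplified TCP close transitions (simultaneous close is not modeled).
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",   # active close starts here
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",     # replies with the final ACK
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # passive close starts here
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


# The side that sends FIN first (client or server, it does not matter).
active_closer = walk(["send FIN", "recv ACK", "recv FIN"])
# The side that receives that FIN and closes afterwards.
passive_closer = walk(["recv FIN", "send FIN", "recv ACK"])

print(" -> ".join(active_closer))
print(" -> ".join(passive_closer))
```

Only the active closer's path passes through TIME_WAIT. The passive closer goes straight from LAST_ACK to CLOSED.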

A client initiates the FIN request first

If you see a lot of connections in the TIME_WAIT state, keep in mind that each socket in TIME_WAIT consumes some kernel memory, usually somewhat less than an ESTABLISHED socket. A large number of them can still increase the load on the server.

To address this, do the following on the server.

First run this command to check how many connections are in TIME_WAIT:

root@server [~]# netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
 1 CLOSING
 1 established)
 1 Foreign
 14 LAST_ACK
 27 CLOSE_WAIT
 31 LISTEN
 43 FIN_WAIT1
 105 FIN_WAIT2
 114 SYN_RECV
 313 ESTABLISHED
 4568 TIME_WAIT
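
The same per-state count can be sketched in Python, which makes it easy to skip the two header lines that show up as "established)" and "Foreign" in the output above. This is a minimal sketch: count_tcp_states is a made-up helper, and SAMPLE is invented text in the Linux netstat layout, not output from a real server.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count the State column (6th field) of TCP rows in `netstat -nat` output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only rows whose first field is tcp/tcp6; this drops the headers.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Invented sample; the addresses come from documentation-only ranges.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51514      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.7:51515      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      ESTABLISHED
"""

for state, n in sorted(count_tcp_states(SAMPLE).items(), key=lambda kv: kv[1]):
    print(n, state)
```

On a real server the text would come from the netstat command itself, for example through subprocess.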

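Then enter the following commands to change the timeout settings on the server. Be aware that tcp_fin_timeout controls how long an orphaned connection stays in FIN_WAIT_2 rather than the length of TIME_WAIT, and that tcp_tw_recycle is known to break connections from clients behind NAT and was removed in Linux 4.12, so check that it exists on your kernel before relying on it.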

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

To make the settings survive a reboot, also add the timeout and recycle values to /etc/sysctl.conf.

vi /etc/sysctl.conf

Add the following lines to the file:

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_recycle = 1

The new values are now active and will be kept after a reboot.

Why is TIME_WAIT required?

Why wait for 2MSL? Suppose there were no TIME_WAIT state. What would happen?

Scenario 1

Suppose it were allowed to create a second TCP connection with the same 4-tuple right after the first one is closed. The second connection is an incarnation of the first. If packets of the first connection were delayed in the network and are still alive when the incarnation is created (because the wait was not long enough for the network to discard them), they can be mistaken for data that belongs to the new connection and cause unpredictable errors.

Although this is a low-probability event, it can still happen. The protocol itself already has some preventive measures. First, the initial sequence number (ISN) chosen during the three-way handshake makes it unlikely that an old segment falls inside the new connection's window. Second, the client's TCP port is usually an ephemeral port assigned by the OS kernel, so a new connection to the same host normally gets a different 4-tuple.
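
The second measure can be observed on the loopback interface. The sketch below opens two connections to the same listener without binding the client side, so the kernel picks each client port. It only shows that the kernel assigns the port and that the two 4-tuples differ while both connections are open. It does not demonstrate ISN selection or port choice after a close.

```python
import socket

# Listener on loopback; port 0 lets the kernel pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(2)
server_addr = server.getsockname()

# Two client connections to the same server address, open at the same time.
# Neither client binds explicitly, so each gets a kernel-assigned ephemeral port.
client_a = socket.create_connection(server_addr)
client_b = socket.create_connection(server_addr)

# 4-tuple as (local IP, local port, remote IP, remote port).
tuple_a = client_a.getsockname() + client_a.getpeername()
tuple_b = client_b.getsockname() + client_b.getpeername()
print("connection A:", tuple_a)
print("connection B:", tuple_b)

for s in (client_a, client_b, server):
    s.close()
```

The remote half of both tuples is identical. Only the client port differs, and that is enough to make the connections distinct.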

Scenario 2

Suppose a TCP connection is being torn down. The client sends a FIN and receives an ACK. Then the server sends its own FIN and the client replies with the last ACK, but that ACK is lost in the network. What happens next if the client does not wait for 2MSL? The server retransmits its FIN, but the client considers the communication over and has already discarded the connection, so it answers the retransmitted FIN with a RST. The server receives the RST and thinks “Shit, this is not a successful communication”.
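
Scenario 2 can be reduced to a toy model of the client's reply. The function name and the timestamps are made up for illustration, and the 60-second default assumes the Linux value of 2MSL mentioned earlier. Real kernels do much more than this comparison.

```python
TIME_WAIT_SECONDS = 60  # 2MSL as hard coded on Linux; an assumption of this sketch


def reply_to_retransmitted_fin(final_ack_sent_at, fin_arrives_at,
                               time_wait=TIME_WAIT_SECONDS):
    """What the active closer answers when the peer's FIN shows up again.

    While the socket is still in TIME_WAIT the connection is known and the
    final ACK is sent again. Once the state is gone, the FIN matches no
    connection and is answered with RST.
    """
    still_waiting = fin_arrives_at - final_ack_sent_at < time_wait
    return "ACK" if still_waiting else "RST"


# The final ACK is sent at t=0 and lost; the server retransmits its FIN at t=3.
print(reply_to_retransmitted_fin(0, 3))               # client honors 2MSL
print(reply_to_retransmitted_fin(0, 3, time_wait=0))  # client skips TIME_WAIT
```

With the wait in place the server gets the ACK it was missing and closes cleanly. Without it the server sees a reset.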