Hurry Up and TIME_WAIT

Posted: December 7th, 2004 | Filed under: IIS & HTTP

So, ever wonder what all those TIME_WAITs are doing in your netstat listing?

Okay, for those of you who don’t spend all your waking hours fooling around with Web servers, let me back up a little and explain what that sentence meant.

Netstat is a little utility that many administrators use to monitor the network connections on their servers. It is quite useful for tracking down that small subset of performance bottlenecks that aren’t attributable to yet another piece of convoluted application code that some careless programmer wrote and now you have to take care of. But I digress.

When you run netstat on your busy IIS box, you might get something that looks like this:

C:\>netstat -np tcp

Active Connections

Proto  Local Address     Foreign Address       State
TCP    192.168.0.1:80    192.168.0.12:1217     ESTABLISHED
TCP    192.168.0.1:80    192.168.0.5:1218      TIME_WAIT
TCP    192.168.0.1:80    192.168.0.234:1252    TIME_WAIT
TCP    192.168.0.1:80    192.168.0.37:1267     ESTABLISHED
TCP    192.168.0.1:80    192.168.0.23:1298     TIME_WAIT
TCP    192.168.0.1:80    192.168.0.32:1345     TIME_WAIT

And so on and on, for many, many lines. Each line here represents a connection between a TCP socket on your server and a matching one on some other machine–usually an HTTP client such as a browser or proxy server, but depending on your architecture you might also see connections to other kinds of servers (database, application, directory, etc.). Each connection has a unique combination of IP addresses and port numbers that identify the endpoints to which the sockets are bound. More to the point, each one also has a state indicator. As connections are set up, used, and torn down, they pass through a variety of these states, most of which aren’t shown here because they come and go quite quickly.

The connections in the ESTABLISHED state are, well, established–they are neither being set up nor torn down but just used. This is what you will often see the most of. But what about the others? On a busy HTTP server, the number of sockets in this TIME_WAIT state can far exceed those in the ESTABLISHED state. For instance, I checked an IIS 6.0 box that serves a fairly busy corporate site earlier today and got 124 ESTABLISHED connections versus 431 in TIME_WAIT.
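If you would rather not count those lines by eye, a few lines of script will do it. What follows is only a rough sketch: it assumes the four-column Windows layout shown above (Proto, Local Address, Foreign Address, State), and the function name and sample text are made up for illustration.

```python
from collections import Counter

# Sample text in the four-column layout shown above (illustrative data only).
SAMPLE = """\
Active Connections

Proto  Local Address     Foreign Address       State
TCP    192.168.0.1:80    192.168.0.12:1217     ESTABLISHED
TCP    192.168.0.1:80    192.168.0.5:1218      TIME_WAIT
TCP    192.168.0.1:80    192.168.0.234:1252    TIME_WAIT
TCP    192.168.0.1:80    192.168.0.37:1267     ESTABLISHED
TCP    192.168.0.1:80    192.168.0.23:1298     TIME_WAIT
TCP    192.168.0.1:80    192.168.0.32:1345     TIME_WAIT
"""


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally connections per TCP state from netstat-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: TCP <local> <foreign> <STATE>.
        # Headers and blank lines fail this check and are skipped.
        if len(fields) == 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state:<12} {count}")
```

To run it against a live server you would feed it the real netstat output instead of SAMPLE. Other platforms and other netstat switches add columns, so the four-field check would need adjusting there.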

What does this all mean? More importantly, is it something you should be worried about?

The answers are:

1. It’s complicated.

2. Maybe.

To understand what all those TIME_WAITs are doing there, it’s useful to review (or learn) a little TCP. I’ll wait here while you brush up on RFC793.

That was fast. Just kidding. The bit you need to know is so simple, even I can explain it.

As you know, TCP provides a reliable connection between two endpoints, across which data can be sent in segmented form. As part of this, TCP also provides a mechanism for gracefully shutting down such connections. This is accomplished with a full duplex handshake, which can be diagrammed like so:

Server                             Client

  -------------- FIN -------------->
  <------------- ACK ---------------
  <------------- FIN ---------------
  -------------- ACK -------------->

As you can see by this very sophisticated diagram, a graceful shutdown requires the two endpoints to exchange some TCP/IP packets with the FIN and ACK bits set, in a certain sequence. This exchange of packets in turn corresponds to certain state changes on each side of the connection. In the diagram, I’ve labeled the two sides “Server” and “Client” such that the sequence of events mirrors what usually happens when connections are closed by HTTP.

Here is what happens, step-by-step:

1. First the application at one endpoint–in this example, that would be the Web server–initiates what is called an “active close.” The Web server itself is now done with the connection, but the TCP implementation that supplied the socket it was using still has some work to do. It sends a FIN to the other endpoint and goes into a state called FIN_WAIT_1.

2. Next the TCP endpoint on the browser’s side of the connection acknowledges the server’s FIN by sending back an ACK, and goes into a state called CLOSE_WAIT. When the server side receives this ACK, it switches to a state called FIN_WAIT_2. The connection is now half-closed.

3. At this point, the socket on the client side is in a “passive close,” meaning it waits for the application that was using it (the browser) to close. When this happens, the client sends its own FIN to the server and goes into a state called LAST_ACK. The client deallocates its socket as soon as that FIN is acknowledged.

4. When the server gets that last FIN, it of course sends back an ACK to acknowledge it, and then goes into the infamous TIME_WAIT state. For how long? Ah, there’s the rub.
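For those who think better in code, the four steps can be sketched as a tiny state machine. The state names are the ones RFC 793 uses (including LAST_ACK, which the passive side sits in until its FIN is acknowledged); the event names and the run_close helper are invented for this sketch, and it ignores corner cases such as both sides closing at once.

```python
# (current state, event) -> next state.
# State names follow RFC 793; event names are made up for this sketch.
TRANSITIONS = {
    # Active close (the Web server in the walkthrough above)
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",    # sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",       # sends the final ACK
    ("TIME_WAIT", "timer_expired"): "CLOSED",
    # Passive close (the browser in the walkthrough above)
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",     # sends ACK
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",       # sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run_close(events, state="ESTABLISHED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


server_path = run_close(["app_close", "recv_ack", "recv_fin", "timer_expired"])
client_path = run_close(["recv_fin", "app_close", "recv_ack"])

if __name__ == "__main__":
    print("server:", " -> ".join(server_path))
    print("client:", " -> ".join(client_path))
```

Notice that only the side that closed first ever passes through TIME_WAIT. The other side goes straight from LAST_ACK to CLOSED.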

The socket that initiated the close is supposed to stay in this state for twice the Maximum Segment Lifetime–2MSL in geek speak. The MSL is supposed to be the length of time a TCP segment can stay alive in the network. So, 2MSL makes sure that any segments still out there when the close starts have time to arrive and be discarded. Why bother with this, you ask?

Because of delayed duplicates, that’s why. Given the nature of TCP/IP, it’s possible that, after an active close has commenced, there are still duplicate packets running around, trying desperately to make their way to their destination sockets. If a new socket binds to the same IP/port combination before these old packets have had time to get flushed out of the network, old and new data could become intermixed. Imagine the havoc this could cause around the office: “You got JavaScript in my JPEG!”
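Here is a toy model of that haunting, for the skeptical. It is nothing like a real TCP stack: real implementations also check sequence numbers, which this sketch ignores entirely, and the class and function names are invented. All it shows is that if an IP/port combination can be reused the instant a connection closes, a late segment from the old connection lands in the new one, whereas holding the combination in quarantine for a while gets the segment discarded.

```python
class ToyTcpDemux:
    """Toy model: connections keyed by (local IP, local port, remote IP, remote port)."""

    def __init__(self, time_wait_seconds):
        self.time_wait_seconds = time_wait_seconds
        self.open = {}       # 4-tuple -> payloads delivered to that connection
        self.time_wait = {}  # 4-tuple -> time at which the tuple is released

    def connect(self, conn, now):
        """Open a connection unless its 4-tuple is still quarantined."""
        release_time = self.time_wait.get(conn)
        if release_time is not None and now < release_time:
            return False
        self.time_wait.pop(conn, None)
        self.open[conn] = []
        return True

    def close(self, conn, now):
        """Close a connection and quarantine its 4-tuple, if quarantine is enabled."""
        del self.open[conn]
        if self.time_wait_seconds > 0:
            self.time_wait[conn] = now + self.time_wait_seconds

    def deliver(self, conn, payload):
        """Hand a segment to a live connection; discard it if there is none."""
        if conn in self.open:
            self.open[conn].append(payload)
            return True
        return False


def stale_segment_accepted(time_wait_seconds):
    """Replay one scenario: close, immediate reuse attempt, then a late duplicate."""
    conn = ("192.168.0.1", 80, "192.168.0.5", 1218)
    demux = ToyTcpDemux(time_wait_seconds)
    demux.connect(conn, now=0)
    demux.close(conn, now=10)
    demux.connect(conn, now=11)  # a new connection tries to reuse the tuple
    # A delayed duplicate from the OLD connection finally arrives.
    return demux.deliver(conn, b"old JavaScript")


if __name__ == "__main__":
    print("no quarantine: ", stale_segment_accepted(0))
    print("240 s quarantine:", stale_segment_accepted(240))
```

With no quarantine the stale segment is accepted by the new connection. With a 240-second quarantine the reuse attempt is refused and the segment has nowhere to go.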

So, TIME_WAIT was invented to keep new connections from being haunted by the ghosts of connections past. That seems like a good thing. So what’s the problem?

The problem is that 2MSL happens to be a rather long time–240 seconds, by default (RFC 793 takes the MSL to be two minutes). There are several costs associated with this. The state for each socket is maintained in a data structure called a TCP Control Block (TCB). When IP packets come in they have to be associated with the right TCB, and the more TCBs there are, the longer that search takes. Modern implementations of TCP combat this by using a hash table instead of a linear search. Also, since each TIME_WAIT ties up an IP/port combination, too many of them can lead to exhaustion of the default number of ephemeral ports available for handling new requests. And even if the TCB search is relatively fast, and even if there are plenty of ports to bind to, the extra TCBs still take up memory on the server side. In short, the need to limit the costs of TIME_WAIT turns out to be a long-standing problem. In fact, this was part of the original case for persistent connections in HTTP 1.1.
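A little back-of-the-envelope arithmetic makes these costs concrete. In steady state, the number of sockets sitting in TIME_WAIT is roughly the rate at which connections are actively closed multiplied by the length of the TIME_WAIT interval. The sketch below is only that, a sketch: the function names are made up, and the pool of 4,000 ephemeral ports is an assumed round number for illustration, not a measured value.

```python
def steady_state_time_wait(closes_per_second, time_wait_seconds):
    """Approximate number of sockets in TIME_WAIT at any given moment."""
    return closes_per_second * time_wait_seconds


def max_close_rate(available_ports, time_wait_seconds):
    """Highest sustained close rate before a fixed pool of ports runs dry."""
    return available_ports / time_wait_seconds


if __name__ == "__main__":
    # Two active closes per second with a 240-second interval.
    print(steady_state_time_wait(2, 240))

    # Assumed pool of 4,000 ephemeral ports (illustrative figure only).
    for interval in (240, 60, 30):
        rate = max_close_rate(4000, interval)
        print(f"{interval:>3} s interval: about {rate:.1f} closes/second")
```

So two closes a second is already enough to keep 480 sockets parked in TIME_WAIT, and under the assumed port pool a machine that opens and actively closes more than about 17 connections a second would eventually have nothing left to bind to.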

The good news is that you can address this problem by shortening the TIME_WAIT interval. This article by Brett Hill explains how to do so for IIS. As Brett explains, four minutes is probably longer than needed for duplicate packets to flush out of the network, given that modern network latencies tend to be much shorter than that. The bad news is that, while shortening the interval is quite common, it still entails risks. As Faber, Touch and Yue (who are the real experts on this) explain: “The size of the MSL to maintain a given memory usage level is inversely proportional to the connection rate.” In other words, the more you find yourself needing to reduce the length of TIME_WAIT, the more likely doing so will cause problems.
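On Windows, the knob usually involved is the TcpTimedWaitDelay registry value under the Tcpip\Parameters key, which Microsoft documents as accepting 30 to 300 seconds. I am assuming that is the setting the linked article has in mind. The sketch below does not touch the registry; it just generates the text of a .reg file so you can look it over before importing anything. The function name is hypothetical.

```python
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def time_wait_reg_file(seconds):
    """Build .reg file text that sets TcpTimedWaitDelay to the given value."""
    # Documented valid range for this value is 30-300 seconds.
    if not 30 <= seconds <= 300:
        raise ValueError("TcpTimedWaitDelay must be between 30 and 300 seconds")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{REG_KEY}]\n"
        f'"TcpTimedWaitDelay"=dword:{seconds:08x}\n'
    )


if __name__ == "__main__":
    print(time_wait_reg_file(60))
```

A change like this typically needs a restart to take effect, and it applies to every connection on the machine, so try it somewhere other than production first.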

How’s that for a Catch-22?

54 Comments on “Hurry Up and TIME_WAIT”

  1. 1 john said at 4:54 am on December 21st, 2004:

    I finally decided that I needed to understand what all those TIME_WAITs were for and your explination is the clearest I have found. Thank you for your time and effort.

  2. 2 Nikhil said at 10:27 pm on February 20th, 2005:

    Excellent link.

    It really helped me out a lot!

  3. 3 Aniruddha said at 7:06 am on March 22nd, 2005:

    Very Nice article, but I still have questions…

  4. 4 manoj said at 7:16 am on March 30th, 2005:

    I have webbrowser as client which opens sockets connections in applet to my servlets running in Jserv .I get lots of TIME_WAIT on my machine which eventually break my Jserv process to die .

    does anyone know why ?

    manoj

  5. 5 Steven Reddie said at 7:40 pm on April 27th, 2005:

    manoj — The side shutting down the connection gets the TIME_WAIT. It’s typical for a webserver to shutdown the connection immediately after sending a response. If you can instead stash the connection away and give the client time to close it you can push the TIME_WAIT to them. I believe Apache does something like this. In the worst case, if the client doesn’t shutdown the connection, say within 5 seconds, you’ll have to do it and suffer the TIME_WAIT. However if in that time the client does initiate the close then you have avoided a TIME_WAIT.

  6. 6 Gordon said at 2:03 pm on May 23rd, 2005:

    Is there a way to close a connection on my end that’s listed as "FIN_WAIT_1" for a while? Thanks.

  7. 7 Travis said at 7:30 am on December 10th, 2005:

    I was really worried about it until I read your superb explanation. Now I only think I might be worried about it (the nightmares of green fanged TCP/IP packets are becoming less frequent).

    Thanks

  8. 8 Vijendar Ganta said at 7:18 pm on March 23rd, 2006:

    I always confused by this TIME_WAIT state .. now I feel better understanding Thank you

  9. 9 Lukas Brozovsky said at 2:19 pm on March 30th, 2006:

    And how can one perform stress tests of a web server for example? As I understand, in such case, either the test server or test client would suffer from the limitation of total local port numbers (as somewhere the sockets would stay in TIME_WAIT status)…

  10. 10 Joe @ Port80 said at 4:57 pm on March 30th, 2006:

    Actually, TIME_WAIT can create problems for Web server benchmarking.

    Here (http://www.junjaewoo.com/oracle/oracle9i/doc/relnotes/webcache.htm) is a pretty simple description of the kind of thing that can go wrong:

    [snip]

    In particular, if you run stress tests against Oracle Web Cache and continuously open more TCP connections from one client computer to Oracle Web Cache, you may experience periodic oscillation of throughput. This is usually caused by TCP connection TIME_WAIT in your operating system. In real world deployments, this is not an issue since it is unlikely that a single client will generate a huge number of connections.

    [/snip]

  11. 11 Ollie said at 12:15 pm on May 16th, 2006:

    Great article! Extremely well-written and informative, great for beginners and more advanced readers alike. I’m forwarding this URL around to a few of my friends to read.

  12. 12 vivhost said at 2:19 pm on June 4th, 2006:

    Thanks a lot for a great article. As a webmaster, I found this very useful in dealing with performance optimization issues. Aside from netstat, one tool I also use is echoping, this allows one to test instantly apache response after making configuration changes. Cheers!

  13. 13 junky_jinka said at 12:09 am on July 12th, 2006:

    Hi,

    Nice article!

    In the following 2 pts, we say that the server sends an ACK to acknowledge the client’s FIN. However, in point 3, we say that the client has already deallocated the socket. Why would the server send an ACK to a client with a deallocated socket?

    From other articles, it seems that the client does the final close/deallocation only after receiving the ACK.

    3. At this point, the socket on the client side is in a "passive close," meaning it waits for the application that was using it (the browser) to close. When this happens, the client sends its own FIN to the server, and deallocates the socket on the client side. It’s done.

    4. When the server gets that last FIN, it of course sends back on ACK to acknowledge it, and then goes into the infamous TIME_WAIT state. For how long? Ah, there’s the rub.

    Thanks,

    junky_jinka

  14. 14 Bill Myers said at 10:18 am on July 13th, 2006:

    I’ve looked at thousands of traces and have never seen the server side initiate the FIN sequence.

    The client always closes the connection like this …

    Client Server

    -------------- FIN -------------->
    <------------- ACK ---------------
    <------------- FIN ---------------
    -------------- ACK -------------->

    In HTTP, the server can request the client to close the connection with the layer "Connection: close" header. Or it can close the connection itself at Layer 4, with a TCP Reset, but this is unusual.

  15. 15 Bill Myers said at 10:20 am on July 13th, 2006:

    It’s too bad that the spaces between "Client" and "Server" were removed in my previous post.

  16. 16 Chris @ Port80 said at 12:09 pm on July 13th, 2006:

    Sorry, Bill — this blog software is so-so…

    :)

  17. 17 junky_jinka said at 5:48 am on July 14th, 2006:

    Bill,

    Thanks for the prompt reply.

    We are looking at 2 scenarios:

    1. Browser connects to Web Server and requests an ASP page. Is the browser responsible for sending the FIN which starts off the CLOSE process or is it the web server which finishes executing the ASP, sends the data to the browser and then sends out a FIN to the browser?

    2. ASP Page (on the web server) connects to the database server using usual ado libraries. Is the web server responsible for sending the FIN or is it the database server?

    Thanks,

    junky_jinka

  18. 18 Hassan Issa said at 12:25 am on August 4th, 2006:

    question….I got an application that connects to the server side through winsock(Visual Basic 6) but as soon as messages get transmitted in plenty the server side initiates a close & the application has to use another port to reconnect. This then happens after each subsequent message sent…any remedy for this as the application is time critical.

  19. 19 whatever said at 3:03 pm on October 19th, 2006:

    liked the article. But are all these comments for real?

  20. 20 Klemens said at 6:58 am on December 18th, 2006:

    Two years old and still great :-) thanks.

  21. 21 Steen Alstrup said at 11:44 am on December 20th, 2006:

    Thanks…. searching the web for a TIME_WAIT explaintion and I found it here…

    Merry Christmas, peace on earth….

    /Steen

  22. 22 antonio said at 9:23 am on January 21st, 2007:

    I now think I understand why I have so many time_waits on my netstat.

    Thank you for your explanation, it was very well explained!!

  23. 23 frozenrain@mail.ru said at 8:47 am on March 9th, 2007:

    Thank you for a pretty nice description of the TIME_WAIT problem.

    I suppose that much software developer don’t understand that as you helped to do…..

  24. 24 Chris @ Port80 said at 4:19 pm on March 15th, 2007:

    Yes, Mr. whatever (http://www.port80software.com/200ok/archive/2004/12/07/205.aspx#18549)…

    The comments are real… Who knew TIME_WAIT was worth the wait?

    Cheers,

    Chris @ Port80

  25. 25 tom noyes said at 11:05 am on March 28th, 2007:

    Thank you. We were trying to figure out what we could do about this and this article was the best by far we found.

  26. 26 Eric Lyna said at 7:22 am on May 2nd, 2007:

    Great article – very clear. I’ll share a brief of my experience with port exhaustion: .net web service running on port 81 using NT Authentication on the IIS level… for this situation, every call to the WS opens and closes another socket. With a chatty service (can’t help it!), this can use up the sockets rather quickly. I’ve coded to handle this as I don’t want to rely on changing registry settings on a shared production webserver. In the future, this is a good reason to roll your own security as opposed to using NT Auth! As long as anon access is turned on, and keep_alive is true, it’ll re-use sockets as opposed to spinning them up as NT Auth causes.

  27. 27 Jacob said at 11:05 pm on May 12th, 2007:

    Helpful thanks, what about connections in netstat -n that show myself 127.0.0.1 as local host and my IP has a server address?

  28. 28 VirtualTycoon said at 9:26 pm on June 30th, 2007:

    Thanks man. This is pretty clear. Nice joke about RFC793

  29. 29 Mike said at 8:39 am on August 8th, 2007:

    Hi,

    when I give on MS-DOS prompt the command netstat,

    I get a large printout as follows:



    TCP my_hostname:3546 my_hostname:3545 TIME_WAIT
    TCP my_hostname:3552 my_hostname:3551 TIME_WAIT
    TCP my_hostname:3555 my_hostname:3554 TIME_WAIT
    TCP my_hostname:3561 my_hostname:3560 TIME_WAIT
    TCP my_hostname:3563 my_hostname:3562 TIME_WAIT
    TCP my_hostname:3565 my_hostname:3564 TIME_WAIT
    TCP my_hostname:3570 my_hostname:3569 TIME_WAIT
    TCP my_hostname:3572 my_hostname:3571 TIME_WAIT
    TCP my_hostname:3574 my_hostname:3573 TIME_WAIT
    TCP my_hostname:3582 my_hostname:3581 TIME_WAIT
    TCP my_hostname:3584 my_hostname:3583 TIME_WAIT
    TCP my_hostname:3586 my_hostname:3585 TIME_WAIT
    TCP my_hostname:3588 my_hostname:3587 TIME_WAIT

    Is there someone who can say what this mean?

    My laptop uses Microsoft 2000 operating system.

    Regards,

    Mike

  33. 33 Kurt Andersen said at 11:09 pm on March 2nd, 2008:

    Thank you for a great article.

    Especially for the post on how to push the WAIT_TIMEOUT to the client side.

    This helped us very much :-)

  35. 35 Bonson said at 8:21 am on March 25th, 2008:

    Excellent article, thanks for explaining in English

  36. 36 WONG SEO-UL said at 6:16 pm on April 11th, 2008:

    Great info.. I always wonder why Win does not have a tool like netstats..

  37. 37 video watch said at 10:25 pm on April 18th, 2008:

    The problem is that 2MLS happens to be a rather long time–240 seconds, by default. There are several costs associated with this. The state for each socket is maintained in a data structure called a TCP Control Block (TCB)..???

  40. 40 http://getabu.com said at 7:33 am on June 11th, 2008:

    This is a well organized and outlined blog. Thanks for taking the time.

    cheers, gerardo

  41. 41 Oil Paintings said at 10:56 pm on June 26th, 2008:

    Nice article, the explanation is simple enough to let anyone understand it, but I still have questions, the most important one, how to know the software that created the connection?

    Thanks!

  42. 42 Surajit said at 3:49 pm on June 30th, 2008:

    great article…..i m in a support project working on servers and never could found a great explanation on sockets other than this.

  43. 43 web tasarim said at 5:43 am on July 19th, 2008:

    thank you very nice wonderful posted

  46. 46 dikey perde said at 6:30 am on September 26th, 2008:

    Wow this one was a major tough one. Thanks to one of my colleagues for figuring it out. We have a SharePoin

  47. 47 autocad kursu said at 8:08 am on September 26th, 2008:

    thank you very nice web page wonderful

  49. 49 mirc said at 5:16 am on October 15th, 2008:

    thats nice project.. its name volta. thank you admin

  50. 50 film  said at 1:05 pm on October 17th, 2008:

    Useful knowledge.. With us invention for thanks

  52. 52 signs of diabetes said at 7:00 pm on November 3rd, 2008:

    Anyway, thanks for collecting all these interesting materials while sharing with us!

  53. 53 Burun Estetigi said at 10:13 am on November 11th, 2008:

    is there any downside to increasing the max users for mysql database? thanks for this article.

  54. 54 NAT Traversal Constraints « DC++: Just These Guys, Ya Know? said at 8:59 am on June 15th, 2010:

    [...] to as much as 4 minutes. One can reduce the TIME_WAIT interval both in Windows and Linux, but it protects against a stray, resent packet from a previous connection from breaking established TCP con…, so it’s unwise to reduce it exessively. Further, it’s a system-wide setting in both [...]