How TCP States Show Where a Connection Is Stuck

When debugging TCP, the first thing ss or netstat gives you is usually a list of connection states. The hard part is mapping those states to the phase where the connection is stuck.

Many SYN-SENT entries: is the server slow or the network unreachable? The connection is still ESTABLISHED: why did the business request time out? CLOSE-WAIT is piling up: is this a network issue or an application not closing sockets? TIME-WAIT is large: should kernel parameters be changed immediately? A long connection appears alive locally, but NAT or firewall state may already be gone.

The value of TCP states is not in memorizing the entire state diagram. It is in locating which phase the connection is stuck in.

The safest first model is this: TCP state describes the connection phase as seen by the local endpoint. During setup, check whether SYN exchange completed. During data transfer, check whether application progress continues. During close, check who closed first and who has not finished. For long connections, also check whether middlebox state still exists.

connection setup: SYN-SENT / SYN-RECEIVED
data transfer:    ESTABLISHED
active close:     FIN-WAIT-1 / FIN-WAIT-2 / TIME-WAIT
passive close:    CLOSE-WAIT / LAST-ACK
abnormal cleanup: RST / timeout / keepalive / NAT state expired

When you see a TCP state, ask whether the connection is stuck in setup, data progress, shutdown, application cleanup, or a middlebox state boundary.
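The phase model above can be written down as a small lookup table. This is only a sketch: the names `PHASE_BY_STATE`, `ALIASES`, and `phase_of` are made up for illustration, and the alias list assumes the abbreviated spellings that ss and netstat commonly print.

```python
# Map RFC-style TCP state names to the troubleshooting phase used in this article.
PHASE_BY_STATE = {
    "SYN-SENT": "connection setup",
    "SYN-RECEIVED": "connection setup",
    "ESTABLISHED": "data transfer",
    "FIN-WAIT-1": "active close",
    "FIN-WAIT-2": "active close",
    "TIME-WAIT": "active close",
    "CLOSE-WAIT": "passive close",
    "LAST-ACK": "passive close",
}

# Tool-specific spellings (assumed; verify against your own ss/netstat output).
ALIASES = {
    "SYN-RECV": "SYN-RECEIVED",
    "ESTAB": "ESTABLISHED",
    "FIN-WAIT1": "FIN-WAIT-1",
    "FIN-WAIT2": "FIN-WAIT-2",
}


def phase_of(state: str) -> str:
    """Return the phase for a state name, tolerating underscores and abbreviations."""
    name = state.strip().upper().replace("_", "-")
    name = ALIASES.get(name, name)
    return PHASE_BY_STATE.get(name, "unknown")
```

States outside the table, such as LISTEN, come back as "unknown" rather than being forced into a phase.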

Many SYN-SENT: Request Sent, No Reply Came Back

After a client sends SYN, it enters SYN-SENT. It is waiting for the server’s SYN+ACK.

If there are many SYN-SENT connections, the connection is not established yet. Check:

target IP reachability
routing
DNS result
whether the server port is listening
firewall or security group dropping SYN
NAT outbound mapping
whether the server’s reply path works
middlebox filtering or rate limiting

Packet capture is straightforward:

SYN appears, but no SYN+ACK

Do not start with HTTP, MQTT, or TLS. The connection has not been established, so the application protocol has not had a normal chance to begin.
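A quick way to separate "no reply to SYN" from "port closed" is a single connect attempt with a timeout. The sketch below uses only the Python standard library; the function name `probe` and the 3-second default are arbitrary choices, not a recommendation.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the outcome at the handshake level."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "established"   # the three-way handshake completed
    except ConnectionRefusedError:
        return "refused"           # an RST came back: host reachable, nothing listening
    except (socket.timeout, TimeoutError):
        return "timeout"           # no answer to SYN within the timeout
    except OSError as exc:
        return f"error: {exc}"     # e.g. no route to host, or name resolution failed
```

A "timeout" result matches the SYN-SENT picture: SYN went out and nothing came back. A "refused" result means the path works in both directions and the problem is the listener.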

Many SYN-RECEIVED: Server Saw SYN, Handshake Did Not Complete

After the server receives SYN and sends SYN+ACK, it enters SYN-RECEIVED, waiting for the client’s final ACK.

If a server has many SYN-RECEIVED connections, it has at least seen connection requests, but the three-way handshake is not completing. Possible causes include:

server’s SYN+ACK cannot reach the client
client’s final ACK cannot return
firewall allows only one direction
SYN backlog pressure
SYN flood or scanning
wrong return route
asymmetric NAT mapping or security policy

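This differs from SYN-SENT. Many SYN-SENT entries are seen on the client, and the first suspect is the path from the client to the server. Many SYN-RECEIVED entries are seen on the server and mean it already received SYN, so focus on the return path, the half-open (SYN) queue, and the client’s final ACK.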
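The two handshake cases can be told apart from a capture by checking which of the three handshake packets are present. The sketch below assumes the capture has already been reduced to a list of flag labels for one connection attempt; `handshake_stage` and the label strings are hypothetical, not the output format of any capture tool.

```python
def handshake_stage(flags_seen: list[str]) -> str:
    """Classify one connection attempt by which handshake packets were captured.

    flags_seen is a hypothetical pre-parsed list such as ["SYN", "SYN+ACK"].
    """
    seen = set(flags_seen)
    if "SYN" not in seen:
        return "no SYN captured: check capture point or filter"
    if "SYN+ACK" not in seen:
        return "SYN without SYN+ACK: check forward path, listener, firewall"
    if "ACK" not in seen:
        return "SYN+ACK without final ACK: check return path and SYN backlog"
    return "handshake completed"
```

Where the capture is taken matters: a SYN+ACK visible on the server but missing on the client points at the return path rather than at the server.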

ESTABLISHED Does Not Mean the Business Is Healthy

ESTABLISHED only means the three-way handshake completed and, as far as the local TCP stack knows, both sides can transfer data.

It does not guarantee:

TLS handshake completed
HTTP request was sent or answered
MQTT broker still considers the session valid
application thread is reading from the socket
NAT or firewall state still exists
the peer process is not stuck

Many long-connection problems sit here. ss shows ESTABLISHED, but the application is no longer working. Causes include:

peer lost power without FIN/RST
NAT or firewall state expired
application heartbeat stopped
TLS is still waiting for data
socket buffers are backed up
TCP keepalive is off, or its timers are too long to notice the failure in time

So ESTABLISHED is a starting point, not a conclusion. Check whether data and ACKs are moving, whether application heartbeats are healthy, and whether captures show retransmissions or RTOs.

Many CLOSE-WAIT: Peer Closed, Local Application Did Not

CLOSE-WAIT is one of the most actionable TCP states.

It means the peer has sent FIN, the local TCP stack has acknowledged it, but the local application has not closed the socket.

In other words, the TCP stack has delivered close information to the local application. Now it waits for the application to call close() or release the connection.

If CLOSE-WAIT keeps accumulating, this is usually not a network problem. It is a local resource cleanup problem:

application did not handle peer close
error path missed close
connection object reference was not released
thread is stuck before cleanup
connection pool failed to recycle bad connections

Captures usually show that peer FIN has arrived. Local state remains CLOSE-WAIT. At that point, tuning firewall, route, or MTU is unlikely to help. Inspect application close paths.
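In code, the fix is usually to treat an empty read as "peer closed" and to make sure close() runs on every exit path. The sketch below shows the pattern with a plain blocking socket; `handle` is an illustrative name and real servers will have more involved read loops.

```python
import socket


def handle(conn: socket.socket) -> bytes:
    """Read until the peer closes, then always close the local socket."""
    received = bytearray()
    with conn:                     # close() runs on every exit path, including exceptions
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # b"" means the peer's FIN arrived: stop reading
                break
            received += chunk
    return bytes(received)
```

Without the `with` block, an exception inside the loop would skip close(), and the socket would sit in CLOSE-WAIT until the process exits or the object is garbage collected.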

FIN-WAIT-1 and FIN-WAIT-2: Local Side Closed Actively

The side that actively closes sends FIN first.

After sending FIN, it enters FIN-WAIT-1, waiting for the peer to ACK. After receiving that ACK, it enters FIN-WAIT-2, waiting for the peer’s own FIN.

local close
-> FIN-WAIT-1: wait for peer ACK of local FIN
-> FIN-WAIT-2: local send direction closed, wait for peer FIN

Many FIN-WAIT-1 entries may mean the peer has not acknowledged the local FIN: either the FIN or its ACK was lost, or the FIN is still queued behind unsent data.

Many FIN-WAIT-2 entries mean the local FIN was acknowledged, but the peer has not closed its send direction. The peer application may not have closed, may be stuck, or the protocol may allow half-close for a while.

Interpret this with business semantics. Some protocols allow half-close, but long-lived FIN-WAIT-2 buildup in ordinary applications is worth checking on the peer close path.
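A deliberate half-close looks like this in code: the local side finishes sending, shuts down only its send direction, and keeps reading until the peer sends its own FIN. The helper name `send_then_read_all` is made up for this sketch.

```python
import socket


def send_then_read_all(sock: socket.socket, request: bytes) -> bytes:
    """Send a request, half-close the send direction, then read until the peer's FIN."""
    sock.sendall(request)
    sock.shutdown(socket.SHUT_WR)   # sends FIN: FIN-WAIT-1, then FIN-WAIT-2 once ACKed
    reply = bytearray()
    while True:
        chunk = sock.recv(4096)
        if not chunk:               # peer FIN: the peer closed its send direction too
            break
        reply += chunk
    sock.close()
    return bytes(reply)
```

While the loop is waiting in recv(), the socket can legitimately sit in FIN-WAIT-2. It only becomes a problem when the peer never finishes and the count keeps growing.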

Many TIME-WAIT Entries Are Not Always a Leak

TIME-WAIT is often misread.

The active closer usually enters TIME-WAIT once it has received the peer’s FIN and sent the final ACK, and stays there for twice the maximum segment lifetime (2*MSL; Linux uses a fixed 60 seconds). It serves two main purposes:

if the final ACK is lost, the peer retransmits its FIN and this side can still acknowledge it
old delayed packets from the previous connection can expire before a new connection reuses the same tuple

A large TIME-WAIT count is therefore not automatically a bug. It commonly appears when:

this side is a client making many short connections
this side actively closes connections
HTTP uses short connections or does not reuse connections
health checks are frequent

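The real question is whether they cause resource pressure: ephemeral port exhaustion toward a single destination, connection table pressure, or just a large but harmless count. File descriptors are normally not the issue, because a socket that reaches TIME-WAIT has usually already been closed by the application.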

Do not tune kernel parameters immediately just because TIME-WAIT is visible. First inspect connection reuse, long-connection strategy, request rate, and which side is the active closer.
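A rough calculation shows when TIME-WAIT turns into real port pressure. The sketch below assumes the common Linux defaults (ephemeral port range 32768 to 60999 and a 60-second TIME-WAIT) and the worst case where every connection goes from one source IP to one destination IP and port, with no tuple reuse. The function name is made up for illustration.

```python
def max_sustained_connect_rate(port_low: int, port_high: int,
                               time_wait_seconds: float) -> float:
    """Upper bound on new connections per second from one source IP to ONE
    destination ip:port, if every closed connection parks its port in TIME-WAIT."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed Linux defaults: ip_local_port_range 32768-60999, 60 s TIME-WAIT.
rate = max_sustained_connect_rate(32768, 60999, 60)
print(f"{rate:.0f} connections/s")
```

Under these assumptions the ceiling is roughly 470 short connections per second to a single destination. Well below that, a large TIME-WAIT count is mostly cosmetic; near it, connection reuse is the first thing to fix.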

LAST-ACK: Local Side Sent FIN, Waiting for Final ACK

The passive closer enters CLOSE-WAIT after receiving peer FIN. When the application closes, the local side sends its own FIN and enters LAST-ACK, waiting for the peer’s acknowledgement.

If many connections remain in LAST-ACK, the local side has already tried to finish closing, but the peer has not acknowledged the local FIN, or the ACK was lost.

Possible causes:

peer disappeared
path loss
NAT or firewall state was removed too early
peer TCP stack or application is abnormal

LAST-ACK and CLOSE-WAIT mean different things. CLOSE-WAIT means the local application has not closed yet. LAST-ACK means it has closed and is waiting for the peer’s final ACK.

RST Is Abnormal Termination, Not Normal Close

Normal close uses a FIN and an ACK in each direction. RST means the connection was reset: it is torn down immediately, without that orderly exchange.

Common causes include:

no process is listening on the target port
application uses an abortive close policy
packet does not fit the current connection state
firewall or middlebox injects RST
process exits or crashes and resets sockets
half-open connection is cleaned up by the system

When you see RST, check who sent it, when it was sent, and what state the connection was in beforehand.

A server that resets after receiving a request is different from a middlebox that resets before handshake. An application that sends RST after partial data is also different from normal close, and may cause unread data to be discarded by the peer.
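The "abortive close policy" mentioned above is typically SO_LINGER switched on with a zero timeout. The sketch below assumes a POSIX system where `struct linger` is two C ints (Windows packs it differently); `abortive_close` is an illustrative name.

```python
import socket
import struct


def abortive_close(sock: socket.socket) -> None:
    """Close with SO_LINGER on and a zero timeout, which resets instead of sending FIN."""
    # struct linger { int l_onoff; int l_linger; } -> enabled, 0 seconds
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

On the other end, a read after such a close typically fails with a connection-reset error instead of returning an empty read, which is how a capture or log can tell this path from a normal close.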

Long-Connection Half-Alive State: TCP Still Exists, Middlebox State Is Gone

IoT and mobile networks often show this symptom: locally, TCP still looks ESTABLISHED, but the cloud receives no data, or the next send takes a long time to fail.

The middlebox state may already be gone:

NAT mapping expired
firewall connection table reclaimed state
cellular path switched
Wi-Fi roaming or power save changed the path
peer lost power without FIN/RST

If endpoints send no data, TCP may not know immediately that the middle path disappeared. The next send may go through retransmissions and RTO before the connection is declared unusable.

That is why long connections often need:

TCP keepalive
application heartbeat
MQTT keepalive
idle timeout policy
reconnect strategy after failure

TCP keepalive and application heartbeat are not the same layer. TCP keepalive only checks whether the transport connection may still be alive. Application heartbeat also checks whether the peer application protocol state is progressing.
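Enabling TCP keepalive on a socket takes a few setsockopt calls. This is a minimal sketch: the three timer options use the Linux constant names and are guarded because other platforms may not expose them, and the default values here are arbitrary examples, not tuning advice.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive and tune the timers where the platform exposes them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux option names; skipped silently on platforms that lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)        # idle seconds before first probe
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)   # seconds between probes
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)        # failed probes before giving up
```

With these example values a silent peer would be detected after roughly idle + interval * count seconds. The idle time should also be shorter than the NAT or firewall idle timeout on the path, otherwise the probes start after the middlebox state is already gone.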

Use ss and Packet Capture Together

Local TCP state alone is not enough. Combine it with packet capture.

A practical order:

ss -tanp for state, peer, and process (run it as root to see processes owned by other users)
SYN-SENT: capture whether SYN appears without SYN+ACK
SYN-RECEIVED: on the server, check whether SYN+ACK was sent and final ACK returned
ESTABLISHED: check data and ACK progress, retransmissions, and RTOs
CLOSE-WAIT: check whether peer FIN arrived and local process missed close
FIN-WAIT-2: check whether local FIN was ACKed and peer does not send FIN
TIME-WAIT: confirm active closer and whether it causes real port or resource pressure
half-alive long connection: check NAT/firewall timeout, keepalive, application heartbeat, and retransmissions after next send
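Before walking through the checklist above, it helps to see which states dominate. The sketch below counts the first column of `ss -tan` style text. The sample is illustrative, not real output, and uses documentation addresses; `count_states` is a made-up helper.

```python
from collections import Counter


def count_states(ss_output: str) -> Counter:
    """Count connections per state from `ss -tan` style text (state is the first column)."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":   # skip blank lines and the header row
            continue
        counts[fields[0]] += 1
    return counts


# Illustrative sample only; addresses come from documentation ranges.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
ESTAB      0      0      192.0.2.5:443        198.51.100.9:51234
CLOSE-WAIT 1      0      192.0.2.5:443        198.51.100.7:40112
CLOSE-WAIT 1      0      192.0.2.5:443        198.51.100.8:40990
TIME-WAIT  0      0      192.0.2.5:50122      198.51.100.2:5432
"""
print(count_states(SAMPLE).most_common())
```

In practice the text would come from running ss and capturing its output. Note that ss abbreviates some names, for example ESTAB for ESTABLISHED.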

TCP state tells you what the local TCP stack sees. Packet capture tells you what happened on the wire. Application logs tell you whether business state is progressing. Use all three.

What to Remember

TCP states are evidence of connection phase, not a vocabulary test.

SYN-SENT and SYN-RECEIVED point to the handshake. ESTABLISHED means the transport channel exists, not that the business is healthy. CLOSE-WAIT usually points to local application cleanup. FIN-WAIT-1 and FIN-WAIT-2 point to an active close that is waiting on the peer, and LAST-ACK to a passive close waiting for the final ACK. TIME-WAIT is usually a normal protection state. Half-alive long connections require separating TCP endpoint state, middlebox state, and application heartbeat.

Do not ask only whether the connection is established. Ask whether it is stuck in setup, data transfer, close, resource cleanup, or NAT/firewall state boundaries.