
How well a TCP connection recovers after a network outage

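  I ran into a problem on a project. Two machines hold a TCP connection between two sockets, with heavy traffic in both directions. I cut the network by configuring a 100% packet loss rate on the router. As expected, the sockets could neither send nor receive, and there were lots of retransmissions. When I removed the router setting and the network came back, traffic from client to server returned to normal, but traffic from server to client did not. No matter how hard the server called send, no data went out and errno was EAGAIN (on a non-blocking socket, send reports this by returning -1).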

  I looked at the packets at that point with tcpdump (tc2 is the server, tc1 is the client):

  12:08:21.020291 IP tc1.corp.com.42171 > tc2.corp.com.3003: S 4009389430:4009389430(0) win 5840

  12:08:21.020571 IP tc2.corp.com.3003 > tc1.corp.com.42171: R 0:0(0) ack 4009389431 win 0

  12:08:38.934329 IP tc2.corp.com.3903 > tc1.corp.com.3904: P 2398055392:2398056153(761) ack 2538876742 win 724

  12:08:38.934519 IP tc1.corp.com.3904 > tc2.corp.com.3903: . ack 2165 win 13756

  12:08:39.958457 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1:763(762) ack 2165 win 13756

  12:08:39.958485 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 763 win 1448

  12:08:39.958653 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 763:881(118) ack 2165 win 13756

  12:08:39.958660 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 881:997(116) ack 2165 win 13756

  12:08:39.958719 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 997 win 1448

  12:08:39.958890 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 997:1114(117) ack 2165 win 13756

  12:08:39.958898 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1114:1232(118) ack 2165 win 13756

  12:08:39.958903 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1232:1349(117) ack 2165 win 13756

  12:08:39.958971 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1349 win 1448

  12:08:39.959141 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1349:1466(117) ack 2165 win 13756

  12:08:39.959149 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1466:1583(117) ack 2165 win 13756

  12:08:39.959154 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1583:1700(117) ack 2165 win 13756

  12:08:39.959222 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1700 win 1448

  tc2 does not send its own data. It only ACKs the data arriving from tc1, and half an hour later it is still doing the same. Why does it not send?

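  It turned out that the cause was TCP_NODELAY, which we had set on the socket. After removing that option and restarting the program, both directions of the TCP connection worked normally once the network was restored. The same view in tcpdump: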

  16:05:38.782427 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: P 0:887(887) ack 1 win 26064

  16:05:38.782619 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 3783 win 25352

  16:05:38.782634 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 3783:5231(1448) ack 1 win 26064

  16:05:38.782637 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 5231:6679(1448) ack 1 win 26064

  16:05:38.782890 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 5231 win 25352

  16:05:38.782896 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 6679:8127(1448) ack 1 win 26064

  16:05:38.782898 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 8127:9575(1448) ack 1 win 26064

  16:05:38.782901 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 6679 win 25352

  16:05:38.782904 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 9575:11023(1448) ack 1 win 26064

  16:05:38.783183 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 8127 win 25352

  16:05:38.783188 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 11023:12471(1448) ack 1 win 26064

  16:05:38.783191 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 9575 win 25352

  16:05:38.783193 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 12471:13919(1448) ack 1 win 26064

  16:05:38.783196 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 11023 win 25352

  16:05:38.783199 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 13919:15367(1448) ack 1 win 26064

  16:05:38.783201 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 15367:16815(1448) ack 1 win 26064

  16:05:38.783502 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 12471 win 25352

  16:05:38.783506 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 16815:18263(1448) ack 1 win 26064

  16:05:38.783509 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 13919 win 25352

  16:05:38.783512 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 18263:19711(1448) ack 1 win 26064

  16:05:38.783514 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 15367 win 25352

  16:05:38.783517 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 19711:21159(1448) ack 1 win 26064

  16:05:38.783519 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 16815 win 25352

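  This time tc2 sends its own data stream and tc1 ACKs it. After a while tc1 starts sending data as well, and in the end both directions are back to normal.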
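  For reference, this is roughly how the option is switched on and off from user space. It is only a minimal sketch: the helper names set_nodelay and get_nodelay are made up for this example and are not taken from our project's code.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on = 1) or disable (on = 0) TCP_NODELAY on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Read the option back.
 * Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int get_nodelay(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) < 0)
        return -1;
    return on != 0;
}
```

  A freshly created TCP socket has the option off, so never calling set_nodelay(fd, 1) gives the configuration that recovered in the second capture.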

  Why can a socket with TCP_NODELAY set not get back to normal after the network recovers?

  Let's look at the implementation of the send system call (kernel 2.6.9), following it all the way down to the tcp_sendmsg function:

  [net/ipv4/tcp.c --> tcp_sendmsg]

  813     while (--iovlen >= 0) {

  814         int seglen = iov->iov_len;

  815         unsigned char __user *from = iov->iov_base;

  816

  817         iov++;

  818

  819         while (seglen > 0) {

  820             int copy;

  821

  822             skb = sk->sk_write_queue.prev;

  823

  824             if (!sk->sk_send_head ||

  825                 (copy = mss_now - skb->len) <= 0) {

  826

  827 new_segment:

  828                 /* Allocate new segment. If the interface is SG,

  829                  * allocate skb fitting to single page.

  830                  */

  831                 if (!sk_stream_memory_free(sk))

  832                     goto wait_for_sndbuf;

  833

  834                 skb = sk_stream_alloc_pskb(sk, select_size(sk, tp),

  835                                0, sk->sk_allocation);

  836                 if (!skb)

  837                     goto wait_for_memory;

  Line 831 checks whether there is still room in sndbuf. If there is none, it jumps to wait_for_sndbuf.

  [net/ipv4/tcp.c --> tcp_sendmsg]

  958 wait_for_sndbuf:

  959             set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);

  960 wait_for_memory:

  961             if (copied)

  962                 tcp_push(sk, tp, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

  963

  964             if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)

  965                 goto do_error;

  966

  967             mss_now = tcp_current_mss(sk, !(flags&MSG_OOB));

  968         }

  969     }

  970

  971 out:

  972     if (copied)

  973         tcp_push(sk, tp, flags, mss_now, tp->nonagle);

  974     TCP_CHECK_TIMER(sk);

  975     release_sock(sk);

  976     return copied;

  977

  978 do_fault:

  979     if (!skb->len) {

  980         if (sk->sk_send_head == skb)

  981             sk->sk_send_head = NULL;

  982         __skb_unlink(skb, skb->list);

  983         sk_stream_free_skb(sk, skb);

  984     }

  985

  986 do_error:

  987     if (copied)

  988         goto out;

  989 out_err:

  990     err = sk_stream_error(sk, flags, err);

  991     TCP_CHECK_TIMER(sk);

  992     release_sock(sk);

  993     return err;

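  sndbuf is full, so a bit (SOCK_NOSPACE) is set. The test on line 961 fails, because nothing has been copied yet and copied is 0. Execution continues into sk_stream_wait_memory which, as the name suggests, waits for sndbuf to have free space. Our socket is set to NONBLOCK, though, so sk_stream_wait_memory returns almost immediately with -EAGAIN. That takes us to do_error, where the test on line 987 fails as well, so we end up at out_err and finally leave tcp_sendmsg carrying -EAGAIN.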

  That is why our repeated send calls sent nothing and kept failing with errno set to EAGAIN.
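  The path traced through tcp_sendmsg can be condensed into a toy model. The sketch below is not kernel code: model_sendmsg and its two parameters are invented for illustration, and the socket is assumed to be non-blocking. It only captures the rule enforced at do_error: if some bytes were already copied, the call reports that count; if nothing was copied, it reports -EAGAIN.

```c
#include <errno.h>

/* Toy model of the tcp_sendmsg return value on a non-blocking socket.
 * sndbuf_free: bytes of free space in the send buffer (hypothetical input)
 * len:         bytes the caller wants to send
 * Returns the number of bytes queued, or -EAGAIN if none could be queued. */
long model_sendmsg(long sndbuf_free, long len)
{
    long copied = 0;

    while (copied < len) {
        if (sndbuf_free <= 0) {
            /* wait_for_sndbuf: a non-blocking socket cannot wait, so
             * sk_stream_wait_memory() fails with -EAGAIN right away. */
            long err = -EAGAIN;

            /* do_error: a partial write still reports what was queued. */
            return copied ? copied : err;
        }

        /* Queue as much as fits into the remaining space. */
        long chunk = len - copied;
        if (chunk > sndbuf_free)
            chunk = sndbuf_free;
        sndbuf_free -= chunk;
        copied += chunk;
    }
    return copied;
}
```

  With a completely full buffer, model_sendmsg(0, 100) yields -EAGAIN, which the C library wrapper presents to the application as a return value of -1 with errno set to EAGAIN.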

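  If everything is normal and the socket keeps pushing data out, free space will show up in sndbuf sooner or later. But what if things are not normal? For example, with TCP_NODELAY set and the network down, a large number of packets go out in an instant and the peer ACKs none of them. Unacknowledged data has to stay queued for retransmission, so sndbuf stays full.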
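  From user space, the usual way to cope with a full sndbuf on a non-blocking socket is to stop calling send and wait until poll reports POLLOUT. A minimal sketch of that pattern follows. The helper names are hypothetical, MSG_NOSIGNAL assumes Linux, and fill_send_buffer must only be used on a non-blocking descriptor (on a blocking one it would never return).

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Keep sending until the kernel refuses to take more data.
 * Returns the number of bytes accepted so far and stores the errno of
 * the refusing call in *last_errno (EAGAIN when sndbuf is full). */
long fill_send_buffer(int fd, int *last_errno)
{
    char chunk[4096] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), MSG_NOSIGNAL);

        if (n < 0) {
            *last_errno = errno;
            return total;
        }
        total += n;
    }
}

/* Wait up to timeout_ms for the socket to accept more data.
 * Returns 1 if it is writable, 0 on timeout, -1 on error. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    int r = poll(&p, 1, timeout_ms);

    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLOUT)) ? 1 : 0;
}
```

  If wait_writable keeps timing out long after the network is back, the socket is in the stuck state described here. The helpers only make that state visible; they do not cure it.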

  Now let's see what tcp_sendmsg does in the normal case. The jump on line 832 does not happen, so execution carries on (some of the skb handling code is omitted):

  [net/ipv4/tcp.c --> tcp_sendmsg]

  936             if (!copied)

  937                 TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH;

  938

  939             tp->write_seq += copy;

  940             TCP_SKB_CB(skb)->end_seq += copy;

  941             skb_shinfo(skb)->tso_segs = 0;

  942

  943             from += copy;

  944             copied += copy;

  945             if ((seglen -= copy) == 0 && iovlen == 0)

  946                 goto out;

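  If the whole message fits into the skb in one go and the iovec has been fully consumed, the test on line 945 succeeds and we jump straight to out, which runs tcp_push. tcp_push calls __tcp_push_pending_frames:

  [include/net/tcp.h --> __tcp_push_pending_frames]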

  1508 static __inline__ void __tcp_push_pending_frames(struct sock *sk,

  1509                          struct tcp_opt *tp,

  1510                          unsigned cur_mss,

  1511                          int nonagle)

  1512 {

  1513     struct sk_buff *skb = sk->sk_send_head;

  1514

  1515     if (skb) {

  1516         if (!tcp_skb_is_last(sk, skb))

  1517             nonagle = TCP_NAGLE_PUSH;

  1518         if (!tcp_snd_test(tp, skb, cur_mss, nonagle) ||

  1519             tcp_write_xmit(sk, nonagle))

  1520             tcp_check_probe_timer(sk, tp);

  1521     }

  1522     tcp_cwnd_validate(sk, tp);

  1523 }

  The "||" on line 1518 is subtle. Because "||" short-circuits, tcp_write_xmit is executed only when tcp_snd_test returns 1. So tcp_snd_test is the function to look at first.
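  The short-circuit behaviour of that condition can be seen in isolation with a few stand-in functions. Everything in this sketch is hypothetical: the fake_ functions merely count calls and return whatever they are told to, and only the shape of the if statement is copied from lines 1518 to 1520.

```c
/* Call counters for the stand-in functions. */
int xmit_calls;
int probe_timer_calls;

/* Stand-in for tcp_snd_test: returns nonzero if sending is allowed. */
static int fake_snd_test(int may_send)
{
    return may_send;
}

/* Stand-in for tcp_write_xmit: counts the call and returns the given value. */
static int fake_write_xmit(int result)
{
    xmit_calls++;
    return result;
}

/* Stand-in for tcp_check_probe_timer: only counts the call. */
static void fake_check_probe_timer(void)
{
    probe_timer_calls++;
}

/* Same shape as the condition in __tcp_push_pending_frames. */
void push_pending(int may_send, int xmit_result)
{
    if (!fake_snd_test(may_send) || fake_write_xmit(xmit_result))
        fake_check_probe_timer();
}
```

  Calling push_pending(0, 0) never reaches fake_write_xmit and goes straight to the probe timer check, while push_pending(1, 0) calls fake_write_xmit and skips the probe timer check.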

  [include/net/tcp.h --> tcp_snd_test]

  1452 static __inline__ int tcp_snd_test(struct tcp_opt *tp, struct sk_buff *skb,

  1453                    unsigned cur_mss, int nonagle)

  1454 {

  1455     int pkts = tcp_skb_pcount(skb);

  1456

  1457     if (!pkts) {

  1458         tcp_set_skb_tso_segs(skb, tp->mss_cache_std);

  1459         pkts = tcp_skb_pcount(skb);

  1460     }

  1461

  1462     /*  RFC 1122 - section 4.2.3.4

  1463      *

  1464      *  We must queue if

  1465      *

  1466      *  a) The right edge of this frame exceeds the window

  1467      *  b) There are packets in flight and we have a small segment

  1468      *     [SWS avoidance and Nagle algorithm]

  1469      *     (part of SWS is done on packetization)
