libcurl:

官网:http://curl.haxx.se/libcurl/

错误码:http://curl.haxx.se/libcurl/c/libcurl-errors.html

This is the generic return code used by functions in the libcurl multi interface. Also consider curl_multi_strerror(3).

CURLM_CALL_MULTI_PERFORM (-1)

This is not really an error. It means you should call curl_multi_perform(3) again without doing select() or similar in between. Before version 7.20.0 this could be returned by curl_multi_perform(3), but in later versions this return code is never used.

CURLM_OK (0)

Things are fine.

CURLM_BAD_HANDLE (1)

The passed-in handle is not a valid CURLM handle.

CURLM_BAD_EASY_HANDLE (2)

An easy handle was not good/valid. It could mean that it isn’t an easy handle at all, or possibly that the handle already is in used by this or another multi handle.

CURLM_OUT_OF_MEMORY (3)

You are doomed.

CURLM_INTERNAL_ERROR (4)

This can only be returned if libcurl bugs. Please report it to us!

CURLM_BAD_SOCKET (5)

The passed-in socket is not a valid one that libcurl already knows about. (Added in 7.15.4)

CURLM_UNKNOWN_OPTION (6)

curl_multi_setopt() with unsupported option (Added in 7.15.4)

 

C API:http://curl.haxx.se/libcurl/c/

C multi API:http://curl.haxx.se/libcurl/c/libcurl-multi.html

C example:http://curl.haxx.se/libcurl/c/example.html

结合php的curl extension、libcurl和tcpdump命令,分析curl_multi_*的处理过程如下:

/*
 * Comment by flykobe:
 * sudo /usr/sbin/tcpdump -i eth0  host item.taobao.com
 * */

function curl_multi_fetch($urlarr=array()){
    $result=$res=$ch=array();
    $nch = 0;
    $mh = curl_multi_init();
    foreach ($urlarr as $nk => $url) {
        $timeout=2;
        $ch[$nch] = curl_init();
        curl_setopt_array($ch[$nch], array(
        CURLOPT_URL => $url,
        CURLOPT_HEADER => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => $timeout,
        ));
        curl_multi_add_handle($mh, $ch[$nch]);
        ++$nch;
    }

        /*
         * Comment by flykobe:
         * 针对mrc做这样的判断后,只要任意一个url出错,则整体会退出。
         *
         * 这里的exec做的是连接建立。
         *
         * exec会针对handler队列中的每一个,都进行处理。
         * */
    /* wait for performing request */
    do {
        $mrc = curl_multi_exec($mh, $running);
                #echo "curl_multi_exec pos1: $running, $mrc\n";
    } while (CURLM_CALL_MULTI_PERFORM == $mrc);

        /*
         * Comment by flykobe:
         * 这时完成3次握手,若无进一步动作,则可能会连接丢失。
         * 推测同对端web server的timeout配置时间相同。
         * 或者是之前设置的timer超时?
         * */
        #sleep(10);

    while ($running && $mrc == CURLM_OK) {
        // wait for network
        if (curl_multi_select($mh, 0.5) > -1) {
            // pull in new data;
            do {
                /*
                 * Comment by flykobe:
                 * 这里的exec做的是http请求的发送、进行从connect到completed的状态转移。
                 * */
                $mrc = curl_multi_exec($mh, $running);
                                #echo "curl_multi_exec pos2: $running, $mrc\n";
            } while (CURLM_CALL_MULTI_PERFORM == $mrc);
        }
    }

    if ($mrc != CURLM_OK) {
        error_log("CURL Data Error");
    }

    /* get data */
    $nch = 0;
    foreach ($urlarr as $moudle=>$node) {
        if (($err = curl_error($ch[$nch])) == '') {
            /*
             * Comment by flykobe:
             * 即使这里不读取,实际上,响应已经发送到本地缓存了。通过tcpdump可以看到。
             * */
            $res[$nch]=curl_multi_getcontent($ch[$nch]);
            $result[$moudle]=$res[$nch];
        }
        else
        {
            error_log("curl error");
                        echo "curl error: $err\n";
        }
        curl_multi_remove_handle($mh,$ch[$nch]);
        curl_close($ch[$nch]);
        ++$nch;
    }
}
$url_arr=array(
         'http://item.taobao.com/item.htm?id=14392877692',
         'http://item.taobao.com/item.htm?id=16231676302',
         'http://item.taobao.com/item.htm?id=17037160462',
     );
function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}
$time_start = microtime_float();
$data=curl_multi_fetch($url_arr);
$time_end = microtime_float();
$time = $time_end - $time_start;
 echo "multi timecost:{$time}\n";

php的curl_multi_exec实际上封装的是libcurl库的curl_multi_preform,在libcurl的源码中,可以看到,该方法针对stack中的每个url handler,都会调用一次到多次multi_runsingle方法,进行以下状态间的转移:

CURLM_STATE_INIT, /* 0 – start in this state */
CURLM_STATE_CONNECT, /* 1 – resolve/connect has been sent off */
CURLM_STATE_WAITRESOLVE, /* 2 – awaiting the resolve to finalize */
CURLM_STATE_WAITCONNECT, /* 3 – awaiting the connect to finalize */
CURLM_STATE_WAITPROXYCONNECT, /* 4 – awaiting proxy CONNECT to finalize */
CURLM_STATE_PROTOCONNECT, /* 5 – completing the protocol-specific connect
phase */
CURLM_STATE_WAITDO, /* 6 – wait for our turn to send the request */
CURLM_STATE_DO, /* 7 – start send off the request (part 1) */
CURLM_STATE_DOING, /* 8 – sending off the request (part 1) */
CURLM_STATE_DO_MORE, /* 9 – send off the request (part 2) */
CURLM_STATE_DO_DONE, /* 10 – done sending off request */
CURLM_STATE_WAITPERFORM, /* 11 – wait for our turn to read the response */
CURLM_STATE_PERFORM, /* 12 – transfer data */
CURLM_STATE_TOOFAST, /* 13 – wait because limit-rate exceeded */
CURLM_STATE_DONE, /* 14 – post data transfer operation */
CURLM_STATE_COMPLETED, /* 15 – operation complete */
CURLM_STATE_MSGSENT, /* 16 – the operation complete message is sent */

这里就包含了连接建立、请求发送、响应获取(到curl的缓存中)、连接关闭等。

Leave a Reply