PHP functions to detect search engine spiders

Category: Server | Author: 林涛

PHP functions that decide whether the current request comes from a "spider" (a search engine crawler):

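Method 1: if the browscap directive is set in php.ini, ask get_browser() whether the client is a crawler; otherwise match the User-Agent header against a list of regular expressions: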

function isCrawler() {
    if (ini_get('browscap')) {
        // get_browser() returns false when no information can be retrieved
        $browser = get_browser(null, true);
        if ($browser !== false && !empty($browser['crawler'])) {
            return true;
        }
    } elseif (isset($_SERVER['HTTP_USER_AGENT'])) {
        $agent = $_SERVER['HTTP_USER_AGENT'];
        // Patterns without the /i flag are case-sensitive
        $crawlers = array(
            "/Googlebot/",
            "/Yahoo! Slurp/",
            "/msnbot/",
            "/Mediapartners-Google/",
            "/Scooter/",
            "/Yahoo-MMCrawler/",
            "/FAST-WebCrawler/",
            "/FAST Enterprise Crawler/",
            "/grub-client-/",
            "/MSIECrawler/",
            "/NPBot/",
            "/NameProtect/i",
            "/ZyBorg/i",
            "/worio bot heritrix/i",
            "/Ask Jeeves/",
            "/libwww-perl/i",
            "/Gigabot/i",
            "/bot@bot\.bot/i",
            "/SeznamBot/i",
        );
        // Return as soon as one pattern matches the User-Agent
        foreach ($crawlers as $c) {
            if (preg_match($c, $agent)) {
                return true;
            }
        }
    }
    return false;
}
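The regex branch of Method 1 can also be written as a pure function that takes the User-Agent as a parameter, which makes it easy to test without a web server. A minimal sketch, assuming PHP 7+; the name isCrawlerAgent is hypothetical, and unlike the original list it treats every token as case-insensitive and escapes the tokens with preg_quote() before joining them into one alternation.

```php
<?php
// Sketch (PHP 7+): Method 1's User-Agent tokens merged into a single
// case-insensitive regex. isCrawlerAgent is a hypothetical helper name.
function isCrawlerAgent(string $agent): bool
{
    // Plain-text tokens taken from the Method 1 pattern list
    $tokens = array(
        'Googlebot', 'Yahoo! Slurp', 'msnbot', 'Mediapartners-Google',
        'Scooter', 'Yahoo-MMCrawler', 'FAST-WebCrawler',
        'FAST Enterprise Crawler', 'grub-client-', 'MSIECrawler', 'NPBot',
        'NameProtect', 'ZyBorg', 'worio bot heritrix', 'Ask Jeeves',
        'libwww-perl', 'Gigabot', 'bot@bot.bot', 'SeznamBot',
    );
    // preg_quote() escapes metacharacters such as "." and "!" plus the "/" delimiter
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $tokens);
    // One alternation and one preg_match() call instead of a loop over patterns
    $pattern = '/' . implode('|', $quoted) . '/i';
    return preg_match($pattern, $agent) === 1;
}

var_dump(isCrawlerAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // bool(true)
var_dump(isCrawlerAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0')); // bool(false)
```

Because the User-Agent is a parameter, a request handler can still call it as isCrawlerAgent($_SERVER['HTTP_USER_AGENT'] ?? '').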

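Method 2: lower-case the User-Agent header and search it for known spider names. (Use one method or the other: both define a function named isCrawler(), and PHP does not allow declaring it twice.)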

function isCrawler() {
    // The header may be missing (e.g. when running from the CLI), so default to ''
    $agent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    if ($agent === '') {
        return false;
    }
    // Substrings searched for in the User-Agent (compared in lower case)
    $spiderSite = array(
        "TencentTraveler",
        "Baiduspider",
        "BaiduGame",
        "Googlebot",
        "msnbot",
        "Sosospider",
        "Sogou web spider",
        "Sogou inst spider",
        "Sogou News Spider",
        "ia_archiver",           // Alexa / the Web Archive
        "Yahoo! Slurp",
        "YoudaoBot",
        "Yahoo Slurp",
        "Java",                  // Java's HTTP client, often used by spam bots
        "Voila",
        "YandexBot",
        "BSpider",
        "twiceler",
        "Speedy Spider",
        "Mediapartners-Google",  // Google AdSense
        "Heritrix",
        "Python-urllib",
        "Ask",
        "Exabot",
        "Custo",
        "OutfoxBot",
        "YodaoBot",
        "yacy",
        "SurveyBot",
        "legs",
        "lwp-trivial",
        "Nutch",
        "StackRambler",
        "libwww-perl",           // Perl tools
        "MJ12bot",
        "Netcraft",
        "MSIECrawler",
        "Wget",
        "larbin",
        "Fish search",
    );
    foreach ($spiderSite as $val) {
        if (strpos($agent, strtolower($val)) !== false) {
            return true;
        }
    }
    // No known spider name found
    return false;
}
// Usage:
// if (isCrawler()) {
//     echo "It is a spider!";
// } else {
//     echo "It is not a spider!";
// }
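Method 2 reads $_SERVER directly, which makes it awkward to exercise outside a real web request. A minimal sketch, assuming PHP 7+, that passes the User-Agent and the token list in as arguments and uses stripos() for the case-insensitive search instead of lower-casing both sides; isSpiderAgent and the short token list are illustrative only.

```php
<?php
// Sketch (PHP 7+): Method 2's substring check with stripos().
// isSpiderAgent and the token list below are hypothetical examples.
function isSpiderAgent(string $agent, array $tokens): bool
{
    if ($agent === '') {
        return false; // no User-Agent header: treat as not a spider
    }
    foreach ($tokens as $token) {
        // stripos() is case-insensitive, so no strtolower() is needed
        if (stripos($agent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// In a web request the header comes from $_SERVER; under the CLI it is
// usually absent, so fall back to an empty string.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$tokens = array('Baiduspider', 'Googlebot', 'Sogou web spider', 'YoudaoBot');

var_dump(isSpiderAgent('Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)', $tokens)); // bool(true)
var_dump(isSpiderAgent($agent, $tokens));
```

Keeping the token list as an argument also lets you load it from a config file instead of hard-coding it in the function.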

Addendum:

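Below are the more common spider identifiers, i.e. the tokens these spiders put in their User-Agent header. If anything here is wrong or missing, leave a comment and I will add it. Thanks.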

Baidu spider: Baiduspider

Baidu Images: Baiduspider-image

Baidu WAP (mobile): Baiduspider-mobile

Baidu Video: Baiduspider-video

Baidu News: Baiduspider-news

Google spider: Googlebot

360 spider: 360Spider

Soso spider: Sosospider

Yahoo spider: Yahoo

Youdao spider: YoudaoBot, YodaoBot

Sogou spiders: Sogou News Spider, Sogou web spider, Sogou inst spider, Sogou blog, Sogou Orion spider

Bing spider: bingbot

MSN spider: msnbot, msnbot-media

Yisou spider: YisouSpider

Alexa spider: ia_archiver

Easou spider: EasouSpider

Jike spider: JikeSpider

Etao spider: EtaoSpider
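The identifiers above can also be used to report which engine a spider belongs to, rather than a plain yes/no. A minimal sketch, assuming PHP 7.1+ (for the ?string return type); detectSpider is a hypothetical name, and because the first matching token wins, the specific Baidu variants are checked before the generic Baiduspider and the broad "Yahoo" token is checked last.

```php
<?php
// Sketch (PHP 7.1+): map User-Agent tokens from the list above to engine names.
// detectSpider is a hypothetical helper; order matters because the first match wins.
function detectSpider(string $agent): ?string
{
    $spiders = array(
        'Baiduspider-image'  => 'Baidu Images',
        'Baiduspider-mobile' => 'Baidu WAP',
        'Baiduspider-video'  => 'Baidu Video',
        'Baiduspider-news'   => 'Baidu News',
        'Baiduspider'        => 'Baidu',
        'Googlebot'          => 'Google',
        '360Spider'          => '360',
        'Sosospider'         => 'Soso',
        'YoudaoBot'          => 'Youdao',
        'YodaoBot'           => 'Youdao',
        'Sogou News Spider'  => 'Sogou',
        'Sogou web spider'   => 'Sogou',
        'Sogou inst spider'  => 'Sogou',
        'Sogou blog'         => 'Sogou',
        'Sogou Orion spider' => 'Sogou',
        'bingbot'            => 'Bing',
        'msnbot'             => 'MSN', // also covers msnbot-media
        'YisouSpider'        => 'Yisou',
        'ia_archiver'        => 'Alexa',
        'EasouSpider'        => 'Easou',
        'JikeSpider'         => 'Jike',
        'EtaoSpider'         => 'Etao',
        'Yahoo'              => 'Yahoo', // broad token, so it is checked last
    );
    foreach ($spiders as $token => $name) {
        if (stripos($agent, $token) !== false) {
            return $name;
        }
    }
    return null; // not one of the listed spiders
}

var_dump(detectSpider('Baiduspider-image+(+http://www.baidu.com/search/spider.htm)')); // string(12) "Baidu Images"
var_dump(detectSpider('Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)')); // string(4) "Bing"
```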


If you repost this article, please credit: reposted from 26点的博客

