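1. Recommended method: PHP code that determines whether a visit comes from a search engine spider/crawler or from a human, taken from Discuz X3.2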
<?php
// Return true if the User-Agent looks like a search engine spider.
function checkrobot($useragent = '') {
    static $kw_spiders = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');

    // Fall back to the request header; it may be missing, so default to ''.
    if (empty($useragent)) {
        $useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    }
    $useragent = strtolower($useragent);
    // A browser keyword without a URL in the string is treated as a human visitor.
    if (strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers)) return false;
    if (dstrpos($useragent, $kw_spiders)) return true;
    return false;
}

// Return true (or the matched keyword when $returnvalue is true)
// if $string contains any element of $arr.
function dstrpos($string, $arr, $returnvalue = false) {
    if (empty($string)) return false;
    foreach ((array)$arr as $v) {
        if (strpos($string, $v) !== false) {
            return $returnvalue ? $v : true;
        }
    }
    return false;
}

if (checkrobot()) {
    echo 'Robot crawler';
} else {
    echo 'Human';
}
?>
In practice you can use it like this, running the action only when the visitor is not a search engine:
<?php
if (!checkrobot()) {
    // do something
}
?>
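To see how the two keyword lists interact, here is a minimal sketch that applies the same rules to a few sample User-Agent strings. The helper names `contains_any` and `classify_agent` are hypothetical (they are not part of Discuz), the sample strings are illustrative, and the type hints assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper: true if $haystack contains any of $needles.
function contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: same decision order as checkrobot(), but it returns
// a label so the "neither browser nor spider" case is visible.
function classify_agent(string $useragent): string
{
    $spiders  = ['bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla'];
    $browsers = ['msie', 'netscape', 'opera', 'konqueror', 'mozilla'];
    $ua = strtolower($useragent);

    // Browser keyword and no URL in the string: treated as a human visitor.
    if (strpos($ua, 'http://') === false && contains_any($ua, $browsers)) {
        return 'browser';
    }
    return contains_any($ua, $spiders) ? 'spider' : 'unknown';
}

$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'curl/8.4.0',
];
foreach ($samples as $sample) {
    echo classify_agent($sample), "\n"; // spider, browser, unknown
}
```

Note that checkrobot() returns false for the "unknown" case as well, so command-line clients that announce no spider keyword are treated the same as human visitors.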
2. Second method:
Use PHP to log spider visits for statistics
// The User-Agent header may be missing, so default to an empty string.
$useragent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
// Needles must be lowercase because $useragent has been lowercased,
// and more specific needles must come before broader ones.
if (strpos($useragent, 'googlebot') !== false) { $bot = 'Google'; }
elseif (strpos($useragent, 'mediapartners-google') !== false) { $bot = 'Google Adsense'; }
elseif (strpos($useragent, 'baiduspider') !== false) { $bot = 'Baidu'; }
elseif (strpos($useragent, 'sogou spider') !== false) { $bot = 'Sogou'; }
elseif (strpos($useragent, 'sogou web') !== false) { $bot = 'Sogou web'; }
elseif (strpos($useragent, 'sosospider') !== false) { $bot = 'SOSO'; }
elseif (strpos($useragent, '360spider') !== false) { $bot = '360Spider'; }
elseif (strpos($useragent, 'yahoo') !== false) { $bot = 'Yahoo'; }
// 'msnbot' is checked before 'msn'; in the reverse order it could never match.
elseif (strpos($useragent, 'msnbot') !== false) { $bot = 'msnbot'; }
elseif (strpos($useragent, 'msn') !== false) { $bot = 'MSN'; }
elseif (strpos($useragent, 'sohu') !== false) { $bot = 'Sohu'; }
// Was 'yodaoBot', which can never match a lowercased string.
elseif (strpos($useragent, 'yodaobot') !== false) { $bot = 'Yodao'; }
elseif (strpos($useragent, 'twiceler') !== false) { $bot = 'Twiceler'; }
elseif (strpos($useragent, 'ia_archiver') !== false) { $bot = 'Alexa_'; }
elseif (strpos($useragent, 'iaarchiver') !== false) { $bot = 'Alexa'; }
elseif (strpos($useragent, 'slurp') !== false) { $bot = 'Yahoo'; }
elseif (strpos($useragent, 'bot') !== false) { $bot = 'Other spider'; }

if (isset($bot)) {
    // Append one tab-separated line per visit: time, IP, spider name, URL.
    $fp = @fopen('bot.txt', 'a');
    if ($fp) { // fopen() returns false if the file cannot be opened
        fwrite($fp, date('Y-m-d H:i:s') . "\t" . $_SERVER['REMOTE_ADDR'] . "\t" . $bot . "\t" . 'http://' . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\r\n");
        fclose($fp);
    }
}
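The long elseif chain can also be written as a lookup table, which makes the ordering requirement easier to see. The following is only a sketch under a few assumptions: the constant `BOT_TOKENS` and the functions `detect_bot` and `log_bot_visit` are hypothetical names, the table holds just a handful of the tokens above, and the nullable return type needs PHP 7.1 or later. It also swaps fopen/fwrite for `file_put_contents` with `LOCK_EX`, so that concurrent requests do not interleave their lines.

```php
<?php
// Hypothetical lookup table: lowercase User-Agent token => label.
// PHP arrays keep insertion order, so specific tokens go before broad ones.
const BOT_TOKENS = [
    'googlebot'   => 'Google',
    'baiduspider' => 'Baidu',
    'msnbot'      => 'msnbot',
    'msn'         => 'MSN',
    'bot'         => 'Other spider',
];

// Return the label of the first matching token, or null if none matches.
function detect_bot(string $useragent): ?string
{
    $ua = strtolower($useragent);
    foreach (BOT_TOKENS as $token => $label) {
        if (strpos($ua, $token) !== false) {
            return $label;
        }
    }
    return null;
}

// Append one tab-separated line (time, IP, spider name, URL) to $file.
// LOCK_EX takes an exclusive lock for the duration of the write.
function log_bot_visit(string $file, string $bot, string $ip, string $url): bool
{
    $line = date('Y-m-d H:i:s') . "\t" . $ip . "\t" . $bot . "\t" . $url . "\r\n";
    return file_put_contents($file, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Usage inside a web request; nothing is written when no token matches.
$bot = detect_bot($_SERVER['HTTP_USER_AGENT'] ?? '');
if ($bot !== null) {
    $url = 'http://' . ($_SERVER['SERVER_NAME'] ?? '') . ($_SERVER['REQUEST_URI'] ?? '');
    log_bot_visit(__DIR__ . '/bot.txt', $bot, $_SERVER['REMOTE_ADDR'] ?? '', $url);
}
```

Adding a new spider then means adding one table row, placed above any broader token that would otherwise match first.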
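3. Third method:
We can use HTTP_USER_AGENT to determine whether the visitor is a spider. Each search engine's spider carries its own distinctive identifier; some of them are listed below.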
function is_crawler() {
    // The header may be missing, so default to an empty string.
    $userAgent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    $spiders = array(
        'Googlebot',    // Google crawler
        'Baiduspider',  // Baidu crawler
        'Yahoo! Slurp', // Yahoo crawler
        'YodaoBot',     // Youdao crawler
        'msnbot'        // Bing crawler
        // more crawler keywords
    );
    foreach ($spiders as $spider) {
        // Compare in lowercase on both sides.
        $spider = strtolower($spider);
        if (strpos($userAgent, $spider) !== false) {
            return true;
        }
    }
    return false;
}
The following PHP code includes more spider identifiers:
function isCrawler() {
    // The original echoed the User-Agent here, which leaked debug output into the page.
    $agent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    if (!empty($agent)) {
        // Note: some entries (e.g. "Java (Often spam bot)", "Perl tool") read like
        // descriptive labels and are unlikely to appear verbatim in a User-Agent.
        $spiderSite = array(
            "TencentTraveler", "Baiduspider+", "BaiduGame", "Googlebot", "msnbot",
            "Sosospider+", "Sogou web spider", "ia_archiver", "Yahoo! Slurp", "YoudaoBot",
            "Yahoo Slurp", "MSNBot", "Java (Often spam bot)", "BaiDuSpider", "Voila",
            "Yandex bot", "BSpider", "twiceler", "Sogou Spider", "Speedy Spider",
            "Google AdSense", "Heritrix", "Python-urllib", "Alexa (IA Archiver)", "Ask",
            "Exabot", "Custo", "OutfoxBot/YodaoBot", "yacy", "SurveyBot",
            "legs", "lwp-trivial", "Nutch", "StackRambler", "The web archive (IA Archiver)",
            "Perl tool", "MJ12bot", "Netcraft", "MSIECrawler", "WGet tools",
            "larbin", "Fish search",
        );
        foreach ($spiderSite as $val) {
            $str = strtolower($val);
            if (strpos($agent, $str) !== false) {
                return true;
            }
        }
    }
    // Reached for an empty User-Agent and when nothing matched
    // (the original returned null in the second case).
    return false;
}

if (isCrawler()) {
    echo "Hello, spider!";
} else {
    echo "You are not a spider!";
}
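All of the checks above trust the User-Agent header, which is sent by the client and can be set to any value. When it matters that a visitor really is a given search engine, one common complement is a reverse DNS lookup followed by a forward lookup on the resulting host name. The sketch below is an assumption-laden illustration, not part of the original methods: `host_has_suffix` and `is_verified_crawler_ip` are hypothetical names, the suffix list must be taken from each search engine's own documentation, and `gethostbynamel()` only returns IPv4 addresses.

```php
<?php
// Hypothetical helper: true if $host ends with one of the given suffixes.
// Suffixes are expected with a leading dot, e.g. '.googlebot.com'.
function host_has_suffix(string $host, array $suffixes): bool
{
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix) {
        $suffix = strtolower($suffix);
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: reverse lookup, suffix check, then forward lookup.
function is_verified_crawler_ip(string $ip, array $suffixes): bool
{
    // gethostbyaddr() returns the unmodified IP on failure, false on malformed input.
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip) {
        return false;
    }
    if (!host_has_suffix($host, $suffixes)) {
        return false;
    }
    // The forward lookup must lead back to the same address (IPv4 only here).
    $addresses = gethostbynamel($host);
    return $addresses !== false && in_array($ip, $addresses, true);
}

// Example call (performs real DNS queries, so it is left commented out):
// $ok = is_verified_crawler_ip($_SERVER['REMOTE_ADDR'], ['.googlebot.com', '.google.com']);
```

DNS lookups are slow compared with a string match, so a check like this is usually run only after the User-Agent already claims to be a crawler, and the result is cached per IP.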
4. Fourth method:
<?php
$flag = false;
// The header may be missing, so default to an empty string.
$tmp = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (strpos($tmp, 'Googlebot') !== false) {
    $flag = true;
} else if (strpos($tmp, 'Baiduspider') !== false) { // was "> 0", which misses a match at position 0
    $flag = true;
} else if (strpos($tmp, 'Yahoo! Slurp') !== false) {
    $flag = true;
} else if (strpos($tmp, 'msnbot') !== false) {
    $flag = true;
} else if (strpos($tmp, 'Sosospider') !== false) {
    $flag = true;
} else if (strpos($tmp, 'YodaoBot') !== false || strpos($tmp, 'OutfoxBot') !== false) {
    $flag = true;
} else if (strpos($tmp, 'Sogou web spider') !== false || strpos($tmp, 'Sogou Orion spider') !== false) {
    $flag = true;
} else if (strpos($tmp, 'fast-webcrawler') !== false) {
    $flag = true;
} else if (strpos($tmp, 'Gaisbot') !== false) {
    $flag = true;
} else if (strpos($tmp, 'ia_archiver') !== false) {
    $flag = true;
} else if (strpos($tmp, 'altavista') !== false) {
    $flag = true;
} else if (strpos($tmp, 'lycos_spider') !== false) {
    $flag = true;
} else if (strpos($tmp, 'Inktomi slurp') !== false) {
    $flag = true;
}
if ($flag == false) {
    // Not a spider: redirect to the corresponding page on http://www.phpstudy.net.
    // $_SERVER['REQUEST_URI'] is the path that follows the domain name.
    // Alternatively: header("Location: http://www.phpstudy.net/abc/d.php");
    header("Location: http://www.phpstudy.net" . $_SERVER['REQUEST_URI']);
    exit();
}
?>
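A `Location` header sent without an explicit status code produces a 302 (temporary) redirect. If a different status is wanted, it can be passed as the third argument of `header()`. The sketch below assumes two hypothetical helpers, `build_redirect_target` and `redirect_to`, and uses PHP 7.1+ type hints; it is one possible way to wrap the redirect, not the method used in the code above.

```php
<?php
// Hypothetical helper: join a base URL and a request path into one target.
function build_redirect_target(string $base, string $requestUri): string
{
    // Drop CR/LF so the value always stays on a single header line.
    $path = str_replace(["\r", "\n"], '', $requestUri);
    // Normalize the slash between base and path to exactly one.
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Hypothetical helper: send the redirect with an explicit status code and stop.
function redirect_to(string $base, string $requestUri, int $status = 302): void
{
    // Second argument: replace a previous header of the same name.
    // Third argument: force the HTTP response code.
    header('Location: ' . build_redirect_target($base, $requestUri), true, $status);
    exit();
}

// Example call inside a web request (commented out because it ends the script):
// redirect_to('http://www.phpstudy.net', $_SERVER['REQUEST_URI'], 301);
```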
Article URL: https://www.stayed.cn/item/11548
Please credit the source when reposting.