The pages really exist, but Baiduspider requests the wrong URL for a few of them. What could cause this? Urgent, waiting online~~~
This is what a normal crawl looks like; the path after GET does not include the domain:
123.125.71.110 - - [23/Aug/2019:06:52:22 +0800] "GET /1149.html HTTP/1.1" 200 64256 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
A few pages can be accessed normally, but the spider's request path includes the domain, which results in a 404:
220.181.108.106 - - [23/Aug/2019:05:14:02 +0800] "GET /www.whlihun.com/3724.html HTTP/1.1" 404 479 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
There is also one page where two spiders got completely different results:
220.181.108.157 - - [23/Aug/2019:08:17:16 +0800] "GET /3075.html HTTP/1.1" 200 64687 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
220.181.108.159 - - [23/Aug/2019:03:48:59 +0800] "GET /www.whlihun.com/3075.html HTTP/1.1" 404 479 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
So is this a problem with the spider, or a problem on my end?
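For anyone who wants to check how many requests are affected, here is a minimal Python sketch that scans access-log lines in the format shown above and lists the requests whose path starts with the domain. The function name `find_domain_prefixed_requests` and the regular expression are only an illustration of mine, not anything provided by Baidu or the web server, and the sample just reuses two of the log lines quoted above.

```python
import re

# Pulls the request path and the status code out of an access-log line
# in the format quoted in this post (an assumption about the log layout).
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+" (\d{3})')


def find_domain_prefixed_requests(lines, domain="www.whlihun.com"):
    """Return (path, status) pairs for requests whose path starts with /<domain>/."""
    prefix = "/" + domain + "/"
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1).startswith(prefix):
            hits.append((match.group(1), int(match.group(2))))
    return hits


# Two lines copied from the log excerpts in this post.
sample = [
    '123.125.71.110 - - [23/Aug/2019:06:52:22 +0800] "GET /1149.html HTTP/1.1" 200 64256 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '220.181.108.106 - - [23/Aug/2019:05:14:02 +0800] "GET /www.whlihun.com/3724.html HTTP/1.1" 404 479 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
]

for path, status in find_domain_prefixed_requests(sample):
    print(status, path)
```

With these two sample lines, only the second request is reported. To run it on a real log, pass an open file object instead of `sample`; the log path depends on the server setup and is not given here.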