Tutorial: Collecting a Website's Inner-Page URLs with 火车头 (LocoySpider), with Screenshots

王建敏

Published 2016-10-24 15:42


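Overview: Use the 火车头 collector to gather the URLs under a given section of a website, check which of those URLs have been indexed by search engines and which have not, and then work on the unindexed ones to get them indexed.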

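If a section's list pages follow no pattern, each page has to be handled one at a time. In most cases, though, section pagination does follow a pattern.
Read this first: collecting the URLs of a site's news section requires that the paginated list-page URLs follow a pattern, for example: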

http://www.hdfj11.com/news/1.html
http://www.hdfj11.com/news/2.html
http://www.hdfj11.com/news/3.html
http://www.hdfj11.com/news/4.html

... (arithmetic progression)
http://www.hdfj11.com/news/1.html
http://www.hdfj11.com/news/2.html
http://www.hdfj11.com/news/4.html
http://www.hdfj11.com/news/8.html

... (geometric progression)
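
The two pagination patterns above can be sketched in a few lines of Python. This is only an illustration: the function names and the `(*)` placeholder are assumptions made for this example, not something taken from 火车头 itself. The idea is that one page number in the URL is swapped out according to a first term, a count, and either a step or a ratio.

```python
def arithmetic_pages(template, first=1, count=10, step=1):
    """Fill the placeholder with first, first+step, first+2*step, ..."""
    return [template.replace("(*)", str(first + i * step)) for i in range(count)]


def geometric_pages(template, first=1, count=10, ratio=2):
    """Fill the placeholder with first, first*ratio, first*ratio**2, ..."""
    return [template.replace("(*)", str(first * ratio ** i)) for i in range(count)]


if __name__ == "__main__":
    # "(*)" is a hypothetical placeholder marking the page number.
    template = "http://www.hdfj11.com/news/(*).html"
    for url in arithmetic_pages(template, count=4):
        print(url)
    for url in geometric_pages(template, count=4):
        print(url)
```

With `count=4`, the first call reproduces the 1, 2, 3, 4 list and the second reproduces the 1, 2, 4, 8 list.
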
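In SEO for a company website, the news section is usually the most frequently updated part. New articles go up every day, and over time it becomes hard to tell which pages have been indexed and which have not, especially on sites that publish in bulk. The rest of this tutorial walks through collecting URLs with 火车头, using 华东风机 as the example site.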

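1. Download the 火车头 collector

Search for it and download it yourself.

2. Log in to 火车头

No registration is needed; just log in directly.

3. Click "New", then "New Group", and name the group "url采集" (URL collection)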

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/e0bf4cb5d7e4b401b2385fbf1eabe4f1.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/e0bf4cb5d7e4b401b2385fbf1eabe4f1.png" class="img-polaroid" title="" alt="" /></a>
</div>

4. Select the "url采集" group, click "New", then "New Task". A new window opens; name the task "华东风机标题url采集".

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/f4a4246279277e627187af586aa52d44.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/f4a4246279277e627187af586aa52d44.png" class="img-polaroid" title="" alt="" /></a>
</div>

Then click "Add", as shown:

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/f26a4b15642e38d4bd1e862c1eff741b.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/f26a4b15642e38d4bd1e862c1eff741b.png" class="img-polaroid" title="" alt="" /></a>
</div>

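5. A new window, "Add start URLs", appears; switch to its "Batch/Multi-page" tab.
Since we are collecting the news section, go to the "News" page of the 华东风机 site and paste the URL of its first list page into the field. Then select the "1" in that URL and replace it with the wildcard.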

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/e405f0e0a30e9f0773da15b812c6a173.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/e405f0e0a30e9f0773da15b812c6a173.png" class="img-polaroid" title="" alt="" /></a>
</div>

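6. Set "Number of items" to 10; here I collect article URLs from only 10 list pages. Click "Add", and once the entry has been added, click "Done".
Note: the example site's pagination follows an arithmetic progression, so the first option's rule is all that is needed.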

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/a9817292599253026025fd96e3664de1.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/a9817292599253026025fd96e3664de1.png" class="img-polaroid" title="" alt="" /></a>
</div>

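7. Clicking "Done" returns you to this screen. Click "Test URL collection" at the bottom, then, as shown below, select one entry and click "Test this page".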

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/16569aa6a0506a7ec506f88cde5b1776.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/16569aa6a0506a7ec506f88cde5b1776.png" class="img-polaroid" title="" alt="" /></a>
</div>

8. On this page, select "Source" and click "Delete" on the left, then do the same for "Time", "Author", and "Title", leaving only "Content". Then tick "Add as new record".

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/6593c4538b9190f7211514395472f68e.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/6593c4538b9190f7211514395472f68e.png" class="img-polaroid" title="" alt="" /></a>
</div>

9. Select "Content" and click "Edit"; the dialog below appears:

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/549677a5a13766d084d4120b9307b853.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/549677a5a13766d084d4120b9307b853.png" class="img-polaroid" title="" alt="" /></a>
</div>

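10. This is the key step. The extraction rule we use is "before/after truncation", which captures whatever sits between a start string and an end string. Go back to the website, open the section's list page, and press Ctrl+U to view the source. Find the article title URLs, select the code immediately before one of them, and copy it into the "Start string" box in 火车头.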

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/d6f6eed3a612cc0561725b66367c6bd5.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/d6f6eed3a612cc0561725b66367c6bd5.png" class="img-polaroid" title="" alt="" /></a>
</div>

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/2996a2a5a600615f0a4e5a49779aed32.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/2996a2a5a600615f0a4e5a49779aed32.png" class="img-polaroid" title="" alt="" /></a>
</div>

In the same way, copy the code immediately after the URL into the "End string" box.

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/7b3c506a047e1d86ab44d49239055185.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/7b3c506a047e1d86ab44d49239055185.png" class="img-polaroid" title="" alt="" /></a>
</div>

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/6745910059a10413709b914dc500d24b.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/6745910059a10413709b914dc500d24b.png" class="img-polaroid" title="" alt="" /></a>
</div>
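
The start-string/end-string rule can be illustrated outside the tool with a small Python sketch. This is a rough approximation of the idea, not how 火车头 implements it; `extract_between` and the sample markup are hypothetical, and the real list page's surrounding code will differ.

```python
def extract_between(html, start, end):
    """Return every substring found between `start` and `end`, in page order."""
    results = []
    pos = 0
    while True:
        begin = html.find(start, pos)
        if begin == -1:
            break  # no further start marker
        begin += len(start)
        stop = html.find(end, begin)
        if stop == -1:
            break  # start marker without a matching end marker
        results.append(html[begin:stop])
        pos = stop + len(end)
    return results


if __name__ == "__main__":
    # Hypothetical list-page markup, for illustration only.
    sample = (
        '<li><a href="/news/101.html" title="A">A</a></li>'
        '<li><a href="/news/102.html" title="B">B</a></li>'
    )
    print(extract_between(sample, '<li><a href="', '" title='))
```

The sketch shows why the choice of markers matters: the start string has to be specific enough to match only article links, and the end string has to be the first thing that follows every URL.
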

11. With the above done, the settings look like the screenshot below. Click "OK".

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/a54490bf0c257413c7d0ef3a48770747.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/a54490bf0c257413c7d0ef3a48770747.png" class="img-polaroid" title="" alt="" /></a>
</div>

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/f737b47c1287d115249651a0a413977a.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/f737b47c1287d115249651a0a413977a.png" class="img-polaroid" title="" alt="" /></a>
</div>

12. Click "Test", as shown below:

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/69cb9c1b3505e446320f13c7eb8ca822.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/69cb9c1b3505e446320f13c7eb8ca822.png" class="img-polaroid" title="" alt="" /></a>
</div>

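13. Switch to the 火车头 tab and open "Step 3: Publish content settings". Enable the second method, which saves the results locally as a spreadsheet. It takes only a few clicks, so the detailed steps are not listed here. Finally, click "Save".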

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/730f888c2d120aa70ffdb1a7d9cedf98.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/730f888c2d120aa70ffdb1a7d9cedf98.png" class="img-polaroid" title="" alt="" /></a>
</div>
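
For readers who script this step instead, saving a list of URLs as a spreadsheet-friendly file might look like the sketch below. It is an assumption-laden illustration rather than the tool's export format: `save_urls`, the single `url` column, and the output path are all hypothetical. The `utf-8-sig` encoding is one choice that helps spreadsheet programs detect UTF-8.

```python
import csv


def save_urls(urls, path):
    """Write one URL per row under a single 'url' header; return the row count."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])  # header row (hypothetical column name)
        for url in urls:
            writer.writerow([url])
    return len(urls)


if __name__ == "__main__":
    # "urls.csv" is a hypothetical output file name.
    count = save_urls(["http://www.hdfj11.com/news/1.html"], "urls.csv")
    print(count, "URL(s) saved")
```
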

14. You return to this screen, as shown below:

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/f737b47c1287d115249651a0a413977a.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/f737b47c1287d115249651a0a413977a.png" class="img-polaroid" title="" alt="" /></a>
</div>

15. Finally, select the task and click "Start" at the top, or right-click it and choose "Start task".

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/50dffade39f957af74f5f47d7f2a9c4b.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/50dffade39f957af74f5f47d7f2a9c4b.png" class="img-polaroid" title="" alt="" /></a>
</div>

16. The collected URLs look like this:

<div class="aw-comment-upload-img-list active">
<a href="http://ask.seowhy.com/uploads/article/20161024/62a14c5ebe6e44337cc71ba619f57bc6.png" target="_blank" data-fancybox-group="thumb" rel="lightbox"><img src="http://ask.seowhy.com/uploads/article/20161024/62a14c5ebe6e44337cc71ba619f57bc6.png" class="img-polaroid" title="" alt="" /></a>
</div>

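Summary: once these URLs are collected, use the 网销客 software or 奏鸣网 to check their indexing status in bulk, export the title URLs to a spreadsheet, and then deal with the unindexed ones. They can be submitted through a search engine's webmaster platform, or used when building external links.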
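
Once the indexing results are in a table, picking out the unindexed URLs is a simple filter. The sketch below assumes the results have already been loaded as a list of dictionaries with `url` and `indexed` keys; that shape is hypothetical, since the export format of the checking tools is not described here, and the sample rows are made up for illustration.

```python
def unindexed_urls(rows):
    """Return the URLs whose 'indexed' flag is false, preserving input order."""
    return [row["url"] for row in rows if not row["indexed"]]


if __name__ == "__main__":
    # Made-up rows for illustration; real status values come from the checking tool.
    rows = [
        {"url": "http://www.hdfj11.com/news/1.html", "indexed": True},
        {"url": "http://www.hdfj11.com/news/2.html", "indexed": False},
    ]
    for url in unindexed_urls(rows):
        print(url)
```
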
