What's a good web crawler to download HTML pages?
Source: http://stackoverflow.com/questions/7809730/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Cirem
I am looking for a web crawler/spider to download individual pages. What is a good (preferably free) product that supports this?
Answered by Unsigned
Answered by satnhak
I'd go for Wget: www.gnu.org/s/wget/
Answered by Mark
If you want to download a whole website, give wget a try. It can download recursively. If you need to manipulate headers and only download a few small files, try curl (or wget). If you need features such as parallel downloading of huge files, I would suggest aria2.
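For the single-page case in the question, a dedicated crawler may not even be necessary. Here is a minimal sketch using only Python's standard library. The function name `download_page`, the `User-Agent` string, and the example URL are made up for illustration. It does no recursion, retries, or robots.txt handling, so it is not a substitute for wget.

```python
import shutil
import urllib.request


def download_page(url: str, dest_path: str, timeout: float = 30.0) -> int:
    """Fetch one URL and save the raw response body to dest_path.

    Returns the number of bytes written. This is a sketch for a single
    page only; it does not follow links or retry on failure.
    """
    # The User-Agent value is an arbitrary placeholder, not a required name.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-fetcher/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        # Stream the body to disk instead of holding it all in memory.
        shutil.copyfileobj(response, out)
        return out.tell()


if __name__ == "__main__":
    # Hypothetical example URL and output file name.
    size = download_page("http://example.com/", "example.html")
    print(f"saved {size} bytes")
```

Anything beyond one page at a time (following links, mirroring, resuming large downloads) is where the tools named above are probably the better choice.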
Answered by Kiril
A list of open source crawlers: http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers