
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7809730/

Date: 2020-08-29 11:09:53  Source: igfitidea

What's a good web crawler to download HTML pages?

Tags: html, web-crawler

Asked by Cirem

I am looking for a web crawler/spider to download individual pages. What is a good (preferably free) product that supports this?


Answered by Unsigned

wget or curl come to mind. What exactly are your requirements? Do you need to recursively crawl pages, or just download specific URLs? wget can do both.

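As a quick sketch of the two cases mentioned above (example.com is a placeholder for your target site), downloading a single URL versus recursively crawling with wget looks like this:

```shell
# Download a single page into the current directory
wget https://example.com/page.html

# Recursively crawl a site: -r recurses, -l 2 limits the depth,
# -np ("no parent") stays below the starting directory,
# and -w 1 waits one second between requests to be polite
wget -r -l 2 -np -w 1 https://example.com/

# The curl equivalent for a single page, saved under its remote name
curl -O https://example.com/page.html
```

Note that curl has no built-in recursion; for crawling, wget is the tool of the two.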

Answered by satnhak

I'd go for WGET www.gnu.org/s/wget/


Answered by Mark

If you want to download a whole website, give wget a try. It can download recursively. If you need to manipulate headers and only download a few small files, try curl (or wget). Should you need features like parallel downloading of huge files, I would suggest aria2.

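For illustration (URLs and header values are placeholders), the header-manipulation and parallel-download cases from this answer look roughly like this:

```shell
# curl: set a custom User-Agent (-A) and an extra header (-H),
# follow redirects (-L), and save the result to a named file (-o)
curl -L \
     -A "MyCrawler/1.0" \
     -H "Accept-Language: en" \
     -o page.html \
     https://example.com/page.html

# aria2: fetch one large file over several parallel connections
# (-x caps connections per server, -s sets how many pieces to split into)
aria2c -x 4 -s 4 https://example.com/big-file.iso
```

aria2 shines for large single files because it opens multiple connections to the same server; for crawling many small pages, wget's recursion is usually the better fit.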