
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7809730/

Date: 2020-08-29 11:09:53  Source: igfitidea

What's a good web crawler to download HTML pages?

Tags: html, web-crawler

Asked by Cirem

I am looking for a web crawler/spider to download individual pages. What is a good (preferably free) product that supports this?


Answered by Unsigned

wget or curl come to mind. What exactly are your requirements? Do you need to recursively crawl pages, or just download specific URLs? wget can do both.

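As a quick sketch of the two cases mentioned above (example.com is a placeholder for your target site), downloading a single URL versus recursively crawling with wget looks like this:

```shell
# Download a single page into the current directory
wget https://example.com/page.html

# Recursively crawl a site: -r recurses, -l 2 limits the depth,
# -np ("no parent") stays below the starting directory,
# and -w 1 waits one second between requests to be polite
wget -r -l 2 -np -w 1 https://example.com/

# The curl equivalent for a single page, saved under its remote name
curl -O https://example.com/page.html
```

Note that curl has no built-in recursion; for crawling, wget is the tool of the two.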

Answered by satnhak

I'd go for WGET www.gnu.org/s/wget/


Answered by Mark

If you want to download a whole website, give wget a try. It can download recursively. If you need to manipulate headers and only download a few small files, try curl (or wget). Should you need features like parallel downloading of huge files, I would suggest aria2.

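For illustration (URLs and header values are placeholders), the header-manipulation and parallel-download cases from this answer look roughly like this:

```shell
# curl: set a custom User-Agent (-A) and an extra header (-H),
# follow redirects (-L), and save the result to a named file (-o)
curl -L \
     -A "MyCrawler/1.0" \
     -H "Accept-Language: en" \
     -o page.html \
     https://example.com/page.html

# aria2: fetch one large file over several parallel connections
# (-x caps connections per server, -s sets how many pieces to split into)
aria2c -x 4 -s 4 https://example.com/big-file.iso
```

aria2 shines for large single files because it opens multiple connections to the same server; for crawling many small pages, wget's recursion is usually the better fit.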