Original URL: http://stackoverflow.com/questions/2113924/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I use HTML Agility Pack to retrieve all the images from a website?
Question by Sergio Tapia
I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples.
I'm looking for a way to get all the images from a website: just their address strings, not the physical image files.
<img src="blabalbalbal.jpeg" />
I need to pull the `src` attribute of each `img` tag. I also just want to get a feel for the library and what it can offer; everyone said it was the best tool for the job.
Edit
```csharp
public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(source);
    //I can't use the Descendants method. It doesn't appear.
    var ImageURLS = document.desc
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
}
```
Accepted answer by SLaks
You can do this using LINQ, like this:
```csharp
using System.Linq;
using HtmlAgilityPack;

var document = new HtmlWeb().Load(url); // url: the address of the page to scan
var urls = document.DocumentNode.Descendants("img")
    .Select(e => e.GetAttributeValue("src", null))
    .Where(s => !String.IsNullOrEmpty(s));
```
EDIT: This code now actually works; I had forgotten to write `document.DocumentNode`.
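A self-contained variant of the code above, assuming the HTML has already been downloaded into a string (for example with `WebClient.DownloadString`, as in the question). Two details matter here: `HtmlDocument.Load(string)` treats its argument as a file path, so HTML text should go through `LoadHtml` instead; and `Descendants` is a method of `HtmlNode`, not of `HtmlDocument`, which is why it does not show up on the document and has to be called on `DocumentNode`. The class and method names (`ImageSources.FromHtml`) are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ImageSources
{
    // Hypothetical helper: parses an already-downloaded HTML string and
    // returns the non-empty src values of every <img>, in document order.
    public static List<string> FromHtml(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse a string; Load(string) would expect a file path

        // Descendants is defined on HtmlNode, so start from the root node.
        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null)) // null when src is missing
            .Where(s => !string.IsNullOrEmpty(s))          // drop missing and empty src
            .ToList();
    }
}
```

For example, `ImageSources.FromHtml("<img src=\"blabalbalbal.jpeg\" />")` returns a list holding the single string `blabalbalbal.jpeg`.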
Answer by Anthony
Based on the one example in their documentation, but with a modified XPath:
```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
List<string> image_links = new List<string>();
doc.Load("file.htm"); // Load(string) reads the HTML from a file path
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//img"))
{
    image_links.Add(link.GetAttributeValue("src", ""));
}
```
I'm not familiar with this library, so I'm not sure how you would write the array out somewhere else, but this will at least get you your data. (I'm also not sure I've declared the array correctly. Sorry.)
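On the array question: a `List<string>` is a reasonable choice here, since it grows as links are added, and `ToArray()` gives a `string[]` if an array is really needed. If "writing it out somewhere else" means saving to a text file, `File.WriteAllLines` writes one entry per line. A minimal sketch under that assumption; the class name `ImageLinkExport` and the file path are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ImageLinkExport
{
    // Hypothetical helper: copies the collected links into an array and writes
    // them to a text file, one URL per line (an existing file is overwritten).
    public static string[] Save(List<string> imageLinks, string path)
    {
        string[] links = imageLinks.ToArray(); // List<T>.ToArray, no LINQ needed
        File.WriteAllLines(path, links);
        return links;
    }
}
```

Usage would look like `ImageLinkExport.Save(image_links, "images.txt");`, where `images.txt` is just an example path.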
Edit
Using your example:
```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    List<string> image_links = new List<string>();
    document.LoadHtml(source); // source holds HTML text, so LoadHtml (Load expects a file path)
    foreach (HtmlNode link in document.DocumentNode.SelectNodes("//img"))
    {
        image_links.Add(link.GetAttributeValue("src", ""));
    }
}
```
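One caveat for both snippets in this answer: with default settings, HtmlAgilityPack's `SelectNodes` returns `null` rather than an empty collection when the XPath matches nothing, so a page without any `<img>` elements makes the `foreach` throw a `NullReferenceException`. The sketch below adds a null check and narrows the XPath to `//img[@src]`, so only images that actually have a `src` attribute are selected. The class and method names (`ImageLinks.FromHtml`) are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ImageLinks
{
    // Hypothetical helper: returns the src value of every <img> that has one.
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var imageLinks = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return imageLinks;
        }

        foreach (HtmlNode img in nodes)
        {
            imageLinks.Add(img.GetAttributeValue("src", ""));
        }
        return imageLinks;
    }
}
```

With this version, `ImageLinks.FromHtml("<p>no images here</p>")` returns an empty list instead of throwing.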