C#: How can I use HTML Agility Pack to retrieve all the images from a website?

Question

Asked by Sergio Tapia

I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples.

I'm looking for a way to collect all the images from a website: the address strings, not the image files themselves.

<img src="blabalbalbal.jpeg" />

I need to pull the source of each img tag. I just want to get a feel for the library and what it can offer. Everyone said this was the best tool for the job.

Edit

public void GetAllImages()
    {
        WebClient x = new WebClient();
        string source = x.DownloadString(@"http://www.google.com");

        HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
        document.Load(source);

                         //I can't use the Descendants method. It doesn't appear.
        var ImageURLS = document.desc
                   .Select(e => e.GetAttributeValue("src", null))
                   .Where(s => !String.IsNullOrEmpty(s));        
    }

Answer 1

Accepted answer by SLaks

You can do this using LINQ, like this:

var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
                                .Select(e => e.GetAttributeValue("src", null))
                                .Where(s => !String.IsNullOrEmpty(s));

EDIT: This code now actually works; I had forgotten to write document.DocumentNode.
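
For the case in the question, where the page source has already been downloaded into a string, a minimal sketch might look like the following. It assumes the HtmlAgilityPack package is referenced; the class name ImageSourceExtractor and the method name GetImageSources are hypothetical and chosen only for illustration. Note that LoadHtml parses a string of markup, whereas Load expects a file path or stream, and that Descendants lives on document.DocumentNode rather than on the document itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ImageSourceExtractor
{
    // Returns the non-empty src attribute value of every <img> element in the markup.
    public static List<string> GetImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse markup held in a string

        return document.DocumentNode
                       .Descendants("img")
                       .Select(e => e.GetAttributeValue("src", null))
                       .Where(s => !String.IsNullOrEmpty(s))
                       .ToList();
    }
}
```

Calling ImageSourceExtractor.GetImageSources(source) with the string returned by WebClient.DownloadString should then give the list of src values; img tags without a src attribute are skipped by the Where filter.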

Answer 2

Answered by Anthony

Based on their one example, but with modified XPath:

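 HtmlDocument doc = new HtmlDocument();
 List<string> image_links = new List<string>();
 doc.Load("file.htm");
 // The root is exposed as DocumentNode; SelectNodes returns null when nothing matches.
 HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
 if (images != null)
 {
    foreach (HtmlNode link in images)
    {
       image_links.Add(link.GetAttributeValue("src", ""));
    }
 }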

I don't know this library, so I'm not sure how to write the list out to somewhere else, but that will at least get you your data. (Also, I'm sure I haven't declared the list correctly. Sorry.)

Edit

Using your example:

public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    List<string> image_links = new List<string>();
    // LoadHtml parses markup held in a string; Load expects a file path or stream.
    document.LoadHtml(source);

    // SelectNodes returns null when no node matches the XPath.
    var images = document.DocumentNode.SelectNodes("//img");
    if (images != null)
    {
        foreach (HtmlNode link in images)
        {
            image_links.Add(link.GetAttributeValue("src", ""));
        }
    }
}
