Access the Contents of a Web Page with C#

Notice: the content of this page comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/1125739/

Time: 2020-08-06 08:48:22  Source: igfitidea

c# .net dom

Asked by Saobi

I am trying to use C# to access the content of a webpage. For example, I want to grab the body text of the Google homepage.

I know this is doable in C# with its web browser control, but I couldn't find a good, simple example of doing it. All the resources I found online involve creating Forms and a GUI, which I don't need. I just need a good old Console Application.

If anyone can provide a simple console-based code snippet that accomplishes the above, it'll be greatly appreciated.

Accepted answer by Darin Dimitrov

Actually, the WebBrowser is a GUI control intended for displaying a web page (it embeds and manages Internet Explorer in your Windows application). If you just need to get the contents of a web page, you could use the WebClient class:

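using System;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        // WebClient is disposable, so release it when the download is done.
        using (var client = new WebClient())
        {
            // Download the whole page as a single string of HTML.
            var contents = client.DownloadString("http://www.google.com");
            Console.WriteLine(contents);
        }
    }
}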

Answer by Matthew Groves

If you just want the content and not an actual browser, you can use an HttpWebRequest.

Here's a code sample: http://www.c-sharpcorner.com/Forums/ShowMessages.aspx?ThreadID=58261

Answer by Zr40

The HTML Agility Pack might be what you need. It provides access to HTML pages via DOM and XPath.
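
As a rough illustration of the DOM and XPath access mentioned here, the sketch below parses an HTML string and pulls out the text of its body. It assumes the third-party HtmlAgilityPack NuGet package is installed; the class name BodyTextExample, the helper ExtractBodyText and the sample markup are made up for this example and are not part of the library.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class BodyTextExample
{
    // Hypothetical helper: parses an HTML string and returns the text inside <body>,
    // or an empty string if the document has no <body> element.
    public static string ExtractBodyText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode takes an XPath expression and returns null when nothing matches.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        return body == null ? string.Empty : body.InnerText.Trim();
    }

    static void Main()
    {
        // Made-up sample markup; in practice this would be the downloaded page.
        string html = "<html><head><title>Demo</title></head>"
                    + "<body><h1>Hello</h1><p>World</p></body></html>";
        Console.WriteLine(ExtractBodyText(html));
    }
}
```

In a real program the HTML string would come from one of the download approaches in the other answers. Note that InnerText also includes the contents of any script or style elements inside the body, so further filtering may be needed.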

Answer by AndersK

You can do something like this:

// Requires: using System.Net;
Uri u = new Uri(@"http://launcher.worldofwarcraft.com/alert");
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(u);
// Dispose the response and the reader when done so the connection is released.
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (System.IO.StreamReader sr = new System.IO.StreamReader(res.GetResponseStream()))
{
    // Read the entire response body into a string and print it.
    string body = sr.ReadToEnd();
    System.Console.WriteLine("{0}", body);
}

The above code shows the maintenance message for WoW USA (if any message has been posted).

Answer by Joe Kuemerle

You can also use the WatiN library to load and manipulate web pages easily. It was designed as a testing library for web UIs. To use it, get the latest version from the official site, http://watin.sourceforge.net/. In C#, the following code in a console application will give you the text of the Google home page (it is modified from the getting started example on the WatiN site). The library also contains many more useful methods for getting and setting various parts of the page, taking actions and checking for results.

using System;
using WatiN.Core;

namespace Test
{
  class WatiNConsoleExample
  {
    [STAThread]
    static void Main(string[] args)
    {
      // Open a new Internet Explorer window and
      // go to the Google website.
      IE ie = new IE("http://www.google.com");

      // Write out the text of the body.
      Console.WriteLine(ie.Text);

      // Close Internet Explorer, then wait for a key press
      // so the console window does not close immediately.
      ie.Close();

      Console.ReadKey();
    }
  }
}

Answer by nickytonline

Google "screen scraping" and, as mentioned above, use HttpWebRequest. Whatever it is you're doing, I'd recommend using Fiddler to help you figure out what's really going on.

Answer by Devon Holcombe

It's been a decade, and Microsoft no longer recommends WebClient, which the original accepted answer uses, for new development. The current recommendation is to use HttpClient, which is in the System.Net.Http namespace.

The current example from https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=netcore-3.1 is:

// HttpClient is intended to be instantiated once per application, rather than per-use. See Remarks.
static readonly HttpClient client = new HttpClient();

static async Task Main()
{
  // Call asynchronous network methods in a try/catch block to handle exceptions.
  try   
  {
     HttpResponseMessage response = await client.GetAsync("http://www.contoso.com/");
     response.EnsureSuccessStatusCode();
     string responseBody = await response.Content.ReadAsStringAsync();
     // Above three lines can be replaced with new helper method below
     // string responseBody = await client.GetStringAsync(uri);

     Console.WriteLine(responseBody);
  }
  catch(HttpRequestException e)
  {
     Console.WriteLine("\nException Caught!");  
     Console.WriteLine("Message :{0} ",e.Message);
  }
}
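
For a console application like the one in the question, the GetStringAsync helper mentioned in the comment above may be all that is needed. The following is only a minimal sketch under a few assumptions: the PageFetcher class, the TryGetPageAsync helper and the 10-second timeout are made up for illustration and do not come from the Microsoft documentation, and an async Main requires C# 7.1 or later.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class PageFetcher
{
    // One shared instance for the whole application rather than one per request.
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(10) // assumed value, tune as needed
    };

    // Hypothetical helper: returns the page body as a string,
    // or null if the request fails or times out.
    public static async Task<string> TryGetPageAsync(string url)
    {
        try
        {
            // GetStringAsync throws HttpRequestException on a non-success status code.
            return await Client.GetStringAsync(url);
        }
        catch (HttpRequestException e)
        {
            Console.Error.WriteLine("Request failed: {0}", e.Message);
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports a timeout as a cancelled task.
            Console.Error.WriteLine("Request timed out.");
            return null;
        }
    }

    static async Task Main()
    {
        string body = await TryGetPageAsync("http://www.contoso.com/");
        Console.WriteLine(body ?? "(no content)");
    }
}
```

Returning null on failure keeps the calling code short, but it hides the reason for the failure. A larger program would more likely let the exception propagate or return a richer result type.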