C#: Automated file download using WebBrowser without a URL

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/1145426/

Date: 2020-08-06 09:17:58  Source: igfitidea

Automated file download using WebBrowser without a URL

c# winforms download browser web-crawler

Asked by Sharath

I've been working on a web crawler written in C# using System.Windows.Forms.WebBrowser. I am trying to download a file from a website and save it on the local machine. More importantly, I would like this to be fully automated. The download is started by clicking a button that calls a JavaScript function, which triggers the download and displays a "Do you want to open or save this file?" dialog. I definitely do not want to be manually clicking "Save as" and typing in the file name.

I am aware of the download functions of HttpWebRequest and WebClient, but since the download is started by JavaScript, I do not know the URL of the file. FYI, the JavaScript is a doPostBack function that changes some values and submits a form.

I've tried giving focus to the Save As dialog from the WebBrowser so that I could automate it from there, without much success. I know there's a way to force the download to be saved, instead of prompting to save or open, by adding a header to the HTTP request, but I don't know how to specify the file path to download to.

Answered by Zyphrax

I think you should prevent the download dialog from showing at all. Here is one possible way to do that:

  • The JavaScript code causes your WebBrowser control to navigate to a specific URL (this is what causes the download dialog to appear).

  • To prevent the WebBrowser control from actually navigating to this URL, attach an event handler to the Navigating event.

  • In your Navigating handler, check whether this is the navigation you want to stop (is it the download URL? Perhaps check for a file extension; there must be some recognizable pattern). Use WebBrowserNavigatingEventArgs.Url to do so.

  • If this is the right URL, stop the navigation by setting the WebBrowserNavigatingEventArgs.Cancel property to true.

  • Continue the download yourself with the HttpWebRequest or WebClient classes.
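
The steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested implementation: the DownloadInterceptor class, the LooksLikeDownload helper, the extension list, and the targetDirectory parameter are hypothetical names introduced for illustration. It also assumes the file can be fetched with a plain GET of the URL seen in the Navigating event; a form postback may need its POST data replayed instead.

```csharp
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

static class DownloadInterceptor
{
    // Hypothetical rule: URLs ending in one of these extensions count as downloads.
    static readonly string[] DownloadExtensions = { ".pdf", ".zip", ".csv" };

    public static bool LooksLikeDownload(Uri url)
    {
        // AbsolutePath excludes the query string, so "?id=3" does not confuse the check.
        string extension = Path.GetExtension(url.AbsolutePath);
        return Array.Exists(DownloadExtensions,
            ext => string.Equals(ext, extension, StringComparison.OrdinalIgnoreCase));
    }

    // Cancels matching navigations and fetches the file with WebClient instead.
    public static void Attach(WebBrowser browser, string targetDirectory)
    {
        browser.Navigating += (sender, e) =>
        {
            if (!LooksLikeDownload(e.Url))
            {
                return; // ordinary page navigation, let it proceed
            }

            e.Cancel = true; // the control never navigates, so no open/save dialog

            string fileName = Path.GetFileName(e.Url.AbsolutePath);
            using (var client = new WebClient())
            {
                // Forward the script-visible cookies of the current page.
                // HttpOnly cookies are not available through Document.Cookie.
                string cookies = browser.Document != null ? browser.Document.Cookie : null;
                if (!string.IsNullOrEmpty(cookies))
                {
                    client.Headers[HttpRequestHeader.Cookie] = cookies;
                }
                client.DownloadFile(e.Url, Path.Combine(targetDirectory, fileName));
            }
        };
    }
}
```

Calling DownloadInterceptor.Attach(browser, someFolder) once, before the button click is simulated, would be enough to hook the event. DownloadFile blocks the UI thread here; for large files an asynchronous download is probably the better choice.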

Have a look at this page for more info on the event:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigating.aspx

Answered by Vikram Gehlot

A similar solution is available at http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/d338a2c8-96df-4cb0-b8be-c5fbdd7c9202/?prof=required

This works perfectly if there is a direct URL that includes the name of the file to download.

But sometimes a URL generates the file dynamically. In that case the URL does not contain a file name; after the URL is requested, the website creates the file on the fly and then the open/save dialog appears.

For example, some links generate a PDF file on the fly.

How can this type of URL be handled?
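
One possible way to deal with a URL that carries no file name, assuming the server sends a Content-Disposition response header (dynamically generated downloads often do, but that is an assumption about the site), is to take the name from that header and fall back to the URL path. The DownloadNames class, the ResolveFileName helper, and the "download.bin" default are hypothetical names used only for this sketch.

```csharp
using System;
using System.IO;
using System.Net.Mime;

static class DownloadNames
{
    // Prefers the Content-Disposition file name, then the last URL segment,
    // then a hypothetical default name.
    public static string ResolveFileName(string contentDispositionHeader, Uri requestUrl)
    {
        if (!string.IsNullOrEmpty(contentDispositionHeader))
        {
            try
            {
                string name = new ContentDisposition(contentDispositionHeader).FileName;
                if (!string.IsNullOrEmpty(name))
                {
                    // GetFileName drops any directory part the server may have sent.
                    return Path.GetFileName(name.Trim('"'));
                }
            }
            catch (FormatException)
            {
                // Malformed header: fall through to the URL-based name.
            }
        }

        string fromUrl = Path.GetFileName(requestUrl.AbsolutePath);
        return string.IsNullOrEmpty(fromUrl) ? "download.bin" : fromUrl;
    }
}
```

With WebClient, the header value would be available as client.ResponseHeaders["Content-Disposition"] after the request completes.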

Answered by Vikram Gehlot

Take a look at Erika Chinchio's article at http://www.codeproject.com/Tips/659004/Download-of-file-with-open-save-dialog-box

I have successfully used it to download PDFs from dynamically generated URLs.

Answered by Joshcodes

Assuming System.Windows.Forms.WebBrowser was used to access a protected page containing a protected link that you want to download:

This code uses the web browser to retrieve the actual link you want to download. It will need to be changed for your specific page. The important part is the documentLinkUrl variable, which is used further below.

// Requires: using System; using System.Linq; using System.Windows.Forms;
var documentLinkUrl = default(Uri);
browser.DocumentCompleted += (object sender, WebBrowserDocumentCompletedEventArgs e) =>
{
    // Find the first anchor whose href points at the download handler.
    var downloadLink = browser.Document.ActiveElement
        .GetElementsByTagName("a").OfType<HtmlElement>()
        .First(atag =>
            atag.GetAttribute("href").Contains("DownloadAttachment.aspx"));

    // Keep the resolved URL so the download step can use it later.
    documentLinkUrl = new Uri(downloadLink.GetAttribute("href"));
};
browser.Navigate(yourProtectedPage);

Now that the web browser has navigated to the protected page and the download link has been acquired, this code downloads the link.

// Requires: using System; using System.Net; using System.Net.Http; using System.Threading.Tasks;
// Not static, because GetGlobalCookies below is an instance method.
private async Task DownloadLinkAsync(Uri documentLinkUrl)
{
    // Reuse the WebBrowser session cookies so the request is authenticated.
    var cookieString = GetGlobalCookies(documentLinkUrl.AbsoluteUri);
    var cookieContainer = new CookieContainer();
    if (cookieString != null)
    {
        cookieContainer.SetCookies(documentLinkUrl, cookieString);
    }
    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
    using (var client = new HttpClient(handler) { BaseAddress = documentLinkUrl })
    {
        var response = await client.GetAsync(documentLinkUrl);
        if (response.IsSuccessStatusCode)
        {
            using (var responseStream = await response.Content.ReadAsStreamAsync())
            {
                // The response can be saved from this stream.
            }
        }
    }
}
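
The download code above stops at the point where the response stream is available. A minimal sketch of the remaining step, writing that stream to disk, might look like this; StreamSaver, SaveToFileAsync, and destinationPath are hypothetical names, and the buffer size is just a common default.

```csharp
using System.IO;
using System.Threading.Tasks;

static class StreamSaver
{
    // Copies the whole source stream into a new file, overwriting any existing one.
    public static async Task SaveToFileAsync(Stream source, string destinationPath)
    {
        using (var file = new FileStream(destinationPath, FileMode.Create,
            FileAccess.Write, FileShare.None, 81920, useAsync: true))
        {
            await source.CopyToAsync(file);
        }
    }
}
```

Inside DownloadLinkAsync this would be awaited with the stream returned by ReadAsStreamAsync and a path of your choosing.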

The code above relies on the GetGlobalCookies method by Erika Chinchio, which can be found in the excellent article provided by @Pedro Leonardo (available here):

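// P/Invoke into WinINet, which holds the cookies used by the WebBrowser control.
[System.Runtime.InteropServices.DllImport("wininet.dll", CharSet = System.Runtime.InteropServices.CharSet.Auto, SetLastError = true)]
static extern bool InternetGetCookieEx(string pchURL, string pchCookieName,
    System.Text.StringBuilder pchCookieData, ref uint pcchCookieData, int dwFlags, IntPtr lpReserved);

// Flag that lets InternetGetCookieEx also return HttpOnly cookies.
const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

// Returns the cookies for the given URL in the comma-separated form
// expected by CookieContainer.SetCookies, or null if none were found.
private string GetGlobalCookies(string uri)
{
    uint uiDataSize = 2048;
    var sbCookieData = new System.Text.StringBuilder((int)uiDataSize);
    if (InternetGetCookieEx(uri, null, sbCookieData, ref uiDataSize,
            INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
        &&
        sbCookieData.Length > 0)
    {
        // WinINet separates cookies with ';', CookieContainer expects ','.
        return sbCookieData.ToString().Replace(";", ",");
    }
    return null;
}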
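
The reason for replacing ";" with "," is that WinINet returns cookies as "name=value; name=value", while CookieContainer.SetCookies expects a comma-separated list. A slightly more defensive variant of that conversion, which also trims whitespace and tolerates a null or empty string, could look like the sketch below. CookieStrings and ToContainer are hypothetical names, and the sketch assumes the cookie values themselves contain no commas.

```csharp
using System;
using System.Linq;
using System.Net;

static class CookieStrings
{
    // Builds a CookieContainer for url from a WinINet-style "a=1; b=2" string.
    public static CookieContainer ToContainer(Uri url, string winInetCookies)
    {
        var container = new CookieContainer();
        if (string.IsNullOrWhiteSpace(winInetCookies))
        {
            return container; // nothing to add
        }

        // Split on ';', trim each pair, and rejoin with ',' for SetCookies.
        var pairs = winInetCookies.Split(';')
            .Select(pair => pair.Trim())
            .Where(pair => pair.Length > 0);
        container.SetCookies(url, string.Join(",", pairs));
        return container;
    }
}
```

The resulting container can be assigned to HttpClientHandler.CookieContainer in the same way as in the download method.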