使用 VBA HTML 从网页下载文件

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/17224915/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-29 10:18:48  来源:igfitidea点击:

Download files from a web page using VBA HTML

htmlexcelvba

提问by Nunzio Puntillo

I have been trying desperately for months to automate a process whereby a csv file is downloaded, maned and saved in a given location. so far I only managed with excel vba to open the web page and click the bottom to download the csv file, but the code stop and required a manual intervention to to be completed, i would like it to be fully automated if possible. see the code used (I am not the author):

几个月来,我一直在拼命尝试自动化下载、管理和保存 csv 文件在给定位置的过程。到目前为止,我只使用excel vba 管理打开网页并单击底部下载csv文件,但代码停止并需要手动干预才能完成,如果可能的话,我希望它是完全自动化的。查看使用的代码(我不是作者):

Sub WebDataExtraction()
Dim URL As String
Dim IeApp As Object
Dim IeDoc As Object
Dim ieForm As Object
Dim ieObj As Object
Dim objColl As Collection

URL = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"

Set IeApp = CreateObject("InternetExplorer.Application")
IeApp.Visible = True
IeApp.Navigate URL

Do Until IeApp.ReadyState = READYSTATE_COMPLETE
Loop

Set IeDoc = IeApp.Document
For Each ele In IeApp.Document.getElementsByTagName("span")

If ele.innerHTML = "CSV" Then
Application.Wait (Now + TimeValue("0:00:15"))
DoEvents
ele.Click
'At this point you need to Save the document manually
' or figure out for yourself how to automate this interaction.
Test_Save_As_Set_Filename
File_Download_Click_Save
End If

Next

IeApp.Quit
End Sub"

thanks in advance

提前致谢

Nunzio

农齐奥

采纳答案by David Zemens

I am posting a second answer, since, as I believe my first answer is adequate for many similar applications, it does not work in this instance.

我发布了第二个答案,因为我相信我的第一个答案适用于许多类似的应用程序,因此在这种情况下不起作用。

Why the other methods fail:

为什么其他方法失败:

  • The .Clickmethod: This raises a new window which expects user input at run-time, it doesn't seem to be possible to use the WinAPIto control this window. Or, at least not any way that I can determine. The code execution stops on the .Clickline until the user manually intervenes, there is no way to use a GoToor a Waitor any other method to circumvent this behavior.
  • Using a WinAPIfunction to download the source file directly does not work, since the button's URL does not contain a file, but rather a js function that serves the file dynamically.
  • .Click方法:这就提出了其在运行时希望用户输入一个新的窗口,它似乎并不可能使用WinAPI来控制这个窗口。或者,至少不是我可以确定的任何方式。代码执行停止在线上,.Click直到用户手动干预,没有办法使用aGoTo或aWait或任何其他方法来规避这种行为。
  • 使用WinAPI函数直接下载源文件是行不通的,因为按钮的 URL 不包含文件,而是动态提供文件的 js 函数。

Here is my proposed workaround solution:

这是我建议的解决方法:

You can read the webpage's .body.InnerText, write that out to a plain text/csv file using FileSystemObjectand then with a combination of Regular Expressionsand string manipulation, parse the data into a properly delimited CSV file.

您可以读取网页的.body.InnerTextFileSystemObject然后使用Regular Expressions和 字符串操作将其写入纯文本/csv 文件,然后将数据解析为正确分隔的 CSV 文件。

Sub WebDataExtraction()
    Dim url As String
    Dim fName As String
    Dim lnText As String
    Dim varLine() As Variant
    Dim vLn As Variant
    Dim newText As String
    Dim leftText As String
    Dim breakTime As Date
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
    Dim REMatches As MatchCollection
    Dim m As Match
'## Requires reference to Microsoft Internet Controls
    Dim IeApp As InternetExplorer
'## Requires reference to Microsoft HTML object library
    Dim IeDoc As HTMLDocument
    Dim ele As HTMLFormElement
'## Requires reference to Microsoft Scripting Runtime
    Dim fso As FileSystemObject
    Dim f As TextStream
    Dim ln As Long: ln = 1


    breakTime = DateAdd("s", 60, Now)
    url = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"
    Set IeApp = CreateObject("InternetExplorer.Application")

    With IeApp
        .Visible = True
        .Navigate url

        Do Until .ReadyState = 4
        Loop

        Set IeDoc = .Document
    End With
    'Wait for the data to display on the page
    Do
        If Now >= breakTime Then
            If MsgBox("The website is taking longer than usual, would you like to continue waiting?", vbYesNo) = vbNo Then
                GoTo EarlyExit
            Else:
                breakTime = DateAdd("s", 60, Now)
            End If
        End If
    Loop While Trim(IeDoc.body.innerText) = "XML CSV Please Wait Data Loading Sorting"

    '## Create the text file
    fName = ActiveWorkbook.Path & "\exported-csv.csv"
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.OpenTextFile(fName, 2, True, -1)
    f.Write IeDoc.body.innerText
    f.Close
    Set f = Nothing

    '## Read the text file
    Set f = fso.OpenTextFile(fName, 1, False, -1)
    Do
        lnText = f.ReadLine
        '## The data starts on the 4th line in the InnerText.
        If ln >= 4 Then
            '## Return a collection of matching date/timestamps to which we can parse
            Set REMatches = SplitLine(lnText)
            newText = lnText
            For Each m In REMatches
                newText = Replace(newText, m.Value, ("," & m.Value & ","), , -1, vbTextCompare)
            Next
            '## Get rid of consecutive delimiters:
            Do
                newText = Replace(newText, ",,", ",")
            Loop While InStr(1, newText, ",,", vbBinaryCompare) <> 0
            '## Then use some string manipulation to parse out the first 2 columns which are
            '   not a match to the RegExp we used above.
            leftText = Left(newText, InStr(1, newText, ",", vbTextCompare) - 1)
            leftText = Left(leftText, 10) & "," & Right(leftText, Len(leftText) - 10)
            newText = Right(newText, Len(newText) - InStr(1, newText, ",", vbTextCompare))
            newText = leftText & "," & newText

            '## Store these lines in an array
            ReDim Preserve varLine(ln - 4)
            varLine(ln - 4) = newText
        End If
        ln = ln + 1

    Loop While Not f.AtEndOfStream
    f.Close

'## Re-open the file for writing the delimited lines:
    Set f = fso.OpenTextFile(fName, 2, True, -1)
    '## Iterate over the array and write the data in CSV:
    For Each vLn In varLine
        'Omit blank lines, if any.
        If Len(vLn) <> 0 Then f.WriteLine vLn
    Next
    f.Close

EarlyExit:
    Set fso = Nothing
    Set f = Nothing
    IeApp.Quit
    Set IeApp = Nothing

End Sub

Function SplitLine(strLine As String) As MatchCollection
'returns a RegExp MatchCollection of Date/Timestamps found in each line
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
Dim RE As RegExp
Dim matches As MatchCollection
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = True
        .IgnoreCase = True
        '## Use this RegEx pattern to parse the date & timestamps:
        .Pattern = "(19|20)\d\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])[ ]\d\d?:\d\d:\d\d"
    End With
    Set matches = RE.Execute(strLine)
    Set SplitLine = matches
End Function

回答by David Zemens

EDIT

编辑

I tested my original answer code using the URL:

我使用 URL 测试了我的原始答案代码:

http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT#saveasCSV

http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT#saveasCSV

But this method does not seem to work, for this site. The ele.Clickdoesn't seem to initiate the download, it just opens the data tabular on the webpage. To download, you need to do the right-click/save-as. If you have gotten that far (as I suspect, based on the subroutines you are calling, but for which you did not provide the code), then you can probably use the Win API to get the HWND of the Save dialog and possibly automate that event. Santosh provides some information on that:

但是对于这个站点,这种方法似乎不起作用。本ele.Click似乎并没有启动下载,它只是打开网页上的数据表格。要下载,您需要右键单击/另存为。如果您已经做到了这一点(正如我怀疑的那样,基于您正在调用的子例程,但您没有提供代码),那么您可能可以使用 Win API 来获取保存对话框的 HWND 并可能自动执行事件。Santosh 提供了一些相关信息:

VBA - Go to website and download file from save prompt

VBA - 转到网站并从保存提示下载文件

Here is also a good resource that should help solve your problem:

这也是一个很好的资源,应该可以帮助解决您的问题:

http://social.msdn.microsoft.com/Forums/en-US/beb6fa0e-fbc8-49df-9f2e-30f85d941fad/download-file-from-ie-with-vba

http://social.msdn.microsoft.com/Forums/en-US/beb6fa0e-fbc8-49df-9f2e-30f85d941fad/download-file-from-ie-with-vba

Original Answer

原答案

If you are able to determine the URL of the CSV then you can use this subroutine to open a connection to the CSV data and import it directly to the workbook. You may need to automate a text-to-columns operation on the imported data, but that can easily be replicated with the macro recorder. I put an example of this in the Test()subroutine below.

如果您能够确定 CSV 的 URL,则可以使用此子例程打开与 CSV 数据的连接并将其直接导入工作簿。您可能需要对导入的数据自动执行文本到列的操作,但这可以使用宏记录器轻松复制。我在Test()下面的子程序中放了一个例子。

You could easily modify this to add the QueryTablesin a new workbook, and then automate the SaveAsmethod on that workbook to save the file as a CSV.

您可以轻松修改它以将其添加到QueryTables新工作簿中,然后SaveAs在该工作簿上自动执行该方法以将文件另存为 CSV。

This example uses a known URL for Yahoo Finance, Ford Motor Company, and will add a QueryTableswith the CSV data in cell A1of the active worksheet. This can be modified pretty easily to put it in another sheet, another workbook, etc.

此示例使用 Yahoo Finance、Ford Motor Company 的已知 URL,并将在活动工作表的QueryTables单元格A1中添加带有 CSV 数据的。这可以很容易地修改以将它放在另一个工作表、另一个工作簿等中。

Sub Test()
Dim MyURL as String
MyURL = "http://ichart.finance.yahoo.com/table.csv?s=GM&a0&b=1&c2010&d=05&e=20&f=2013&g=d&ignore=.csv"

OpenURL MyURL

'Explode the CSV data:
Range("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
    Semicolon:=False, Comma:=True, Space:=False, Other:=False, FieldInfo _
    :=Array(Array(1, 3), Array(2, 1), Array(3, 1), Array(4, 1), Array(5, 1), Array(6, 1), _
    Array(7, 1)), TrailingMinusNumbers:=True

End Sub

Private Sub OpenURL(fullURL As String)

'This opens the CSV in querytables connection.
On Error GoTo ErrOpenURL
    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;" & fullURL, Destination:=Range("A1"))
        .Name = fullURL
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = True
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingAll
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

ExitOpenURL:
Exit Sub 'if all goes well, you can exit

'Error handling...

ErrOpenURL:
Err.Clear
bCancel = True
Resume ExitOpenURL


End Sub