How to get the source code of an HTML page using VB.NET?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5818116/
Asked by daniel11
I'm writing a program that gets the source code of a web page with a video on it. It then uses regular expressions to isolate the download link of that video. Then it uses HttpWebRequest and HttpWebResponse to download the video. My problem arises when certain sites have a page where you have to click continue in order to get to the video page.
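The regular-expression step described above might look roughly like the following. This is only a sketch: the helper name FindVideoLink, the pattern, and the sample HTML are assumptions made for illustration, and real pages may embed the link in a different form.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkExtractor
    ' Hypothetical helper: returns the first absolute URL ending in .mp4 found
    ' in the HTML text, or Nothing when there is no match.
    Function FindVideoLink(ByVal html As String) As String
        ' Match an http(s) URL made of characters other than quotes,
        ' whitespace, or angle brackets, ending in ".mp4".
        Dim m As Match = Regex.Match(html, "https?://[^""'\s<>]+\.mp4", RegexOptions.IgnoreCase)
        If m.Success Then Return m.Value
        Return Nothing
    End Function

    Sub Main()
        ' Made-up sample markup, not taken from any real site.
        Dim sample As String = "<video src=""http://example.com/files/The.Matrix.Reloaded.2003.mp4""></video>"
        Console.WriteLine(FindVideoLink(sample))
    End Sub
End Module
```

If the function returns Nothing for a page, the link is probably not in that page's source at all, which is the situation described next.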
For example, there is a video playing on http://nextgenvidz.com/view/s995xvc9e2fv called "The.Matrix.Reloaded.2003.mp4", so I tell my program to get the source code for the URL "http://nextgenvidz.com/view/s995xvc9e2fv", but it can't find the video's download link because it's searching for the file in the "continue" page's source code. If you go to the website above and view the source, you won't see the link. Then click continue, view the source again once the video appears, and you'll notice that the file is only there in the second page's source.
How can I get the source code for the page that the video is playing on, and not the page where I have to click continue?
I am trying to use this code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    TextBox1.Text = "Loading..."
    ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest explicitly.
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox2.Text), System.Net.HttpWebRequest)
    ' Using blocks dispose the response and the reader even if reading fails.
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            ' Read the whole response body (the HTML source) into the text box.
            TextBox1.Text = sr.ReadToEnd()
        End Using
    End Using
End Sub
Maybe there's a way to auto select the "Continue" button programmatically?
Answered by Stephen
This guy answered it very well.
How can I get HTML page source for websites in VB.NET?
This was his code:
Dim sourceString As String = New System.Net.WebClient().DownloadString("SomeWebPage")
Answered by Dimitri
I have tried writing something like this in the past and found that there are a bunch of limitations in place (imposed either by browsers or by the protocol itself) to prevent automation. Creating a universal website parser will be impossible. You would have to write parsing routines for individual sites, based on the way they hide content from you. You first have to determine the pattern each of these sites uses to hide the content from the user, and then implement the actual parsing for each pattern. The pattern may be a link with the video destination, a button that pops up another window with the video content, or a button that executes JavaScript that dynamically loads a video into the current window.
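As one illustration of a per-site routine, suppose the "continue" button is an ordinary HTML form that posts its input fields back to the server. That is an assumption, not something confirmed for the site in the question. Under that assumption, a routine could collect the form's fields from the first page's source and resubmit them. The helper name ReadInputFields, the pattern, and the sample markup below are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Specialized
Imports System.Text.RegularExpressions

Module ContinuePage
    ' Hypothetical helper: collects name/value pairs from <input> tags whose
    ' name attribute comes before the value attribute. Real pages may order
    ' or quote attributes differently, so this pattern is an assumption.
    Function ReadInputFields(ByVal html As String) As NameValueCollection
        Dim fields As New NameValueCollection()
        Dim pattern As String = "<input[^>]*\bname=""([^""]*)""[^>]*\bvalue=""([^""]*)"""
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            fields.Add(m.Groups(1).Value, m.Groups(2).Value)
        Next
        Return fields
    End Function

    Sub Main()
        ' Made-up sample form, not taken from any real site.
        Dim sample As String = "<form method=""post""><input type=""hidden"" name=""op"" value=""view""><input type=""submit"" name=""go"" value=""Continue""></form>"
        Dim fields As NameValueCollection = ReadInputFields(sample)
        For Each key As String In fields.AllKeys
            Console.WriteLine(key & "=" & fields(key))
        Next
        ' To resubmit against a real page, the fields could be posted back.
        ' pageUrl is a placeholder for the page's address:
        ' Dim bytes = New System.Net.WebClient().UploadValues(pageUrl, "POST", fields)
        ' Dim html = System.Text.Encoding.UTF8.GetString(bytes)
    End Sub
End Module
```

If the site instead loads the video through JavaScript or relies on cookies, this approach would not be enough, and a different routine would be needed for that pattern.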
Answered by Jonas
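' Download the RSS feed, turn the <link> tags into "|" separators, and take the 4th piece (the contents of the second <link> element).
Dim PictureURL As String = "http://www.bing.com" + New System.Net.WebClient().DownloadString("http://www.bing.com/HPImageArchive.aspx?format=rss&idx=0&n=1&mkt=de-DE").Replace("<link>", "|").Replace("</link>", "|").Split("|"c)(3)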