Original question: http://stackoverflow.com/questions/31080818/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What is the best practice to parse html in swift?
Asked by amazingbasil
I'm a Swift newbie. I need something like Python's BeautifulSoup in a Swift iOS project. Precisely, I need to get the href of every <a> whose href ends with ".txt". What steps should I take?
Answered by Victor Sigler
There are several nice HTML parsing libraries for Swift and Objective-C, including hpple, NDHpple, Kanna, Fuzi, SwiftSoup, and Ji.
Take a look at the following examples for the libraries above, most of which query the document with XPath 2.0 expressions:
hpple:
let data = NSData(contentsOfFile: path)
let doc = TFHpple(htmlData: data)
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") as? [TFHppleElement] {
    for element in elements {
        println(element.content)
    }
}
NDHpple:
let data = NSData(contentsOfFile: path)!
let html = NSString(data: data, encoding: NSUTF8StringEncoding)!
let doc = NDHpple(HTMLData: html)
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") {
    for element in elements {
        println(element.children?.first?.content)
    }
}
Kanna (XPath and CSS selectors):
let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"
if let doc = Kanna.HTML(html: html, encoding: NSUTF8StringEncoding) {
    var bodyNode = doc.body
    if let inputNodes = bodyNode?.xpath("//a/@href[ends-with(.,'.txt')]") {
        for node in inputNodes {
            println(node.contents)
        }
    }
}
Fuzi (XPath and CSS selectors):
let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"
do {
    // if encoding is omitted, it defaults to NSUTF8StringEncoding
    let doc = try HTMLDocument(string: html, encoding: NSUTF8StringEncoding)
    // XPath queries
    for anchor in doc.xpath("//a/@href[ends-with(.,'.txt')]") {
        print(anchor.stringValue)
    }
} catch let error {
    print(error)
}
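The ends-with function is part of XPath 2.0. Note that the libxml2-based libraries (hpple, Kanna, Fuzi, and Ji, for example) rely on an engine that implements XPath 1.0 only, so the queries above may be rejected. An XPath 1.0 equivalent is //a/@href[substring(., string-length(.) - 3) = '.txt'].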
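If a library does not accept an XPath query that uses ends-with, one alternative is to select every href and do the suffix check in Swift itself. The following is a minimal sketch of that idea, not part of any library above: the helper name `links(in:endingWith:)` and the sample values are assumptions made for illustration, and the array stands in for whatever href strings your parser returns.

```swift
import Foundation

// Hypothetical helper (not part of any library above): keeps only the
// href values that end with the given suffix, ignoring case.
func links(in hrefs: [String], endingWith suffix: String) -> [String] {
    let wanted = suffix.lowercased()
    return hrefs.filter { $0.lowercased().hasSuffix(wanted) }
}

// Sample values standing in for the hrefs a parser would return.
let hrefs = ["notes.txt", "index.html", "docs/README.TXT", "archive.txt.zip"]
let txtLinks = links(in: hrefs, endingWith: ".txt")
print(txtLinks) // ["notes.txt", "docs/README.TXT"]
```

Filtering after extraction keeps the query itself trivial (for example //a/@href), at the cost of pulling every link into memory first.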
SwiftSoup (CSS selectors):
do {
    let doc: Document = try SwiftSoup.parse("...")
    let links: Elements = try doc.select("a[href]") // a with href
    let pngs: Elements = try doc.select("img[src$=.png]") // img with src ending .png
    let masthead: Element? = try doc.select("div.masthead").first() // div with class=masthead
    let resultLinks: Elements? = try doc.select("h3.r > a") // direct a after h3
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}
Ji (XPath):
let jiDoc = Ji(htmlURL: URL(string: "http://www.apple.com/support")!)
let titleNode = jiDoc?.xPath("//head/title")?.first
print("title: \(titleNode?.content)") // title: Optional("Official Apple Support")
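The output above shows Optional(...) because an optional value is interpolated directly. A small sketch of unwrapping it first follows; the `content` constant is only a stand-in for `titleNode?.content`, and the fallback text is an arbitrary choice.

```swift
// Stand-in for `titleNode?.content`; the real value would come from the parser.
let content: String? = "Official Apple Support"

// Unwrap with a fallback so the printed line does not contain "Optional(...)".
let titleLine = "title: \(content ?? "(no title)")"
print(titleLine) // title: Official Apple Support
```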
I hope this helps you.
Answered by Scinfu
Try SwiftSoup, a port of jsoup to Swift.
let html: String = "<a id=1 href='?foo=bar&mid<=true'>One</a> <a id=2 href='?foo=bar<qux&lg=1'>Two</a>";
let els: Elements = try SwiftSoup.parse(html).select("a");
for element: Element in els.array() {
    print(try element.attr("href"))
}
Answered by Kio Coan
You could try Swift-HTML-Parser:
https://github.com/tid-kijyun/Swift-HTML-Parser
It helps a lot.
To read your HTML from a .txt file, you can do the following:
let file = "file.txt"
if let dirs : [String] = NSSearchPathForDirectoriesInDomains(NSSearchPathDirectory.DocumentDirectory, NSSearchPathDomainMask.AllDomainsMask, true) as? [String] {
    let dir = dirs[0] // documents directory
    let path = dir.stringByAppendingPathComponent(file)
    let html = String(contentsOfFile: path, encoding: NSUTF8StringEncoding, error: nil)
}
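The snippet above uses early Swift APIs. As a rough sketch of the same idea with current Foundation calls, the code below reads a UTF-8 file from the Documents directory. The helper name `readHTML(named:in:)` is an assumption introduced here for illustration; "file.txt" is the file name from the snippet above.

```swift
import Foundation

// Hypothetical helper: reads a UTF-8 text file from the given directory and
// returns nil if the file is missing or cannot be decoded.
func readHTML(named fileName: String, in directory: URL) -> String? {
    let fileURL = directory.appendingPathComponent(fileName)
    return try? String(contentsOf: fileURL, encoding: .utf8)
}

// Look up the user's Documents directory and try to load the file from it.
if let documents = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first,
   let html = readHTML(named: "file.txt", in: documents) {
    print("Read \(html.count) characters")
} else {
    print("file.txt could not be read")
}
```

Returning an optional keeps the call site short; if you need the underlying error, the helper could be marked throws instead of using try?.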
Edit:
To get what you need, you could use something like this example:
import Foundation
let html = "theHtmlYouWannaParse"
var err : NSError?
var parser = HTMLParser(html: html, error: &err)
if err != nil {
    println(err)
    exit(1)
}
var bodyNode = parser.body
if let inputNodes = bodyNode?.findChildTags("b") {
    for node in inputNodes {
        println(node.contents)
    }
}
if let inputNodes = bodyNode?.findChildTags("a") {
    for node in inputNodes {
        println(node.getAttributeNamed("href")) // <- here you would get your file links
    }
}