What is the best practice for parsing HTML in Swift?

Question

Asked by amazingbasil

I'm a Swift newbie. I need something like Python's BeautifulSoup in a Swift iOS project. Precisely, I need to get the href of every <a> whose value ends with ".txt". What steps should I take?

Answer 1

Answered by Victor Sigler

There are several good HTML parsing libraries for Swift and Objective-C, such as hpple, NDHpple, Kanna, Fuzi, SwiftSoup, and Ji.

Take a look at the following examples for the libraries listed above; most of them query the document with XPath 2.0:

hpple:

// Load the HTML file and hand it to hpple
let data = NSData(contentsOfFile: path)
let doc = TFHpple(htmlData: data)

// Select the href attribute of every <a> whose value ends with ".txt"
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") as? [TFHppleElement] {
   for element in elements {
       print(element.content)
   }
}

NDHpple:

// Load the HTML file as a string and hand it to NDHpple
let data = NSData(contentsOfFile: path)!
let html = NSString(data: data, encoding: NSUTF8StringEncoding)!
let doc = NDHpple(HTMLData: html)
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") {
   for element in elements {
     print(element.children?.first?.content)
   }
}

Kanna (XPath and CSS selectors):

let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"

if let doc = Kanna.HTML(html: html, encoding: NSUTF8StringEncoding) {
   let bodyNode = doc.body

   // Select the href attribute of every <a> whose value ends with ".txt"
   if let inputNodes = bodyNode?.xpath("//a/@href[ends-with(.,'.txt')]") {
      for node in inputNodes {
         print(node.contents)
      }
   }
}

Fuzi (XPath and CSS selectors):

let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"

do {
  // if encoding is omitted, it defaults to NSUTF8StringEncoding
  let doc = try HTMLDocument(string: html, encoding: NSUTF8StringEncoding)

  // XPath queries
  for anchor in doc.xpath("//a/@href[ends-with(.,'.txt')]") {
    print(anchor.stringValue)
  }

} catch let error {
  print(error)
}

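The ends-with function is part of XPath 2.0. libxml2 implements only XPath 1.0, so a parser built on it may reject the queries above; an XPath 1.0 equivalent is //a/@href[substring(., string-length(.) - 3) = '.txt'].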
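If the parser in use does not accept ends-with, another option is to select every href with a plain query such as //a/@href and do the suffix check in Swift. The sketch below uses only Foundation; txtLinks(from:) is a hypothetical helper name, and the choices to ignore case and to strip any query string before checking are my own assumptions, not part of the original answer.

```swift
import Foundation

/// Returns the hrefs whose path ends in ".txt" (case-insensitive).
/// `txtLinks(from:)` is a hypothetical helper, not part of any library above.
func txtLinks(from hrefs: [String]) -> [String] {
    return hrefs.filter { href in
        // Use only the path so "data.txt?download=1" still matches;
        // fall back to the raw string if it cannot be parsed.
        let path = URLComponents(string: href)?.path ?? href
        return path.lowercased().hasSuffix(".txt")
    }
}

// Sample hrefs, as they might come back from an XPath or CSS query
let hrefs = ["notes.txt", "index.html", "data.txt?download=1"]
print(txtLinks(from: hrefs))
```

The hrefs array would be filled from whatever the chosen library returns for each matched attribute.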

SwiftSoup (CSS selectors):

do {
    let doc: Document = try SwiftSoup.parse("...")
    let links: Elements = try doc.select("a[href]") // a with href
    let pngs: Elements = try doc.select("img[src$=.png]") // img with src ending .png
    let masthead: Element? = try doc.select("div.masthead").first() // div with class=masthead
    let resultLinks: Elements? = try doc.select("h3.r > a") // direct a after h3
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Ji (XPath):

let jiDoc = Ji(htmlURL: URL(string: "http://www.apple.com/support")!)
let titleNode = jiDoc?.xPath("//head/title")?.first
print("title: \(titleNode?.content)") // title: Optional("Official Apple Support")

I hope this helps you.

Answer 2

Answered by Scinfu

Try SwiftSoup, a port of jsoup to Swift.

let html: String = "<a id=1 href='?foo=bar&mid&lt=true'>One</a> <a id=2 href='?foo=bar&lt;qux&lg=1'>Two</a>"
do {
    // Parse the HTML, select every <a>, and print its href
    let els: Elements = try SwiftSoup.parse(html).select("a")
    for element: Element in els.array() {
        print(try element.attr("href"))
    }
} catch {
    print(error)
}
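Applied to the original question, the attribute-suffix selector that the first answer uses for images (img[src$=.png]) should work for links as well. This is only a sketch, assuming SwiftSoup is already added to the project; txtLinks(in:) is a hypothetical helper name, and the only SwiftSoup calls used are the ones that appear in the answers here (parse, select, array, attr).

```swift
import SwiftSoup

/// Returns the href of every <a> whose href ends with ".txt".
/// `txtLinks(in:)` is a hypothetical helper, not part of SwiftSoup.
func txtLinks(in html: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    // [attr$=value] matches attributes whose value ends with the given suffix
    let anchors = try doc.select("a[href$=.txt]")
    return try anchors.array().map { try $0.attr("href") }
}

do {
    let sample = "<a href='notes.txt'>Notes</a> <a href='index.html'>Home</a>"
    print(try txtLinks(in: sample))
} catch {
    print(error)
}
```

Doing the match in the selector keeps the filtering inside the parser, so no separate string check is needed afterwards.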

Answer 3

Answered by Kio Coan

You could try Swift-HTML-Parser:

https://github.com/tid-kijyun/Swift-HTML-Parser

It helps a lot.

To load your HTML from a text file, you can do this:

let file = "file.txt"

// Look up the Documents directory and read the file as UTF-8 text
if let dir = FileManager.default.urls(for: .documentDirectory, in: .allDomainsMask).first {
    let fileURL = dir.appendingPathComponent(file)
    let html = try? String(contentsOf: fileURL, encoding: .utf8)
}

Edit:

To get what you need, you could use something like this example:

import Foundation

let html = "theHtmlYouWannaParse"

var err: NSError?
let parser = HTMLParser(html: html, error: &err)
if err != nil {
    print(err)
    exit(1)
}

let bodyNode = parser.body

// Print the contents of every <b> tag
if let inputNodes = bodyNode?.findChildTags("b") {
    for node in inputNodes {
        print(node.contents)
    }
}

// Print the href of every <a> tag
if let inputNodes = bodyNode?.findChildTags("a") {
    for node in inputNodes {
        print(node.getAttributeNamed("href")) // <- here you would get your file links
    }
}
