Converting HTML text into plain text using Objective-C
Original question: http://stackoverflow.com/questions/19226634/
License: this content comes from Stack Overflow and is provided under CC BY-SA 4.0. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Asked by Igor Prusyazhnyuk
I have a huge NSString with HTML text inside. The string is more than 3,500,000 characters long. How can I convert this HTML text to an NSString containing plain text? I was using a scanner, but it works too slowly. Any ideas?
Accepted answer by Igor Prusyazhnyuk
I solved my problem with the scanner, but I do not run it over the whole text at once. I apply it to each 10,000-character part and then concatenate all the parts together. My code is below:
- (NSString *)convertHTML:(NSString *)html {
    NSScanner *myScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;
    while ([myScanner isAtEnd] == NO) {
        // Skip ahead to the next tag opening.
        [myScanner scanUpToString:@"<" intoString:NULL];
        // Capture the tag up to (not including) the closing ">".
        if ([myScanner scanUpToString:@">" intoString:&text]) {
            // Remove every occurrence of this tag from the string.
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }
    // Trim leading and trailing whitespace left behind by removed tags.
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return html;
}
Swift 4:
func htmlToString(html: String) -> String {
    var htmlStr = html
    let scanner = Scanner(string: htmlStr)
    var text: NSString? = nil
    while !scanner.isAtEnd {
        // Skip ahead to the next tag opening.
        scanner.scanUpTo("<", into: nil)
        // Only strip when a tag was actually scanned.
        if scanner.scanUpTo(">", into: &text), let tag = text {
            htmlStr = htmlStr.replacingOccurrences(of: "\(tag)>", with: "")
        }
    }
    return htmlStr.trimmingCharacters(in: .whitespacesAndNewlines)
}
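The answer mentions processing the text in 10,000-character parts, but the code above does not show that step. A minimal Swift sketch of the idea follows. The function name stripTagsInChunks, the chunkSize parameter, and the use of a regular-expression replace on each part (instead of the scanner) are assumptions made for illustration, not the author's code. When a cut would land inside a tag, the part is extended past the next ">".

```swift
import Foundation

// Hypothetical helper (not from the original answer): strips tags from a
// large HTML string by processing it in parts of roughly `chunkSize` characters.
func stripTagsInChunks(_ html: String, chunkSize: Int = 10_000) -> String {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var pieces: [String] = []
    var start = html.startIndex
    while start < html.endIndex {
        // Tentative end of this part, clamped to the end of the string.
        var end = html.index(start, offsetBy: chunkSize, limitedBy: html.endIndex) ?? html.endIndex
        let part = html[start..<end]
        // If the last "<" in this part has no ">" after it, the cut fell
        // inside a tag, so extend the part past the closing ">".
        if let lastOpen = part.lastIndex(of: "<"),
           part[lastOpen...].firstIndex(of: ">") == nil {
            if let close = html[end...].firstIndex(of: ">") {
                end = html.index(after: close)
            } else {
                end = html.endIndex
            }
        }
        // Strip the tags of this part only, so each replace works on a short string.
        let chunk = String(html[start..<end])
        pieces.append(chunk.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression))
        start = end
    }
    return pieces.joined().trimmingCharacters(in: .whitespacesAndNewlines)
}

let sample = "<b>right</b> onto <b>Kennington Park Rd/A3</b>"
print(stripTagsInChunks(sample, chunkSize: 5))  // right onto Kennington Park Rd/A3
```

Each replace call only touches one short part, which avoids rescanning the whole multi-megabyte string for every tag.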
Answer by o15a3d4l11s2
It depends on which iOS version you are targeting. Since iOS 7 there is a built-in method that not only strips the HTML tags but also applies their formatting to the resulting string:
Xcode 9/Swift 4
if let htmlStringData = htmlString.data(using: .utf8),
   let attributedString = try? NSAttributedString(data: htmlStringData, options: [.documentType: NSAttributedString.DocumentType.html], documentAttributes: nil) {
    print(attributedString)
}
You can even create an extension like this:
extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }
        do {
            return try NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}
Note that this sample code uses UTF-8 encoding. You could also write a function instead of a computed property and take the encoding as a parameter.
Swift 3
let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)
Objective-C
[[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding]
                                 options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                           NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                      documentAttributes:nil
                                   error:nil];
If you just need to remove everything between < and > (the dirty way!), which might be problematic if the string itself contains these characters, use this:
- (NSString *)stringByStrippingHTML {
    NSRange r;
    // Manual reference counting; under ARC, drop the autorelease call.
    NSString *s = [[self copy] autorelease];
    // Repeatedly find the next <...> match and cut it out.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
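For comparison, the same regular expression can be applied in a single call in Swift. This is only a sketch, and the helper name strippingHTMLTags is hypothetical. The second call shows why the approach is called dirty: a literal < and > in ordinary text are treated as one tag.

```swift
import Foundation

// Hypothetical helper: removes every substring that looks like <...>.
func strippingHTMLTags(_ input: String) -> String {
    return input.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
}

print(strippingHTMLTags("<p>Hello <b>world</b></p>"))  // Hello world
print(strippingHTMLTags("a < b and c > d"))            // the text between < and > is lost
```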
Answer by Dharmesh Mansata
Objective-C
+ (NSString *)textToHtml:(NSString *)htmlString
{
    // Decode the basic HTML entities back into plain characters.
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    // Decode &amp; last so that text such as "&amp;lt;" is not decoded twice.
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
    return htmlString;
}
Hope this helps!
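A rough Swift version of the same entity replacement might look like this. The helper name decodingBasicEntities is hypothetical, and only the five entities from the answer are handled. "&amp;" is replaced last so that already-escaped text is not decoded twice.

```swift
import Foundation

// Hypothetical helper: decodes the five basic HTML entities.
func decodingBasicEntities(_ input: String) -> String {
    // Order matters: "&amp;" is handled last to avoid double decoding.
    let replacements: [(String, String)] = [
        ("&quot;", "\""),
        ("&apos;", "'"),
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&amp;", "&"),
    ]
    var result = input
    for (entity, character) in replacements {
        result = result.replacingOccurrences(of: entity, with: character)
    }
    return result
}

print(decodingBasicEntities("Tom &amp; Jerry &lt;3"))  // Tom & Jerry <3
```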
Answer by Rabindra Nath Nandi
For Swift:
// Swift 1.2 syntax; later Swift versions use the throwing initializer instead of the error parameter.
NSAttributedString(data: (htmlString as! String).dataUsingEncoding(NSUTF8StringEncoding, allowLossyConversion: true)!,
    options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
              NSCharacterEncodingDocumentAttribute: NSNumber(unsignedLong: NSUTF8StringEncoding)],
    documentAttributes: nil,
    error: nil)!
Answer by Ahmed Abdallah
- (NSString *)stringByStrippingHTML:(NSString *)inputString
{
    NSMutableString *outString = nil;
    if (inputString)
    {
        outString = [[NSMutableString alloc] initWithString:inputString];
        if ([inputString length] > 0)
        {
            NSRange r;
            // Remove tags and &nbsp; entities one match at a time.
            while ((r = [outString rangeOfString:@"<[^>]+>|&nbsp;" options:NSRegularExpressionSearch]).location != NSNotFound)
            {
                [outString deleteCharactersInRange:r];
            }
        }
    }
    return outString;
}
Answer by Josh O'Connor
Swift 4:
do {
    // cleanString.string holds the plain text.
    let cleanString = try NSAttributedString(data: htmlContent.data(using: .utf8)!,
                                             options: [.documentType: NSAttributedString.DocumentType.html],
                                             documentAttributes: nil)
} catch {
    print("Something went wrong")
}
Answer by Renetik
It could be made more generic by passing the encoding type as a parameter, but here is an example as a category:
@implementation NSString (CSExtension)
- (NSString *)htmlToText {
    return [NSAttributedString.alloc
            initWithData:[self dataUsingEncoding:NSUnicodeStringEncoding]
                 options:@{NSDocumentTypeDocumentOption: NSHTMLTextDocumentType}
      documentAttributes:nil error:nil].string;
}
@end
Answer by Hussain Shabbir
Did you try something like the code below? I am not sure whether it will be faster than the scanner approach you used before, so please check:
// String which contains HTML tags
NSString *htmlString=[NSString stringWithFormat:@"%@",@"<b>right</b> onto <b>Kennington Park Rd/A3</b>Continue to follow A3</div><div >Entering toll zone in 1.7 km at Newington Causeway/A3</div><divGo through 2 roundabouts</div>"];
NSMutableString *mutStr=[NSMutableString string];
NSString *s = nil;
// Split on the tag delimiter characters.
// Note: this keeps tag names such as "b" and "div" in the output and also splits text that contains "/".
NSArray *arra=[htmlString componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"</>"]];
NSLog(@"%@",arra);
for (s in arra)
{
    [mutStr appendString:@" "];
    [mutStr appendString:s];
}
NSLog(@"%@",mutStr); // Print the output