Original question: http://stackoverflow.com/questions/7671925/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Regex - Match attribute in a HTML code
Asked by Tony
I have a problem matching HTML attributes (in various HTML tags) with a regex. To do so, I use the pattern:
myAttr=\"([^']*)\"
HTML snippet:
<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />
It selects the text from myAttr to the end />, but I need to select only myAttr="..." ("http://example.com").
Answered by Ray Toal
You have an apostrophe (') inside your character class but you wanted a quote (").
myAttr=\"([^"]*)\"
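To see the difference the character class makes, here is a minimal Python sketch that runs both patterns against the snippet from the question. The variable names are made up for illustration; only the two patterns and the HTML come from the thread.

```python
import re

html = '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Pattern from the question: [^'] only excludes apostrophes, so the
# greedy * runs on to the last double quote in the tag.
original = re.search("myAttr=\"([^']*)\"", html)

# Corrected pattern: [^"] cannot cross a double quote, so the match
# ends at the attribute's own closing quote.
fixed = re.search("myAttr=\"([^\"]*)\"", html)

print(original.group(1))  # http://example.com" class="alignleft
print(fixed.group(1))     # http://example.com
```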
That said, you really shouldn't be parsing HTML with regexes. (Sorry to link to that answer again. There are other answers to that question that are more of the "if you know what you are doing..." variety. But it is good to be aware of.)
Note that even if you limit your regexing to just attributes you have a lot to consider:
- Be careful not to match inside of comments.
- Be careful not to match inside of CDATA sections.
- What if attributes are bracketed with single quotes instead of double quotes?
- What if attributes have no quotes at all?
This is why pre-built, serious parsers are generally called for.
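As a rough illustration of the parser route, the sketch below uses Python's standard-library html.parser. The class name AttrCollector, the helper find_attr_values, and the sample markup are all assumptions made up for this example; it is one possible approach, not the one the answer had in mind. CDATA handling is not exercised here.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the values of one attribute from every tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        # HTMLParser reports attribute names in lowercase.
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def find_attr_values(html, attr_name):
    parser = AttrCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


sample = (
    '<!-- <img myAttr="in-a-comment"> -->'
    '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'
    "<a myAttr='single'>x</a>"
    "<p myAttr=bare>y</p>"
)
print(find_attr_values(sample, "myAttr"))
# ['http://example.com', 'single', 'bare']
```

The commented-out tag is skipped, and double-quoted, single-quoted, and unquoted values are all picked up without any special casing.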
Answered by John Keyes
The * is a greedy quantifier. You should follow it with a question mark to make it non-greedy:
myAttr=\"([^']*?)\"
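A small Python sketch of the greedy versus non-greedy behaviour on the snippet from the question (variable names are illustrative). It also shows that once the character class excludes the double quote, the question mark no longer changes the result.

```python
import re

html = '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Greedy: [^']* grabs as much as it can, then backs off to the last quote.
greedy = re.search("myAttr=\"([^']*)\"", html).group(1)

# Non-greedy: [^']*? grows one character at a time and stops at the first quote.
lazy = re.search("myAttr=\"([^']*?)\"", html).group(1)

# With [^"] the class itself cannot cross a quote, so both forms agree.
class_greedy = re.search('myAttr="([^"]*)"', html).group(1)
class_lazy = re.search('myAttr="([^"]*?)"', html).group(1)

print(greedy)  # http://example.com" class="alignleft
print(lazy)    # http://example.com
print(class_greedy == class_lazy)  # True
```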
Answered by Laurent'
If you only want the myAttr parameter value, use this:
"myAttr=\"([^\"]+)\""
Answered by Merianos Nikos
You can try using this:
myAttr=\"?[\w:\-]+ ?= ?("[^"]+"|'[^']+'|\w+)\"
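For comparison, here is a hedged Python sketch of a pattern that accepts double-quoted, single-quoted, and unquoted values for myAttr. The pattern ATTR_RE and the helper attr_value are written for this example and are not taken from any answer in the thread; they still share the usual regex limitations (comments, CDATA, unusual markup).

```python
import re

# Hypothetical pattern: one alternative per quoting style.
# (?<![\w-]) keeps it from matching inside names like data-myAttr.
ATTR_RE = re.compile(
    r"""(?<![\w-])myAttr\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def attr_value(html):
    """Return the first myAttr value found, or None if there is none."""
    match = ATTR_RE.search(html)
    if match is None:
        return None
    # Exactly one of the three groups participates in a match.
    return next(g for g in match.groups() if g is not None)


print(attr_value('<img myAttr="http://example.com" class="alignleft" />'))
print(attr_value("<img myAttr='x y'>"))
print(attr_value("<img myAttr=bare>"))
```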
Answered by user7671441
<[^>]*>
Just try this; it helps remove all tags.
Example Something
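A minimal Python sketch of how that pattern could be used to strip tags. The helper name strip_tags and the input string are assumptions made up for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Remove anything that looks like a tag, keeping the text in between."""
    return TAG_RE.sub("", html)


print(strip_tags("<b>Example</b> Something"))  # Example Something
```

Note that a > inside an attribute value ends the match early, so this only suits simple, well-behaved markup.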