Encode HTML entities in JavaScript
Original question: http://stackoverflow.com/questions/18749591/
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Asked by JGallardo
I am working in a CMS which allows users to enter content. The problem is that when they add symbols such as ®, it may not display well in all browsers. I would like to set up a list of symbols that must be searched for and then converted to the corresponding HTML entity. For example:
® => &reg;
& => &amp;
© => &copy;
™ => &trade;
After the conversion, it needs to be wrapped in a <sup> tag, resulting in this:
® => <sup>&reg;</sup>
Because a particular font size and padding style is necessary:
sup { font-size: 0.6em; padding-top: 0.2em; }
Would the JavaScript be something like this?
var regs = document.querySelectorAll('®');
for ( var i = 0, l = imgs.length; i < l; ++i ) {
    var [?] = regs[i];
    var [?] = document.createElement('sup');
    img.parentNode.insertBefore([?]);
    div.appendChild([?]);
}
Where "[?]" means that there is something that I am not sure about.
Additional Details:
- I would like to do this with pure JavaScript, not something that requires a library like jQuery, thanks.
- Backend is Ruby
- Using RefineryCMS which is built with Ruby on Rails
Answer by Chris Baker
You can use a regex to replace any character in a given Unicode range with its HTML entity equivalent. The code would look something like this:
var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
    return '&#'+i.charCodeAt(0)+';';
});
This code will replace all characters in the given range (Unicode 00A0 - 9999, as well as ampersand, greater-than and less-than) with their HTML entity equivalents, which is simply &#nnn; where nnn is the Unicode value we get from charCodeAt.
See it in action here: http://jsfiddle.net/E3EqX/13/ (this example uses jQuery for the element selectors used in the example. The base code itself, above, does not use jQuery)
Making these conversions does not solve all the problems: make sure you're using UTF-8 character encoding, and make sure your database is storing the strings in UTF-8. You still may see instances where the characters do not display correctly, depending on system font configuration and other issues out of your control.
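Applied to the original question, the same replace-with-a-callback idea can also add the <sup> wrapper. This is only a rough sketch: the helper name wrapSymbolsInSup and the fixed symbol list (®, ©, ™) are assumptions made for illustration, not part of the answer above.

```javascript
// Hypothetical helper: encode a fixed list of symbols and wrap each one in <sup>.
// The symbol list is an assumption; extend the character class as needed.
var SYMBOLS = /[\u00AE\u00A9\u2122]/g; // ®, ©, ™

function wrapSymbolsInSup(text) {
  return text.replace(SYMBOLS, function (ch) {
    // Same numeric-reference trick as above: &#nnn; built from charCodeAt.
    return '<sup>&#' + ch.charCodeAt(0) + ';</sup>';
  });
}

console.log(wrapSymbolsInSup('Acme\u00AE Widgets\u2122'));
// Acme<sup>&#174;</sup> Widgets<sup>&#8482;</sup>
```

The result is an HTML string, so it would still have to be assigned to the element's markup (for example through innerHTML) on content you trust.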
Documentation
- String.charCodeAt - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt
- HTML Character entities - http://www.chucke.com/entities.html
Answer by Mathias Bynens
The currently accepted answer has several issues. This post explains them, and offers a more robust solution. The solution suggested in that answer is:
var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
    return '&#' + i.charCodeAt(0) + ';';
});
The i flag is redundant since no Unicode symbol in the range from U+00A0 to U+9999 has an uppercase/lowercase variant that is outside of that same range.
The m flag is redundant because ^ or $ are not used in the regular expression.
Why the range U+00A0 to U+9999? It seems arbitrary.
Anyway, for a solution that correctly encodes all except safe & printable ASCII symbols in the input (including astral symbols!), and implements all named character references (not just those in HTML4), use the he library (disclaimer: This library is mine). From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and, contrary to many other JavaScript solutions, he handles astral Unicode symbols just fine. An online demo is available.
Also see this relevant Stack Overflow answer.
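The points raised in this answer (redundant flags, arbitrary range, astral symbols) can also be illustrated without a library. The following is a minimal sketch under stated assumptions, not the he library's implementation: the function name encodeNonAscii is hypothetical, it emits numeric references only (no named ones), and it needs an ES2015 engine for the u flag and codePointAt.

```javascript
// Hypothetical helper, NOT how the he library is implemented.
// Only the g flag is needed for global replacement; the u flag makes the regex
// operate on whole code points, so an astral symbol is matched as one unit
// rather than as two separate surrogate halves.
function encodeNonAscii(str) {
  // Encode everything outside printable ASCII (tab, LF and CR are kept as-is),
  // plus the characters that are unsafe in HTML text and attributes.
  return str.replace(/[^\x20-\x7E\t\n\r]|[&<>"']/gu, function (symbol) {
    return '&#' + symbol.codePointAt(0) + ';';
  });
}

console.log(encodeNonAscii('foo \u00A9 bar \u{1D306}'));
// foo &#169; bar &#119558;
```

Using codePointAt instead of charCodeAt is what keeps U+1D306 as a single reference; charCodeAt would only return the code of the first surrogate half.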
Answer by ar34z
I had the same problem and created two functions to create entities and translate them back to normal characters. The following methods, added to String and its prototype, translate any string to HTML entities and back:
/**
 * Convert a string to HTML entities
 */
String.prototype.toHtmlEntities = function() {
    return this.replace(/./gm, function(s) {
        // return "&#" + s.charCodeAt(0) + ";";
        return (s.match(/[a-z0-9\s]+/i)) ? s : "&#" + s.charCodeAt(0) + ";";
    });
};
/**
 * Create string from HTML entities
 */
String.fromHtmlEntities = function(string) {
    return (string+"").replace(/&#\d+;/gm,function(s) {
        return String.fromCharCode(s.match(/\d+/gm)[0]);
    })
};
You can then use it as follows:
var str = "Test´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æøπ£¨ ƒ™en tést".toHtmlEntities();
console.log("Entities:", str);
console.log("String:", String.fromHtmlEntities(str));
Output in console:
Entities: Test&#180;&#8224;&#174;&#165;&#168;&#169;&#729;&#8747;&#248;&#8230;&#710;&#402;&#8710;&#247;&#8721;&#8482;&#402;&#8710;&#230;&#248;&#960;&#163;&#168; &#402;&#8482;en t&#233;st
String: Test´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æøπ£¨ ƒ™en tést
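One limitation worth noting: the decoder above handles only decimal references, and String.fromCharCode cannot produce a code point above U+FFFF from a single reference such as &#119558;. A possible variant is sketched below; the name decodeNumericEntities is hypothetical, it does not touch named references like &amp;, and it assumes an ES2015 engine for String.fromCodePoint.

```javascript
// Hypothetical helper: decode decimal (&#169;) and hexadecimal (&#xA9;) references.
function decodeNumericEntities(str) {
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing a RangeError.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities('&#174; &#xA9; &#x1D306;'));
// ® © 𝌆
```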
Answer by antoineMoPa
Without any library, if you do not need to support IE < 9, you could create an HTML element and set its content with Node.textContent:
var str = "<this is not a tag>";
var p = document.createElement("p");
p.textContent = str;
var converted = p.innerHTML;
Here is an example: https://jsfiddle.net/1erdhehv/
Answer by takdeniz
You can use this.
var escapeChars = {
    '¢' : 'cent',
    '£' : 'pound',
    '¥' : 'yen',
    '€' : 'euro',
    '©' : 'copy',
    '®' : 'reg',
    '<' : 'lt',
    '>' : 'gt',
    '"' : 'quot',
    '&' : 'amp',
    '\'' : '#39'
};
// Build a character class that matches every key of escapeChars
var regexString = '[';
for (var key in escapeChars) {
    regexString += key;
}
regexString += ']';
var regex = new RegExp(regexString, 'g');
function escapeHTML(str) {
    return str.replace(regex, function(m) {
        return '&' + escapeChars[m] + ';';
    });
}
https://github.com/epeli/underscore.string/blob/master/escapeHTML.js
var htmlEntities = {
    nbsp: ' ',
    cent: '¢',
    pound: '£',
    yen: '¥',
    euro: '€',
    copy: '©',
    reg: '®',
    lt: '<',
    gt: '>',
    quot: '"',
    amp: '&',
    apos: '\''
};
function unescapeHTML(str) {
    return str.replace(/\&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in htmlEntities) {
            // Named reference, e.g. &copy;
            return htmlEntities[entityCode];
            /*eslint no-cond-assign: 0*/
        } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            // Hexadecimal reference, e.g. &#xA9;
            return String.fromCharCode(parseInt(match[1], 16));
            /*eslint no-cond-assign: 0*/
        } else if (match = entityCode.match(/^#(\d+)$/)) {
            // Decimal reference, e.g. &#169;
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}
Answer by StefansArya
If you want to avoid encoding HTML entities more than once:
function encodeHTML(str){
    // (?!#) skips a character that is directly followed by "#", so the "&" of an existing &#nnn; is left alone
    return str.replace(/[\u00A0-\u9999<>&](?!#)/gim, function(i) {
        return '&#' + i.charCodeAt(0) + ';';
    });
}
function decodeHTML(str){
    // Accept any number of digits: encodeHTML can emit values of up to five digits
    return str.replace(/&#([0-9]+);/gi, function(match, num) {
        return String.fromCharCode(parseInt(num, 10));
    });
}
Example
var text = "<a>Content</a>";
text = encodeHTML(text);
console.log("Encode 1 times: " + text);
// &#60;a&#62;Content&#60;/a&#62;
text = encodeHTML(text);
console.log("Encode 2 times: " + text);
// &#60;a&#62;Content&#60;/a&#62;
text = decodeHTML(text);
console.log("Decoded: " + text);
// <a>Content</a>
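Two caveats apply to the (?!#) lookahead used in encodeHTML: it is attached to every character in the class, so a "<" directly followed by "#" is skipped, and a named reference such as &amp; is still encoded a second time. One possible alternative, sketched here with the hypothetical name encodeOnce, restricts the check to the ampersand and looks for a complete reference after it. Text that merely looks like a reference (for example a literal "&copy;" typed by a user) is left alone by design, which may or may not be what you want.

```javascript
// Hypothetical helper: encode once, leaving existing character references intact.
function encodeOnce(str) {
  return str
    // Encode "&" only when it does not already start a decimal, hex or named reference.
    .replace(/&(?!(?:#\d+|#x[\da-f]+|[a-z][a-z\d]*);)/gi, '&#38;')
    // Then encode the remaining characters; "&" is deliberately not in this class.
    .replace(/[<>\u00A0-\u9999]/g, function (ch) {
      return '&#' + ch.charCodeAt(0) + ';';
    });
}

var once = encodeOnce('<a>Tom & Jerry</a>');
console.log(once);
// &#60;a&#62;Tom &#38; Jerry&#60;/a&#62;
console.log(encodeOnce(once) === once);
// true
```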
Answer by Yash
HTML special characters and their escape codes
Reserved characters must be escaped in HTML. We can use a character escape to represent any Unicode character [e.g. & - U+0026] in HTML, XHTML or XML using only ASCII characters. Numeric character references [e.g. ampersand (&) - &#38;] and named character references [e.g. &amp;] are the types of character escape used in markup.
Predefined Entities
Original Character | XML entity replacement | XML numeric replacement
<                  | &lt;                   | &#60;
>                  | &gt;                   | &#62;
"                  | &quot;                 | &#34;
&                  | &amp;                  | &#38;
'                  | &apos;                 | &#39;
To display HTML tags as plain text in a web page, we use <pre> and <code> tags, or we can escape them. Escape the string by replacing every occurrence of the "&" character with the string "&amp;" and every occurrence of the ">" character with the string "&gt;". Example: stackoverflow post
function escapeCharEntities() {
    var map = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\"": "&quot;",
        "'": "&apos;"
    };
    return map;
}
var mapkeys = '', mapvalues = '';
var html = {
    encodeRex : function () {
        return new RegExp(mapkeys, 'gm');
    },
    decodeRex : function () {
        return new RegExp(mapvalues, 'gm');
    },
    encodeMap : JSON.parse( JSON.stringify( escapeCharEntities () ) ),
    decodeMap : JSON.parse( JSON.stringify( swapJsonKeyValues( escapeCharEntities () ) ) ),
    encode : function ( str ) {
        return str.replace(html.encodeRex(), function(m) { return html.encodeMap[m]; });
    },
    decode : function ( str ) {
        return str.replace(html.decodeRex(), function(m) { return html.decodeMap[m]; });
    }
};
// Swaps keys and values, and as a side effect fills mapkeys / mapvalues
// with the regex sources used by encodeRex and decodeRex.
function swapJsonKeyValues ( json ) {
    var count = Object.keys( json ).length;
    var obj = {};
    var keys = '[', val = '(', keysCount = 1;
    for(var key in json) {
        if ( json.hasOwnProperty( key ) ) {
            obj[ json[ key ] ] = key;
            keys += key;
            if( keysCount < count ) {
                val += json[ key ]+'|';
            } else {
                val += json[ key ];
            }
            keysCount++;
        }
    }
    keys += ']'; val += ')';
    console.log( keys, ' == ', val);
    mapkeys = keys;
    mapvalues = val;
    return obj;
}
console.log('Encode: ', html.encode('<input type="password" name="password" value=""/>') );
console.log('Decode: ', html.decode(html.encode('<input type="password" name="password" value=""/>')) );
Output:
Encode: &lt;input type=&quot;password&quot; name=&quot;password&quot; value=&quot;&quot;/&gt;
Decode: <input type="password" name="password" value=""/>
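For comparison, the same five predefined entities can be handled with two plain lookup tables and fixed regular expressions. This is a compact sketch rather than a rewrite of the code above; the names ENCODE_MAP, DECODE_MAP, encodePredefined and decodePredefined are hypothetical. Note that &apos; is defined in XML and HTML5 but not in HTML4.

```javascript
// Hypothetical names; only the five predefined entities are covered.
var ENCODE_MAP = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };

// Build the reverse table once: '&amp;' -> '&', '&lt;' -> '<', ...
var DECODE_MAP = {};
Object.keys(ENCODE_MAP).forEach(function (ch) {
  DECODE_MAP[ENCODE_MAP[ch]] = ch;
});

function encodePredefined(str) {
  return str.replace(/[&<>"']/g, function (ch) { return ENCODE_MAP[ch]; });
}

function decodePredefined(str) {
  // A single pass, so "&amp;lt;" decodes to "&lt;" and not all the way to "<".
  return str.replace(/&(?:amp|lt|gt|quot|apos);/g, function (entity) {
    return DECODE_MAP[entity];
  });
}

console.log(encodePredefined('<input value="">'));
// &lt;input value=&quot;&quot;&gt;
```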
Answer by Cesar De la Cruz
var htmlEntities = [
    // "&" must come first, otherwise the ampersands of later replacements get encoded again
    {regex:/&/g,entity:'&amp;'},
    {regex:/>/g,entity:'&gt;'},
    {regex:/</g,entity:'&lt;'},
    {regex:/"/g,entity:'&quot;'},
    {regex:/á/g,entity:'&aacute;'},
    {regex:/é/g,entity:'&eacute;'},
    {regex:/í/g,entity:'&iacute;'},
    {regex:/ó/g,entity:'&oacute;'},
    {regex:/ú/g,entity:'&uacute;'}
];
var total = 'some string value';
for (var i = 0; i < htmlEntities.length; i++) {
    total = total.replace(htmlEntities[i].regex, htmlEntities[i].entity);
}
An array solution.
Answer by Jared Beck
If you're already using jQuery, try html().
$('<div>').text('<script>alert("gotcha!")</script>').html()
// "&lt;script&gt;alert("gotcha!")&lt;/script&gt;"
An in-memory text node is instantiated, and html() is called on it.
It's ugly, it wastes a bit of memory, and I have no idea if it's as thorough as something like the he library, but if you're already using jQuery, maybe this is an option for you.
Taken from the blog post "Encode HTML entities with jQuery" by Felix Geisendörfer.
Answer by Dave Brown
Sometimes you just want to encode every character... This function uses the regexp [^], which matches "everything but nothing", i.e. any character at all.
function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}
function encode(w) {
return w.replace(/[^]/g, function(w) {
return "&#" + w.charCodeAt(0) + ";";
});
}
test.value=encode(document.body.innerHTML.trim());
<textarea id=test rows=11 cols=55>www.WHAK.com</textarea>