Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18478847/

Time: 2020-08-29 12:42:52  Source: igfitidea

Why is "&#8203;" being injected into my HTML?

Tags: html, encoding, sublimetext2

Asked by gwg

EDIT: You can see the issue here (look in the source).

EDIT2: Interesting, it is not an issue in the source. It shows up only in the console (and in Firebug as well).

I have the following markup in a file called test.html:

<!DOCTYPE html>
<html>
<head>
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
    <h3>Test Harness</h3>
</body>
</html>

But in Chrome, I see:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    "&#8203;


        "
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
    <h3>Test Harness</h3>
</body>
</html>

It looks like &#8203; is a zero-width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?

Thanks in advance.

Accepted answer by Jukka K. Korpela

It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain's HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”

What happens in the validator and in the Chrome tools, as well as e.g. in Firebug, is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.

The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
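
As a minimal sketch of "removing those bytes", assuming the file has already been read into a string as UTF-8: the helper name `stripLeadingZeroWidthSpaces` and the sample string are hypothetical, chosen only for illustration.

```javascript
// Sketch: strip stray U+200B characters from the very start of a document.
// `stripLeadingZeroWidthSpaces` is a hypothetical helper, not a library function.
function stripLeadingZeroWidthSpaces(text) {
  // ^\u200B+ matches one or more zero width spaces at the start only,
  // so any U+200B used deliberately later in the text is left alone.
  return text.replace(/^\u200B+/, '');
}

// Assumed sample input: a document that starts with the invisible character.
const dirty = '\u200B<!DOCTYPE html>\n<html></html>';
const clean = stripLeadingZeroWidthSpaces(dirty);
console.log(clean.startsWith('<!DOCTYPE html>')); // true
```

With Node.js, the same function could be applied to a file read with `fs.readFileSync(path, 'utf8')` and written back with `fs.writeFileSync`. Whether that is the right place to fix it depends on how the character got into the file in the first place.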

Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won't notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.
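
The relationship between the three bytes, the code point, and the character reference can be checked with a short sketch. It assumes a runtime where `TextEncoder` is available as a global (modern browsers and recent Node.js versions); the variable names are only illustrative.

```javascript
// Encode U+200B as UTF-8 and inspect the result.
const zwsp = '\u200B';
const bytes = Array.from(new TextEncoder().encode(zwsp));

// Format each byte as 0xNN in uppercase hex.
const hex = bytes.map(b => '0x' + b.toString(16).toUpperCase().padStart(2, '0'));
console.log(hex.join(' '));                   // 0xE2 0x80 0x8B

// The decimal code point is the number used in the character reference.
const codePoint = zwsp.codePointAt(0);
console.log(codePoint);                       // 8203
console.log('&#' + codePoint + ';');          // &#8203;
```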

It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
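
To tell the two cases apart, one can look at the first three bytes of the data. The following is a rough sketch; `classifyLeadingBytes` is a hypothetical helper and the sample documents are assumptions made for illustration.

```javascript
// Hypothetical helper: classify the first three bytes of UTF-8 encoded data.
function classifyLeadingBytes(bytes) {
  const [a, b, c] = bytes;
  // U+FEFF encodes to EF BB BF in UTF-8 (allowed at the start as a BOM).
  if (a === 0xEF && b === 0xBB && c === 0xBF) return 'BOM (U+FEFF)';
  // U+200B encodes to E2 80 8B in UTF-8 (character data, not a BOM).
  if (a === 0xE2 && b === 0x80 && c === 0x8B) return 'ZERO WIDTH SPACE (U+200B)';
  return 'other';
}

const encoder = new TextEncoder();
console.log(classifyLeadingBytes(encoder.encode('\uFEFF<!DOCTYPE html>'))); // BOM (U+FEFF)
console.log(classifyLeadingBytes(encoder.encode('\u200B<!DOCTYPE html>'))); // ZERO WIDTH SPACE (U+200B)
console.log(classifyLeadingBytes(encoder.encode('<!DOCTYPE html>')));       // other
```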

Answer by grmdgs

I understand that there is a bug in SharePoint 2013 where the HTML editor adds these characters into your content.

I've been dealing with this for a while, and this is the solution I am using, which seems to be working. I added this JavaScript to a file referenced by my master page.

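// Tags whose content is checked for zero width spaces (U+200B).
var elements = ["h1","h2","h3","h4","p","strong","label","span","a"];

// Run removeZWS on every element that matches one of the tags above.
function targetZWS() {
    for (var i = 0; i < elements.length; i++) {
        jQuery(elements[i]).each(function() {
            removeZWS(this);
        });
    }
}

// Rewrite the element's HTML with every U+200B character removed.
function removeZWS(target) {
    jQuery(target).html(jQuery(target).html().replace(/\u200B/g, ''));
}

/* Load functions: register targetZWS by name in SharePoint's
   _spBodyOnLoadFunctionNames array so it runs with the body onload handlers. */
jQuery(document).ready(function() {
    _spBodyOnLoadFunctionNames.push("targetZWS");
});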

Links I looked at while investigating this:

  1. https://social.msdn.microsoft.com/Forums/sharepoint/en-US/23804eed-8f00-4b07-bc63-7662311a35a4/why-does-sharepoint-put-in-character-code-8203-in-a-richtext-field?forum=sharepointdevelopment

  2. https://social.technet.microsoft.com/Forums/office/en-US/e87a82f0-1ab5-4aa7-bb7f-27403a7f46de/finding-8203-unicode-characters-in-my-source-code?forum=sharepointgeneral

  3. http://www.sharepointpals.com/post/Removing-8203-in-RichTextHTML-field-Sharepoint

Answer by Tarek Salah uddin Mahmud

Try this script. It works for me.

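$(document).ready(function() {
    // Remove every zero width space (U+200B) from the page markup.
    // Note: reassigning innerHTML rebuilds the DOM, so event handlers
    // bound to the existing elements are lost.
    var html = document.body.innerHTML;
    document.body.innerHTML = html.replace(/\u200B/g, '');
});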

Answer by Sandy Abrah

I have experienced this in a major project I was working on.

The trick is to just:

  • Copy the whole code into Notepad.

  • Save it as a text file.

  • Close the file, open it again, and copy your code back into your IDE.

And voilà, it's gone!

Answer by Oleg Averkov

In my case, the "&#8203;" symbol did not appear in the code editor (MS Code) and was visible only in Chrome's Elements tab. What helped was deleting the tag after which the symbol appeared and retyping that tag by hand. Apparently the symbol came along with a Ctrl+C / Ctrl+V when I copied the code over.

Answer by Drew McDowell

I was able to remove these in Sublime by selecting the characters surrounding it and copy/pasting into Find and Replace.

Answer by Niroshan

The “&#8203;” HTML character is a zero-width space. It is easy to find in the Inspect Elements section of the Google Chrome browser. But when I tried to remove it from my code, most of the major IDEs did not show it to me (maybe because of my preferences).

I found the text editor Brackets, downloaded it, and opened my code in it. It shows the character as red dots. Just remove it and check that everything still works.
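
If no editor at hand displays the character, a small script can report where it hides. This is only a sketch: `findZeroWidthSpaces` is a hypothetical helper, and the sample markup is an assumption for illustration.

```javascript
// Hypothetical helper: list the 1-based line and column of every U+200B.
function findZeroWidthSpaces(source) {
  const hits = [];
  source.split('\n').forEach((line, lineIndex) => {
    for (let col = 0; col < line.length; col++) {
      if (line[col] === '\u200B') {
        hits.push({ line: lineIndex + 1, column: col + 1 });
      }
    }
  });
  return hits;
}

// Assumed sample: the invisible character sits right after <h3> on line 2.
const sample = '<p>ok</p>\n<h3>\u200BTest</h3>';
const hits = findZeroWidthSpaces(sample);
console.log(hits.length, hits[0].line, hits[0].column); // 1 2 5
```

The reported positions can then be used to jump to the spot in any editor and delete the character by hand.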

I found this solution on a blog: What is the “&#8203;” HTML character? Why is it being injected into my HTML?

Thank You for saving me hours.

Answer by Sean Schricker

I cannot find where it's being injected on my page. I'll investigate it more later, but for now, I just threw this in my page so I can keep working.

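$(function(){
    // Look only at the first child node of <body>.
    $('body').contents().eq(0).each(function(){
        // Remove it if it is a text node (nodeType 3) whose first
        // non-whitespace character is U+200B (decimal 8203).
        if(this.nodeType === 3 && this.data.trim().charCodeAt(0) === 8203){
            $(this).remove();
        }
    });
});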