Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32253895/

Time: 2020-08-29 18:08:13  Source: igfitidea

Special character not displaying as expected

Tags: html, utf-8

Asked by curious1

I have the following simple HTML page:

<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
    <div>
        méywe
    </div>
</body>
</html>

When displaying it in Chrome or Firefox (I did not test other browsers), I see the following:

m?ywe

What did I miss? The html file is saved in UTF-8 encoding. The server is Apache. My machine is Windows 7 pro. The text editor is UltraEdit.

Thanks!

Update

Initially, I used UltraEdit to edit this html file and got the problem. Based on cmbuckley's input and the install of Notepad++ (from Heatmanofurioso's suggestion), I thought about the possibility of my file being corrupt somehow (even though it looks fine in both UltraEdit and Notepad). So I saved my file with Notepad in utf-8 encoding. I still saw the problem (maybe due to cache???). Then I used UltraEdit to save it again, viewed the page in the browser, and the problem was gone.

Lesson Learned

Keep two text editors at hand if a text editor is your tool, and try the other one when you see an unexplainable problem. No tool is perfect, even one you use every day. In my case, Notepad++ fixed the utf8 issue with my file that UltraEdit somehow failed to fix.

Thanks to folks for helping!!!

Answered by Heatmanofurioso

1 - Replace your

<meta charset="utf-8">

with

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

2 - Check whether your HTML editor's encoding is set to UTF-8. Usually this option is found in the tabs at the top of the program, as in Notepad++.

3 - If you are importing a font, check whether your browser is compatible with it. Or try adding CSS that sets your font to a default/generally accepted one, like

body
{
    font-family: "Times New Roman", Times, serif;
}

Hope it helps :)

Answered by Mofi

The file was (most likely) saved with Windows-1252 encoding instead of UTF-8 encoding, which is why the non-ASCII character was displayed wrongly in the browsers. The reason for this was missing knowledge about UTF-8 detection by UltraEdit and perhaps also about appropriate UTF-8 configuration.

How the currently latest version 22.10 of UltraEdit detects UTF-8 encoding is explained in detail in the user-to-user forum topic "UTF-8 not recognized, largish file". This forum topic also contains recommendations on how best to configure UltraEdit for HTML writers who mainly use UTF-8 encoding for all HTML files.

Unfortunately, the regular expression search used by the currently latest UltraEdit v22.10 and previous versions to detect a UTF-8 HTML character set declaration does not work for the short HTML5 variant, as reported in the forum topic "Short utf-8 charset declaration in HTML5 header". The reason is the double quote character between charset= and utf-8. I reported this by email to IDM Computer Solutions, Inc. when the referenced topic was created, with the suggestion to make a small change in the regular expression so that it also detects the short HTML5 UTF-8 declaration. The UTF-8 detection was later updated by the developers of UltraEdit for UE v24.00 and UES v17.00, as a post in the referenced forum topic explains in detail.
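
To illustrate why a quote character after charset= can defeat a detection pattern, here is a minimal Python sketch. The two patterns are hypothetical and written only for this illustration; they are not UltraEdit's actual regular expressions, which are not quoted in this answer.

```python
import re

# Hypothetical patterns for illustration only; NOT UltraEdit's real expressions.
# STRICT expects "utf-8" directly after "charset=".
STRICT = re.compile(r"charset=utf-8", re.IGNORECASE)
# TOLERANT additionally allows one optional quote character in between.
TOLERANT = re.compile(r"charset=[\"']?utf-8", re.IGNORECASE)

LONG_FORM = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
SHORT_FORM = '<meta charset="utf-8">'


def detects(pattern, declaration):
    """Return True if the pattern finds a UTF-8 declaration in the text."""
    return pattern.search(declaration) is not None


for name, pattern in (("strict", STRICT), ("tolerant", TOLERANT)):
    print(name, detects(pattern, LONG_FORM), detects(pattern, SHORT_FORM))
```

With these assumed patterns, the strict one finds the long declaration but misses the short HTML5 form, while the tolerant one finds both.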

However, when an HTML5 file is declared as UTF-8 encoded but UltraEdit loaded it as an ANSI file, the user can see the wrong loading in the status bar at the bottom of the main window. A small (less than 64 KB) UTF-8 encoded HTML file should result in getting

  • either U8- and the line terminator type (DOS/UNIX/MAC) displayed, for users of UE < v19.00 or when using the basic status bar in later versions of UE,
  • or UTF-8 selected in the encoding selector in the status bar, for users of UE v19.00 or later versions not using the basic status bar.

If this is not the case, the UltraEdit user can use

  • Save As from menu File and select UTF-8 - NO BOM for Encoding (Windows Vista or later) or Format (Windows 2000/XP), respectively, to convert the file from ANSI to UTF-8 without byte order mark, or
  • ASCII to UTF-8 (Unicode editing) from submenu Conversions in menu File to convert the file from ASCII/ANSI to UTF-8 without an immediate save, or
  • select Unicode - UTF-8 via the encoding selector in the status bar (UE v19.00 or later only), which also results in an immediate conversion from ASCII/ANSI to UTF-8 and enables Unicode editing.

For the last two options, the UTF-8 BOM settings at Advanced - Settings or Configuration - File Handling - Save determine whether the file is saved without or with byte order mark on the next save.
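
For readers who want to check whether a saved file starts with a UTF-8 byte order mark, the following Python sketch may help. It is independent of UltraEdit; the helper name has_utf8_bom is made up for this example, and the sample bytes are built in memory rather than read from a real file.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "m\u00e9ywe"  # the word "méywe" from the question

without_bom = text.encode("utf-8")   # plain UTF-8, no BOM
with_bom = text.encode("utf-8-sig")  # the "utf-8-sig" codec prepends the BOM

print(has_utf8_bom(without_bom), has_utf8_bom(with_bom))
# Decoding with "utf-8-sig" strips the BOM again.
print(with_bom.decode("utf-8-sig") == text)
```

To check a real file, one could read it in binary mode, for example with open(path, "rb"), and pass the bytes to the helper.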

Once the word méywe is saved into the file using UTF-8 encoding, the resulting byte stream is 6D C3 A9 79 77 65 (hexadecimal). It would be displayed as mÃ©ywe when the UTF-8 encoded file is opened in ASCII/ANSI mode (option in the File - Open dialog) using Windows-1252 as code page. On next opening, UltraEdit automatically detects this file as a UTF-8 encoded file, although <meta charset="utf-8"> is not recognized, because there is now at least one UTF-8 encoded character in the first 64 KB of the file.
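
The byte values can be reproduced with a short Python sketch. It only uses the standard codecs; "cp1252" is Python's name for Windows-1252. The last two lines show the opposite mistake, which is probably what happened in the question: a file saved as Windows-1252 but read as UTF-8.

```python
text = "m\u00e9ywe"  # the word "méywe"

# Saved as UTF-8: the character é becomes the two bytes C3 A9.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" ").upper())

# The same bytes misread as Windows-1252: each byte becomes one character.
print(utf8_bytes.decode("cp1252"))

# Saved as Windows-1252 instead: é is the single byte E9.
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes.hex(" ").upper())

# Read as UTF-8, the lone E9 byte is invalid and is replaced by U+FFFD,
# which browsers render as a question-mark-like replacement glyph.
print(cp1252_bytes.decode("utf-8", errors="replace"))
```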

To answer the question:

What did I miss?

You missed saving the file as a UTF-8 encoded file after having opened or created it as an ANSI file (or, more precisely, a text file encoded with a single byte per character using a code page) and having declared it as UTF-8 encoded. This is a common problem for many users who write into an HTML file

<meta charset="utf-8">

or

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

or

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

or into an XML file

<?xml version="1.0" encoding="UTF-8"?>

or

<?xml version="1.0" encoding='utf-8'?>

and other variations depending on the usage of ' or " and on writing either UTF-8 or utf-8 (and other spellings), without really knowing what this string means for the applications interpreting the bytes of the file.
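
A declaration only states what the bytes are supposed to be; it does not convert them. As a rough sketch of how one might check a file for this mismatch, the Python below compares the declaration with the actual bytes. The function names and the search pattern are assumptions made for this example, and the pattern is deliberately loose rather than a full HTML or XML parser.

```python
import re

# Loose, illustrative pattern for the HTML and XML declarations listed above.
DECLARATION = re.compile(
    rb"(?:charset|encoding)\s*=\s*[\"']?\s*utf-?8", re.IGNORECASE
)


def declares_utf8(data: bytes) -> bool:
    """Return True if the first 1024 bytes contain a UTF-8 declaration."""
    return DECLARATION.search(data[:1024]) is not None


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


page = '<meta charset="utf-8"><div>m\u00e9ywe</div>'
saved_as_utf8 = page.encode("utf-8")
saved_as_ansi = page.encode("cp1252")  # declared UTF-8, saved as Windows-1252

for label, data in (("utf-8", saved_as_utf8), ("cp1252", saved_as_ansi)):
    print(label, declares_utf8(data), is_valid_utf8(data))
```

A file that declares UTF-8 but fails the second check was most likely saved in a single-byte code page, which is the situation described in this answer.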

"What's the best default new file format?" contains lots of useful information and links to web pages with useful information about text encoding, which one to use for which files, and how to configure UltraEdit accordingly.

Answered by sideshowbarker

Can you check and see if the server is sending a charset in the Content-type header? The encoding specified there will take precedence over what you specify with the meta element.
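
One way to check this without browser developer tools is a small Python sketch like the one below. The function names are made up for this example, and the fetch helper needs a URL of your own; it is defined here but not called. The parsing relies on the standard library's email.message.Message, which lowercases the charset it returns.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if no charset is present.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def fetch_server_charset(url):
    """Return the charset the server sends for the given URL, or None.

    The URL is supplied by the caller; nothing is fetched in this sketch.
    """
    with urlopen(url) as response:
        return response.headers.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html; charset=utf-8"))
print(charset_from_content_type("text/html"))
```

If the server reports a charset other than utf-8, the meta element in the page will not override it.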

Answered by abhinav1602

Changing font-family to Calibri (or any other generally accepted font) worked for me.

Example:

<span style="font-family:Calibri">&#35; My_Text</span>

Answered by Jérémie Gagné

Replace meta charset="utf-8" with meta http-equiv="Content-Type" content="text/html; charset=utf-8". Maybe it will help.

Otherwise, what is your font?
