Why is this HTML5 document invalid?
Original question: http://stackoverflow.com/questions/17935819/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Kath Brown
I'm getting pretty confused about an error message I'm getting when I try to validate any simple HTML document without a meta encoding like this:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
</head>
<body>Test</body>
</html>
The W3C validator http://validator.w3.org reluctantly accepts the document as valid with just a few warnings when it is pasted into the direct input form, but when the document is uploaded or loaded by URI, validation fails with this error message:
The character encoding was not declared. Proceeding using windows-1252.
There are two things I don't understand about this error:
- Why is a missing character encoding considered an error, when fallback rules exist?
- Why is the validator assuming windows-1252 instead of UTF-8, like any browser would?
Can someone explain these two points please? I'm pretty new to this stuff, so please bear with me.
Answer by federico-t
Well, it depends on what you are using.
- if you are using the File Upload option, it depends on which encoding the HTML file was saved with.
- if you are using the Direct Input option, it depends on the browser.
If you don't want the validator to guess, and you want it to use UTF-8, you can add the following line
<meta charset="UTF-8">
inside the head element.
Answer by Andy G
It is the "Direct Input" mode of the validator that defaults to UTF-8. User-agents (browsers) will default to other encodings based on a number of things:
If a user agent reads a document with no character encoding information, it can fall back to using some other information. For example, it can rely on the user's settings, either browser-wide or specific for a given document, or it can pick a default encoding based on the user's language. For Western European languages, it is typical and fairly safe to assume Windows-1252, which is similar to ISO-8859-1 but has printable characters in place of some control codes.
Answer by James
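To make the difference between these encodings concrete, here is a minimal Python sketch. The sample bytes and the word "café" are made up for illustration; the codec names are the ones Python's standard library accepts.

```python
# Bytes 0x80-0x9F are where Windows-1252 and ISO-8859-1 differ.
sample = bytes([0x80, 0x93, 0x94])

as_cp1252 = sample.decode("windows-1252")  # printable characters
as_latin1 = sample.decode("iso-8859-1")    # C1 control codes

print(as_cp1252)        # €“”
print(repr(as_latin1))  # '\x80\x93\x94'

# A file saved as UTF-8 but guessed to be Windows-1252 turns into mojibake.
utf8_bytes = "café".encode("utf-8")
misread = utf8_bytes.decode("windows-1252")
print(misread)          # cafÃ©
```

A pure-ASCII document like the one in the question decodes identically under all three encodings, which is why the wrong guess often goes unnoticed until a non-ASCII character appears.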
W3C validator said:
The validator checked your document with an experimental feature: HTML5 Conformance Checker. This feature has been made available for your convenience, but be aware that it may be unreliable, or not perfectly up to date with the latest development of some cutting-edge technologies.
So take some results with a pinch of salt.
Also, there is no useful 'fallback'; the validator just needs to pick something so it can try to validate for you. W3C can't determine or decide what encoding you want or need to use. You must declare it yourself based on what characters you need to serve on your web page(s), and then ask W3C to validate your document based on that.
What editor/WYSIWYG are you using to make web pages? Can we have the URL you are trying to validate?
Answer by Jukka K. Korpela
When you use Validate by URI, the server is supposed to announce the character encoding in HTTP headers, more exactly in the charset parameter of the Content-Type header value. In this case, this apparently does not happen. You can check the situation e.g. using Rex Swain's HTTP Viewer.
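If you would rather check the header from a script, the following is a small Python sketch of pulling the charset parameter out of a Content-Type value. The function name charset_from_content_type is hypothetical; it only relies on the standard library's email.message.Message.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

For a live URL, urllib.request.urlopen(url).headers.get_content_charset() should give the same answer, since the response headers object is built on the same Message class. A result of None corresponds to the situation described above, where the server announces no encoding.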
According to clause 4.2.5.5 Specifying the document's character encoding in HTML5 CR, “If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the character encoding used must be an ASCII-compatible character encoding, and the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state.” This is a bit complicated, but the bottom line is: there are several ways to declare the encoding, but if none of them is used, the document is non-conforming.
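The rule quoted above can be approximated in a few lines of Python. This is a rough sketch, not the validator's actual logic: the function name declared_by is hypothetical, the regular expression is a simplification of the spec's prescan, and the iframe srcdoc case is ignored.

```python
import codecs
import re
from typing import Optional

# Simplified pattern: matches <meta charset="..."> and the http-equiv form,
# where "charset=" appears inside the content attribute.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)

BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def declared_by(raw: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return which mechanism declares the encoding, or None if none does."""
    if raw.startswith(BOMS):
        return "BOM"
    if content_type and "charset=" in content_type.lower():
        return "Content-Type"
    # The spec expects the meta declaration within the first 1024 bytes.
    if META_CHARSET.search(raw[:1024]):
        return "meta"
    return None
```

Run against the document from the question with no Content-Type charset, this returns None, which is the non-conforming case. Adding the meta element, a BOM, or a charset parameter in the header each makes it return a mechanism name.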
Why it specifies so is somewhat speculative, but the general idea is that such rules promote reliability and robustness. When the rule is not obeyed, different browsers may use different defaults or guesswork.
The validator assumes windows-1252, because that's what HTML5 rules lead to. The processing rules are in 8.2.2.1 Determining the character encoding. They are fairly complicated, but they largely reflect the way modern browsers behave (and aim at making that behavior a standard). The rules there are meant to deal with non-conforming documents, too, but this does not make those documents conforming; error processing rules are not really “fallbacks” and should not be relied on, especially since old browsers do not always play by the rules.
The error rules get somewhat loose when it comes to a situation where everything else fails and an “implementation-defined or user-specified default character encoding” is to be used. There are just “suggestions” on what browsers might do (again, reflecting what modern browsers generally do), and this may involve using the “user's locale”, an obscure concept. The validator uses windows-1252 then, perhaps because that's the default for English and the validator “speaks” English, or maybe just because it's the guess that is expected to be correct more often than any other single alternative.
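As an illustration of what a locale-based default might look like, here is a hedged Python sketch. The table is a partial excerpt of the kind of suggestions the specification lists, reproduced from memory, so treat the individual entries as assumptions; the names SUGGESTED_DEFAULTS and default_encoding are hypothetical.

```python
# Partial, illustrative mapping from locale to a suggested default encoding.
# Anything not listed falls through to windows-1252.
SUGGESTED_DEFAULTS = {
    "ja": "Shift_JIS",
    "ko": "EUC-KR",
    "ru": "windows-1251",
    "tr": "windows-1254",
    "zh-cn": "gb18030",
    "zh-tw": "Big5",
}


def default_encoding(locale: str) -> str:
    """Pick a default encoding for a locale such as 'en-US' or 'ru_RU'."""
    tag = locale.replace("_", "-").lower()
    primary = tag.split("-")[0]
    # Try the full tag first, then the primary language, then the catch-all.
    return SUGGESTED_DEFAULTS.get(tag) or SUGGESTED_DEFAULTS.get(primary, "windows-1252")


print(default_encoding("en-US"))  # windows-1252
print(default_encoding("ru_RU"))  # windows-1251
```

An English-speaking validator with no other information lands on the catch-all, which matches the windows-1252 guess seen in the error message.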