C#: How to read an ANSI-encoded file containing special characters

Question

Asked by Enyra

I'm writing a TFS check-in policy that checks whether our source files contain our file header.

My problem is that our file header contains a special character, "©", and unfortunately some of our source files are encoded in ANSI. So if I read these files in the policy, the string looks like this: "Copyright ? 2009".

string content = File.ReadAllText(pendingChange.LocalItem);

I tried to change the encoding of the string, but it does not help. So how can I read these files so that I get the correct string, "Copyright © 2009"?

Answer 1

Accepted answer by Jon Skeet

Use Encoding.Default:

string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default);

You should be aware, however, that this reads the file using the system default encoding, which may not be the same as the encoding of the file. There's no single encoding called ANSI, but usually when people talk about "the ANSI encoding" they mean Windows Code Page 1252 or whatever their box happens to use.

Your code will be more robust if you can find out the exact encoding used.
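
As a minimal sketch of naming the encoding explicitly, assuming the files really are Windows-1252: the names AnsiReader and ReadWindows1252 below are hypothetical. One assumption about the runtime is worth stating: on .NET Core and .NET 5+, Encoding.Default is UTF-8 rather than the machine's ANSI code page, and code page 1252 only becomes available after registering CodePagesEncodingProvider. On .NET Framework, code page 1252 is built in and the registration line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiReader
{
    // Reads a file as Windows-1252, independent of the machine's default code page.
    public static string ReadWindows1252(string path)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1252 is
        // only available once this provider is registered. Registering it more
        // than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding windows1252 = Encoding.GetEncoding(1252);
        return File.ReadAllText(path, windows1252);
    }
}
```

In the policy this would be called as AnsiReader.ReadWindows1252(pendingChange.LocalItem). If the files turn out to use a different code page, only the number passed to GetEncoding changes.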

Answer 2

Answered by AnthonyWJones

It would seem sensible, if you are going to have such policies, to also have a team-agreed standard encoding. To be honest, I can't see why any team would use an encoding other than "Unicode (UTF-8 with signature) - Codepage 65001" (except perhaps for ASPX pages with significant non-Latin static content, but even then I can't see how it would be a big deal to use UTF-8).

Assuming you still want to allow mixed encodings, you next need a way to determine which encoding a file was saved in, so that you know which encoding to pass to ReadAllText. It's not easy to determine this from the file; however, using Encoding.Default is likely to work OK, since you most likely have just two encodings to deal with: the VS one (UTF-8 with signature) and a common ANSI encoding used by your machines (probably Windows-1252).

Hence using

 string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default);

will work (as I see Jon has already posted). This works because when the UTF-8 BOM (which is what VS means by the term "signature") is present at the start of the file, the supplied encoding parameter is ignored and UTF-8 is used anyway. Hence where the file is saved using UTF-8 with a signature you get correct results, and where ANSI is used you are most likely to get correct results as well.
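
The BOM behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the policy's actual code: SourceFileReader and its methods are hypothetical names, the ANSI files are assumed to be Windows-1252, and the provider registration is only needed on .NET Core / .NET 5+ (where Encoding.Default is UTF-8, so the code page is named explicitly here). The second method is an extra heuristic that the answer does not mention: it covers UTF-8 files saved without a signature by decoding strictly as UTF-8 first and falling back to the ANSI code page when the bytes are not valid UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class SourceFileReader
{
    // Assumption: the "ANSI" files are Windows-1252.
    private static Encoding Ansi()
    {
        // Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // A UTF-8 BOM at the start of the file overrides the encoding passed in,
    // so this handles both "UTF-8 with signature" files and ANSI files.
    public static string ReadHonoringBom(string path)
    {
        return File.ReadAllText(path, Ansi());
    }

    // Heuristic for UTF-8 files saved without a BOM: decode strictly as UTF-8
    // and fall back to the ANSI code page if the bytes are not valid UTF-8.
    public static string ReadUtf8ThenAnsi(string path)
    {
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            return File.ReadAllText(path, strictUtf8);
        }
        catch (DecoderFallbackException)
        {
            return File.ReadAllText(path, Ansi());
        }
    }
}
```

The fallback is only a heuristic: a few ANSI byte sequences happen to be valid UTF-8, so an agreed team encoding remains the more reliable option.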

BTW, if you are processing file headers, wouldn't ReadAllLines make things easier?
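
A line-based check might look roughly like this. HeaderCheck, HasHeader, and the default of 10 lines are hypothetical choices for illustration; the encoding argument plays the same role as in the ReadAllText calls above, and a UTF-8 BOM in the file still takes precedence over it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class HeaderCheck
{
    // Returns true if any of the first maxLines lines contains the expected text.
    // Assumption: the header sits near the top of the file.
    public static bool HasHeader(string path, string expected,
                                 Encoding encoding, int maxLines = 10)
    {
        string[] lines = File.ReadAllLines(path, encoding);
        return lines.Take(maxLines).Any(line => line.Contains(expected));
    }
}
```

For example, HeaderCheck.HasHeader(pendingChange.LocalItem, "Copyright © 2009", Encoding.Default) would check the top of a pending file for the copyright line.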
