Size of the char type in C#

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/2134002/

Date: 2020-08-06 23:46:06  Source: igfitidea

size of char type in c#

c# .net character-encoding

Asked by Manish Basantani

Just wondering: why is the char type 2 bytes in size in C# (.NET), unlike the 1 byte of other programming languages?

Accepted answer by Jan Jongboom

A char is Unicode in C#, so the number of possible characters exceeds 255. You'll therefore need two bytes.

Extended ASCII, for example, has a 256-character set and can therefore be stored in a single byte. That's also the whole purpose of the System.Text.Encoding classes: different systems can have different charsets and char sizes. C# can therefore handle one, two, four, etc. bytes per character, but Unicode UTF-16 is the default.
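
The points above can be checked directly: a sketch showing that a .NET char occupies 2 bytes, and that the Encoding classes in System.Text report different byte counts for the same text.

```csharp
using System;
using System.Text;

class CharSizeDemo
{
    static void Main()
    {
        // A .NET char is a 16-bit UTF-16 code unit, i.e. 2 bytes.
        Console.WriteLine(sizeof(char));                        // 2

        // The System.Text.Encoding classes convert between charsets,
        // so the same text occupies different numbers of bytes.
        string text = "é"; // U+00E9
        Console.WriteLine(Encoding.Unicode.GetByteCount(text)); // 2 (UTF-16)
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));    // 2
        Console.WriteLine(Encoding.UTF32.GetByteCount(text));   // 4
    }
}
```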

Answered by Dawid Ohia

Because strings in .NET are encoded as 2-byte Unicode characters.

Answered by JaredPar

Actually the size of char in C#, or more accurately in the CLR, is consistent with most other managed languages. Managed languages like Java tend to be newer and had features like Unicode support built in from the ground up. The natural extension of supporting Unicode strings is to have Unicode chars.

Older languages like C/C++ started out as ASCII-only and added Unicode support later.

Answered by Joey

I'm guessing that by “other programming languages” you mean C. C actually has two different char types: char and wchar_t. char may be one byte long; wchar_t not necessarily.

In C# (and .NET, for that matter), all character strings are encoded as Unicode in UTF-16. That's why a char in .NET represents a single UTF-16 code unit, which may be a code point or half of a surrogate pair (in which case it isn't actually a character).
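
The code-unit/surrogate-pair distinction can be made concrete. A sketch using an emoji outside the Basic Multilingual Plane:

```csharp
using System;

class SurrogateDemo
{
    static void Main()
    {
        // U+1F600 (😀) lies outside the Basic Multilingual Plane, so
        // UTF-16 encodes it as a surrogate pair: two char values that
        // together represent one code point.
        string s = "😀";
        Console.WriteLine(s.Length);                   // 2 code units, 1 character
        Console.WriteLine(char.IsHighSurrogate(s[0])); // True
        Console.WriteLine(char.IsLowSurrogate(s[1]));  // True
        Console.WriteLine(char.ConvertToUtf32(s, 0));  // 128512 (0x1F600)
    }
}
```

Neither s[0] nor s[1] is a character on its own, which is exactly the "half of a surrogate pair" case described above.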

Answered by Bob Moore

Because a character in a C# string defaults to the UTF-16 encoding of Unicode, which is 2 bytes (by default).

Answered by kervin

C#'s use of a 16-bit character width probably has more to do with performance than anything else.

Firstly, if you use UTF-8, you can fit every character in the "right" amount of space, because UTF-8 is variable-width: ASCII characters use 8 bits, while larger characters use more.

But variable-length character encoding encourages O(n) algorithmic complexity in common scenarios, e.g. retrieving the character at a particular position in a string. There have been public discussions on this point. The simplest solution is to keep a character width that fits most of your charset and truncate the others; now you have a fixed character width.
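
A sketch of the trade-off: UTF-8 bytes per character vary, so locating the i-th character in a UTF-8 buffer means scanning from the start, whereas a .NET string of fixed 16-bit code units supports constant-time indexing.

```csharp
using System;
using System.Text;

class VariableWidthDemo
{
    static void Main()
    {
        // UTF-8 spends a different number of bytes per character...
        Console.WriteLine(Encoding.UTF8.GetByteCount("a"));  // 1
        Console.WriteLine(Encoding.UTF8.GetByteCount("é"));  // 2
        Console.WriteLine(Encoding.UTF8.GetByteCount("中")); // 3

        // ...so finding the i-th character in raw UTF-8 bytes requires an
        // O(n) scan. With fixed 16-bit code units, string indexing is a
        // constant-time array access:
        string s = "hello";
        Console.WriteLine(s[1]); // 'e'
    }
}
```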

Strictly speaking, UTF-16 is also a variable-width encoding, so C# (and Java, for that matter) uses something of a hybrid, since its character width is never 32 bits.