Why should I use int instead of a byte or short in C#
Original source: http://stackoverflow.com/questions/1097467/
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by Breadtruck
I have found a few threads on this issue. Most people appear to favor using int in their C# code across the board, even if a byte or smallint would handle the data, unless it is a mobile app. I don't understand why. Doesn't it make more sense to define your C# datatype as the same datatype that would be in your data storage solution?
My Premise: If I am using a typed dataset, Linq2SQL classes, or POCOs, one way or another I will run into compiler datatype conversion issues if I don't keep my datatypes in sync across my tiers. I don't really like calling System.Convert all the time just because it was easier to use int across the board in C# code. I have always used the smallest datatype needed to handle the data in the database as well as in code, to keep my interface to the database clean. So I would bet 75% of my C# code is using byte or short as opposed to int, because that is what is in the database.
Possibilities: Does this mean that most people who just use int for everything in code also use the int datatype for their SQL storage datatypes and couldn't care less about the overall size of their database, or do they call System.Convert in code wherever applicable?
Why I care: I have worked on my own forever and I just want to be familiar with best practices and standard coding conventions.
Accepted answer by jalf
Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.
Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.
To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.
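The read-and-mask and read-modify-write steps described above can be modelled in software as a rough sketch. This is only an illustration of the idea, not a claim about how any particular CPU implements byte access; `ByteInWord` and its method names are hypothetical.

```csharp
using System;

// Illustration only: models the masking steps in software.
// ByteInWord is a hypothetical helper, not a framework type.
public static class ByteInWord
{
    // Read byte number `index` (0 = lowest) out of a 32-bit word
    // by shifting it down and masking off the upper 24 bits.
    public static byte Read(uint word, int index)
    {
        int shift = index * 8;
        return (byte)((word >> shift) & 0xFFu);
    }

    // Write a byte: clear the 8 target bits in the word, then OR in
    // the new value. The whole 32-bit word is returned ("written back").
    public static uint Write(uint word, int index, byte value)
    {
        int shift = index * 8;
        uint cleared = word & ~(0xFFu << shift);
        return cleared | ((uint)value << shift);
    }
}
```

For example, `ByteInWord.Read(0x12345678u, 0)` yields `0x78`, and `ByteInWord.Write(0x12345678u, 0, 0xAB)` yields `0x123456AB`.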
Space-wise, of course, you save a few bytes by using smaller datatypes. So if you're building a table with a few million rows, then shorter datatypes may be worth considering. (The same reasoning may be a good reason to use smaller datatypes in your database.)
And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?
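A minimal sketch of that overflow risk, assuming the default unchecked arithmetic context; `OverflowDemo` and its method names are made up for illustration.

```csharp
using System;

// Hypothetical helper showing what happens when a sum no longer fits in a byte.
public static class OverflowDemo
{
    // Unchecked: the narrowing cast silently wraps modulo 256.
    public static byte AddWrapping(byte a, byte b)
    {
        return unchecked((byte)(a + b));
    }

    // Checked: the same narrowing cast throws OverflowException instead.
    public static byte AddChecked(byte a, byte b)
    {
        return checked((byte)(a + b));
    }
}
```

`OverflowDemo.AddWrapping(200, 100)` returns 44 rather than 300, while `OverflowDemo.AddChecked(200, 100)` throws an `OverflowException`. With int, the same sum simply fits.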
Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.
Answer by Mitch Wheat
For the most part, 'No'.
Unless you know upfront that you are going to be dealing with hundreds of millions of rows, it's a micro-optimisation.
Do what fits the domain model best. Later, if you have performance problems, benchmark and profile to pinpoint where they are occurring.
Answer by Robert Harvey
If int is used everywhere, no casting or conversions are required. That is a bigger bang for the buck than the memory you will save by using multiple integer sizes.
It just makes life simpler.
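To make the casting point concrete: in C#, arithmetic on two byte operands produces an int, so storing the result back into a byte needs an explicit cast. The sketch below uses a hypothetical `PromotionDemo` class to show this.

```csharp
using System;

// Hypothetical helper: contrasts byte arithmetic with int arithmetic.
public static class PromotionDemo
{
    public static byte SumBytes(byte a, byte b)
    {
        // byte c = a + b;     // would not compile: a + b has type int
        return (byte)(a + b);  // explicit cast required
    }

    public static int SumInts(int a, int b)
    {
        return a + b;          // no cast needed
    }

    // Reports the type the compiler infers for byte + byte.
    public static Type TypeOfByteSum(byte a, byte b)
    {
        var sum = a + b;
        return sum.GetType();
    }
}
```

`PromotionDemo.TypeOfByteSum(1, 2)` returns `typeof(int)`, which is why code that mixes small integer types tends to accumulate casts.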
Answer by Dan Diplo
The .NET runtime is optimised for Int32. See previous discussion at .NET Integer vs Int16?
Answer by Jon Grant
You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage capacity. Let's say you have three columns, and instead of using a byte-equivalent database type, you use an int-equivalent.
That gives us 3 (columns) x 3 (bytes extra) per row, or 9 bytes per row.
This means, for "a few million rows" (let's say three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately, as we're no longer living in the 1970s, you shouldn't have to worry about this :)
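The arithmetic above can be written out as a small helper, which makes it easy to plug in your own row and column counts. `StorageEstimate` is a hypothetical name, and the calculation ignores any per-row or per-page overhead the database adds.

```csharp
// Hypothetical helper: raw column data only, no database overhead.
public static class StorageEstimate
{
    // Extra bytes consumed when `columns` columns are widened from
    // `narrowBytes` to `wideBytes` each, across `rows` rows.
    public static long ExtraBytes(long rows, int columns, int narrowBytes, int wideBytes)
    {
        return rows * columns * (wideBytes - narrowBytes);
    }
}
```

`StorageEstimate.ExtraBytes(3000000, 3, 1, 4)` gives 27,000,000 bytes, the 27 megabytes quoted above.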
As said above, stop micro-optimising - the performance hit in converting to/from different integer-like numeric types is going to hit you much, much harder than the bandwidth/diskspace costs, unless you are dealing with very, very, very large datasets.
Answer by Breadtruck
Not that I didn't believe Jon Grant and others, but I had to see for myself with our "million row table". The table has 1,018,000 rows. I converted 11 tinyint columns and 6 smallint columns into int; there were already 5 int and 3 smalldatetime columns. 4 different indexes used a combination of the various data types, but obviously the new indexes now all use int columns.
Making the changes cost me only 40 MB of base table disk usage, measured with no indexes. When I added the indexes back in, the overall difference was only 30 MB. I was surprised, because I thought the index size would be larger.
So is 30 MB worth the hassle of using all the different data types? No way! I am off to INT land. Thanks, everyone, for setting this anal-retentive programmer back on the straight path to a happy, blissful life of no more integer conversions... yippee!
Answer by Sunsetquest
I am only 6 years late but maybe I can help someone else.
Here are some guidelines I would use:
- If there is a possibility the data will not fit in the future, then use the larger int type.
- If the variable is used as a struct/class field, it will often be padded out to a 32-bit boundary anyway, so using byte/int16 may not save memory.
- If the variable is short-lived (like inside a function), then the smaller data types will not help much.
- "byte" or "char" can sometimes describe the data better, and the compiler can check that larger constant values are not assigned to it by accident. For example, if you store the day of the month (1-31) in a byte and try to assign 1000 to it, you get a compile error.
- If the variable is used in an array of roughly 100 elements or more, I would use the smaller data type as long as it makes sense.
- byte and int16 arrays are not as thread safe as an int (a primitive).
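Whether a small field actually saves memory depends on what sits next to it. The sketch below is one way to check, assuming `Marshal.SizeOf<T>()` (which reports the marshaled size) is an acceptable proxy for the in-memory size of these simple sequential structs. The struct names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structs for illustration only.
[StructLayout(LayoutKind.Sequential)]
public struct ByteNextToInt { public byte Flag; public int Count; }

[StructLayout(LayoutKind.Sequential)]
public struct TwoInts { public int Flag; public int Count; }

[StructLayout(LayoutKind.Sequential)]
public struct FourBytes { public byte A; public byte B; public byte C; public byte D; }

public static class LayoutDemo
{
    // Size in bytes of one instance, including any padding.
    public static int SizeOf<T>() where T : struct
    {
        return Marshal.SizeOf<T>();
    }
}
```

Here `LayoutDemo.SizeOf<ByteNextToInt>()` and `LayoutDemo.SizeOf<TwoInts>()` both report 8, so the byte field saves nothing next to an int. `LayoutDemo.SizeOf<FourBytes>()` reports 4, so small fields do pack together when they are grouped.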
One topic that no one brought up is the limited CPU cache. Smaller programs execute faster than larger ones because the CPU can fit more of the program in the faster L1/L2/L3 caches.
Using the int type can result in fewer CPU instructions; however, it also means a smaller share of the data fits in the CPU cache. Instructions are cheap to execute: modern CPU cores can execute 3-7 instructions per clock cycle. A single cache miss, on the other hand, can cost 1000-2000 clock cycles because it has to go all the way to RAM.
When memory is conserved it also results in the rest of the application performing better because it is not squeezed out of the cache.
I did a quick sum test with accessing random data in random order using both a byte array and an int array.
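// Requires: using System; using System.Linq;
// Assumption: r was not declared in the original snippet; a plain Random is used here.
Random r = new Random();
const int SIZE = 10000000, LOOPS = 80000;
// Array of random values 0-9 (byte version; presumably swapped for int[] in the int run).
byte[] array = Enumerable.Repeat(0, SIZE).Select(i => (byte)r.Next(10)).ToArray();
// Pre-compute the random visit order so index generation is not timed.
int[] visitOrder = Enumerable.Repeat(0, LOOPS).Select(i => r.Next(SIZE)).ToArray();
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int sum = 0;
foreach (int v in visitOrder)
    sum += array[v];
sw.Stop();
// Assumption: the reported figures are Stopwatch ticks; printing sum keeps the loop observable.
Console.WriteLine(sw.ElapsedTicks + " ticks, sum = " + sum);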
Here are the results in ticks (x86, release mode, without debugger, .NET 4.5, i7-3930K; smaller is better):
Array size:     10    100     1K    10K   100K     1M    10M
byte:          549    559    552    552    568    632   3041
int:           549    566    552    562    590   1803   4206
- Accessing 1M items randomly was about 2.85x faster with byte than with int on my CPU (632 vs. 1803 ticks).
- Anything under 10,000 items was hardly noticeable.
- int was never faster than byte for this basic sum test.
- These values will vary across CPUs with different cache sizes.
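The figures above can be related to the memory each array occupies. The helper below is a rough sketch using a hypothetical `BenchmarkMath` class; it only restates the measured ticks as ratios and computes the raw element storage, ignoring array headers.

```csharp
// Hypothetical helper for reading the benchmark table.
public static class BenchmarkMath
{
    // Raw element storage of an array, in bytes (array header ignored).
    public static long FootprintBytes(long elements, int bytesPerElement)
    {
        return elements * bytesPerElement;
    }

    // How many times longer the int run took than the byte run.
    public static double Slowdown(long intTicks, long byteTicks)
    {
        return (double)intTicks / byteTicks;
    }
}
```

At 1M elements the byte array holds 1,000,000 bytes of data against 4,000,000 for int, and `BenchmarkMath.Slowdown(1803, 632)` comes out at roughly 2.85. At 10M elements the ratio from the table drops to roughly 1.38, where neither array is likely to fit in cache.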
One final note: sometimes I look at the now open-source .NET Framework to see what Microsoft's experts do. The .NET Framework uses byte/int16 surprisingly little. I could not find any, actually.