C# Object.GetHashCode
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1139767/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Object.GetHashCode
提问by ChrisW
My question may duplicate Default implementation for Object.GetHashCode()but I'm asking again because I didn't understand the accepted answer to that one.
我的问题可能会重复Object.GetHashCode() 的默认实现,但我再次询问,因为我不理解该答案的公认答案。
To begin with I have three questions about the accepted answer to the previous question, which quotes some documentationas follows:
首先,我对上一个问题的公认答案提出了三个问题,其中引用了一些文档,如下所示:
"However, because this index can be reused after the object is reclaimed during garbage collection, it is possible to obtain the same hash code for two different objects."
“但是,由于在垃圾收集期间回收对象后可以重用此索引,因此有可能为两个不同的对象获取相同的哈希码。”
Is this true? It seems to me that two objects won't have the same hash code, because an object's code isn't reused until the object is garbage collected (i.e. no longer exists).
这是真的?在我看来,两个对象不会具有相同的哈希码,因为在对象被垃圾收集(即不再存在)之前,不会重用对象的代码。
"Also, two objects that represent the same value have the same hash code only if they are the exact same object."
“此外,仅当它们是完全相同的对象时,表示相同值的两个对象才具有相同的哈希码。”
Is this a problem? For example, I want to associate some data with each of the node instances in a DOM tree. To do this, the 'nodes' must have an identity or hash code, so that I can use them as keys in a dictionary of the data. Isn't a hash code which identities whether it's "the exact same object", i.e. "reference equality rather than "value equality", what I want?
这是一个问题吗?例如,我想将一些数据与 DOM 树中的每个节点实例相关联。为此,“节点”必须具有标识或哈希码,以便我可以将它们用作数据字典中的键。不是标识它是否是“完全相同的对象”的哈希码,即“引用相等性而不是“值相等性”,我想要什么?
"This implementation is not particularly useful for hashing; therefore, derived classes should override GetHashCode"
“这个实现对散列不是特别有用;因此,派生类应该覆盖 GetHashCode”
Is this true? If it's not good for hashing, then what if anything is it good for, and why is it even defined as a method of Object?
这是真的?如果它不利于散列,那么它有什么好处呢,为什么它甚至定义为 Object 的方法?
My final (and perhaps most important to me) question is, if I must invent/override a GetHashCode() implementation for an arbitrary type which has "reference equality" semantics, is the following a reasonable and good implementation:
我的最后一个(也许对我来说最重要的)问题是,如果我必须为具有“引用相等”语义的任意类型发明/覆盖 GetHashCode() 实现,以下是一个合理且良好的实现:
class SomeType
{
//create a new value for each instance
static int s_allocated = 0;
//value associated with this instance
int m_allocated;
//more instance data
... plus other data members ...
//constructor
SomeType()
{
allocated = ++s_allocated;
}
//override GetHashCode
public override int GetHashCode()
{
return m_allocated;
}
}
Edit
编辑
FYI I tested it, using the following code:
仅供参考,我使用以下代码对其进行了测试:
class TestGetHash
{
//default implementation
class First
{
int m_x;
}
//my implementation
class Second
{
static int s_allocated = 0;
int m_allocated;
int m_x;
public Second()
{
m_allocated = ++s_allocated;
}
public override int GetHashCode()
{
return m_allocated;
}
}
//stupid worst-case implementation
class Third
{
int m_x;
public override int GetHashCode()
{
return 0;
}
}
internal static void test()
{
testT<First>(100, 1000);
testT<First>(1000, 100);
testT<Second>(100, 1000);
testT<Second>(1000, 100);
testT<Third>(100, 100);
testT<Third>(1000, 10);
}
static void testT<T>(int objects, int iterations)
where T : new()
{
System.Diagnostics.Stopwatch stopWatch =
System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
{
Dictionary<T, object> dictionary = new Dictionary<T, object>();
for (int j = 0; j < objects; ++j)
{
T t = new T();
dictionary.Add(t, null);
}
for (int k = 0; k < 100; ++k)
{
foreach (T t in dictionary.Keys)
{
object o = dictionary[t];
}
}
}
stopWatch.Stop();
string stopwatchMessage = string.Format(
"Stopwatch: {0} type, {1} objects, {2} iterations, {3} msec",
typeof(T).Name, objects, iterations,
stopWatch.ElapsedMilliseconds);
System.Console.WriteLine(stopwatchMessage);
}
}
On my machine the results/output are as follows:
在我的机器上,结果/输出如下:
First type, 100 objects, 1000 iterations, 2072 msec
First type, 1000 objects, 100 iterations, 2098 msec
Second type, 100 objects, 1000 iterations, 1300 msec
Second type, 1000 objects, 100 iterations, 1319 msec
Third type, 100 objects, 100 iterations, 1487 msec
Third type, 1000 objects, 10 iterations, 13754 msec
My implementation takes half the time of the default implementation (but my type is bigger by the size of my m_allocated data member).
我的实现需要默认实现的一半时间(但我的类型比我的 m_allocated 数据成员的大小更大)。
My implementation and the default implementation both scale linearly.
我的实现和默认实现都是线性扩展的。
In comparison and as a sanity check, the stupid implementation starts bad and scales worse.
相比之下,作为一个健全的检查,愚蠢的实现开始很糟糕,而且规模更糟。
采纳答案by Eric Lippert
The most important property a hash code implementation must have is this:
哈希码实现必须具有的最重要的属性是:
If two objects compare as equal then they must have identical hash codes.
如果两个对象比较相等,则它们必须具有相同的哈希码。
If you have a class where instances of the class compare by reference equality, then you do not need to override GetHashCode; the default implementation guarantees that two objects that are the same reference have the same hash code. (You're calling the same method twice on the same object, so of course the result is the same.)
如果你有一个类的实例通过引用相等来比较,那么你不需要覆盖 GetHashCode;默认实现保证具有相同引用的两个对象具有相同的哈希码。(你在同一个对象上调用同一个方法两次,所以结果当然是一样的。)
If you have written a class which implements its own equality that is different from reference equality then you are REQUIRED to override GetHashCode such that two objects that compare as equal have equal hash codes.
如果您编写的类实现了与引用相等不同的自己的相等,那么您需要覆盖 GetHashCode,以便比较相等的两个对象具有相等的哈希码。
Now, you could do so by simply returning zero every time. That would be a lousy hash function, but it would be legal.
现在,您可以通过每次简单地返回零来实现。这将是一个糟糕的哈希函数,但它是合法的。
Other properties of good hash functions are:
好的散列函数的其他特性是:
GetHashCode should never throw an exception
Mutable objects which compare for equality on their mutable state, and therefore hash on their mutable state, are dangerously bug-prone. You can put an object into a hash table, mutate it, and be unable to get it out again. Try to never hash or compare for equality on mutable state.
GetHashCode should be extremely fast -- remember, the purpose of a good hash algorithm is to improve the performance of lookups. If the hash is slow then the lookups can't be made fast.
Objects which do not compare as equal should have dissimilar hash codes, well distributed over the whole range of a 32 bit integer
GetHashCode 永远不应该抛出异常
在可变状态上比较相等的可变对象,因此在其可变状态上散列,很容易出错。您可以将一个对象放入哈希表中,对其进行变异,并且无法再次将其取出。尽量不要在可变状态上散列或比较相等性。
GetHashCode 应该非常快——记住,一个好的散列算法的目的是提高查找的性能。如果散列很慢,则无法快速进行查找。
不相等的对象应该具有不同的哈希码,在 32 位整数的整个范围内分布良好
回答by Kenan E. K.
You do not actually need to modify anything on a class which requires only referenceequality.
您实际上不需要修改只需要引用相等的类上的任何内容。
Also, formally, that is not a good implementation since it has poor distribution. A hash function should have a reasonable distribution since it improves hash bucket distribution, and indirectly, performance in collections which use hash tables. As I said, that is a formalanswer, one of the guidelines when designing a hash function.
此外,正式地说,这不是一个好的实现,因为它的分布很差。散列函数应该具有合理的分布,因为它改进了散列桶的分布,并间接地提高了使用散列表的集合的性能。正如我所说,这是一个正式的答案,是设计散列函数时的准则之一。
回答by Alex Yakunin
Question:
题:
Is this true? It seems to me that two objects won't have the same hash code, because an object's code isn't reused until the object is garbage collected (i.e. no longer exists).
这是真的?在我看来,两个对象不会具有相同的哈希码,因为在对象被垃圾收集(即不再存在)之前,不会重用对象的代码。
Two objects may share the same hash code, if it is generated by default GetHashCode implementation, because:
两个对象可能共享相同的哈希码,如果它是由默认的 GetHashCode 实现生成的,因为:
- Default GetHashCode result shouldn't be changed during object's lifetime, and default implementation ensures this. If it could change, such types as Hashtable couldn't deal with this implementation. That's because it's expected that default hash code is a hash code of unique instance identifier (even although there is no such identifier :) ).
- Range of GetHashCode values is range of integer (2^32).
- 在对象的生命周期内不应更改默认 GetHashCode 结果,默认实现可确保这一点。如果它可以改变,像 Hashtable 这样的类型无法处理这个实现。那是因为预期默认哈希码是唯一实例标识符的哈希码(即使没有这样的标识符:))。
- GetHashCode 值的范围是整数范围 (2^32)。
Conclusion:It's enough to allocate 2^32 strongly-referenced objects to (must be easy on Win64) to reach the limit.
结论:将 2^32 个强引用对象分配给(在 Win64 上必须很容易)达到限制就足够了。
Finally, there is an explicit statement in object.GetHashCode reference in MSDN: The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
最后,在 MSDN中object.GetHashCode 引用中有一个明确的声明:GetHashCode 方法的默认实现不保证不同对象的唯一返回值。此外,.NET Framework 不保证 GetHashCode 方法的默认实现,并且它返回的值在 .NET Framework 的不同版本之间将是相同的。因此,此方法的默认实现不得用作散列目的的唯一对象标识符。