C# 快速简单的哈希码组合

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1646807/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-06 19:39:47  来源:igfitidea点击:

Quick and Simple Hash Code Combinations

c#algorithmhashhashcode

提问by RobV

Can people recommend quick and simple ways to combine the hash codes of two objects. I am not too worried about collisions since I have a Hash Table which will handle that efficiently I just want something that generates a code quickly as possible.

人们能否推荐快速简单的方法来组合两个对象的哈希码。我不太担心冲突,因为我有一个哈希表可以有效地处理冲突,我只想要一些能尽快生成代码的东西。

Reading around SO and the web there seem to be a few main candidates:

阅读 SO 和网络,似乎有几个主要的候选人:

  1. XORing
  2. XORing with Prime Multiplication
  3. Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
  4. Building a String and then using the String classes Hash Code method
  1. 异或
  2. 质数乘法异或
  3. 简单的数字运算,如乘法/除法(带有溢出检查或环绕)
  4. 构建一个字符串,然后使用字符串类哈希代码方法

What would people recommend and why?

人们会推荐什么,为什么?

采纳答案by Jon Skeet

I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I havedeliberately used it for set hashing - if you want to hash a sequence of items and you don'tcare about the ordering, it's nice.

我个人会避免 XOR - 这意味着任何两个相等的值都会导致 0 - 所以 hash(1, 1) == hash(2, 2) == hash(3, 3) 等等。还有 hash(5, 0) == hash(0, 5) 等可能偶尔出现。我已经刻意用它集合散列-如果你想哈希项目的顺序,你关心的排序,这是不错的。

I usually use:

我通常使用:

unchecked
{
    int hash = 17;
    hash = hash * 31 + firstField.GetHashCode();
    hash = hash * 31 + secondField.GetHashCode();
    return hash;
}

That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.

这就是 Josh Bloch 在 Effective Java 中建议的形式。上次我回答了一个类似的问题时,我设法找到了一篇详细讨论的文章 - IIRC,没有人真正知道为什么它运行良好,但确实如此。它也易于记忆、易于实现,并且易于扩展到任意数量的领域。

回答by geofftnz

If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.

如果您的输入哈希大小相同、分布均匀且彼此不相关,那么 XOR 应该没问题。而且速度很快。

The situation I'm suggesting this for is where you want to do

我建议的情况是你想要做的

H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.

of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.

当然,如果可以预期 A 和 B 以合理(不可忽略)的概率散列到相同的值,那么您不应该以这种方式使用 XOR。

回答by richardtallent

I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.

我建议使用 System.Security.Cryptography 中的内置哈希函数,而不是使用自己的哈希函数。

回答by Ed Power

If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:

如果您正在寻找速度并且没有太多冲突,那么 XOR 是最快的。为了防止在零附近聚集,您可以执行以下操作:

finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;

Of course, some prototyping ought to give you an idea of performance and clustering.

当然,一些原型设计应该让您了解性能和集群。

回答by Special Sauce

While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17and factor of 31as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue, and the number of items being jointly hashed are a few dozen or less.

虽然 Jon Skeet 的答案中概述的模板通常作为散列函数系列运行良好,但常量的选择很重要,17并且31答案中提到的种子和因子对于常见用例根本不起作用。在大多数用例中,散列值比 更接近于零int.MaxValue,并且联合散列的项目数量只有几十个或更少。

For hashing an integer tuple {x, y}where -1000 <= x <= 1000and -1000 <= y <= 1000, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} -> {0, 31}, {1, 1} -> {0, 32}, etc. If we expand the coverage to also include n-tuples where 3 <= n <= 25, it does less terrible with a collision rate of about 38%. But we can do much better.

对于散列一个整数元组{x, y}where-1000 <= x <= 1000-1000 <= y <= 1000,它的碰撞率几乎达到 98.5%。例如,{1, 0} -> {0, 31}{1, 1} -> {0, 32}等。如果我们将覆盖范围扩大到还包括 n 元组 where 3 <= n <= 25,它的碰撞率大约为 38% 就不那么可怕了。但我们可以做得更好。

public static int CustomHash(int seed, int factor, params int[] vals)
{
    int hash = seed;
    foreach (int i in vals)
    {
        hash = (hash * factor) + i;
    }
    return hash;
}

I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i. Allowed ranges were 2 <= n <= 25(where nwas random but biased toward the lower end of the range) and -1000 <= i <= 1000. At least 12 million unique collision tests were performed for each seed and factor pair.

我编写了一个 Monte Carlo 采样搜索循环,它用各种随机整数的随机 n 元组上的种子和因子的各种值测试了上述方法i。允许的范围是2 <= n <= 25(其中n是随机的,但偏向于范围的下限)和-1000 <= i <= 1000。对每个种子和因子对进行了至少 1200 万次独特的碰撞测试。

After about 7 hours running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009, factor = 9176, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it peforms quite well in all common intand charhashing scenarios. It also seems to work fine with integers of much greater magnitudes.

运行大约 7 小时后,找到的最佳对(其中种子和因子都限制在 4 位或更少)是:seed = 1009, factor = 9176,碰撞率为 0.1131%。在 5 位数和 6 位数区域,还有更好的选择。但为了简洁起见,我选择了前 4 位执行者,它在所有常见intchar散列场景中都表现得很好。它似乎也适用于更大数量级的整数。

It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor although it likely helps. 1009noted above is in fact prime, but 9176is not. I explicitly tested variations on this where I changed factorto various primes near 9176(while leaving seed = 1009) and they all performed worse than the above solution.

值得注意的是,“成为最好的”似乎并不是作为种子和/或因子的良好表现的一般先决条件,尽管它可能有所帮助。1009上面提到的实际上是素数,但9176不是。我明确地测试了这方面的变化,我factor在附近9176(离开时seed = 1009)更改为各种素数,它们的表现都比上述解决方案差。

Lastly, I also compared against the generic ReSharper recommendation function family of hash = (hash * factor) ^ i;and the original CustomHash()as noted above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used in my opinion.

最后,我还与上面提到的通用 ReSharper 推荐函数系列进行hash = (hash * factor) ^ i;了比较,原始版本的CustomHash()性能明显优于它。对于常见的用例假设,ReSharper XOR 样式的冲突率似乎在 20-30% 范围内,我认为不应该使用。

回答by Stipo

I presume that .NET Framework team did a decent job in testing their System.String.GetHashCode()implementation, so I would use it:

我认为 .NET Framework 团队在测试他们的System.String.GetHashCode()实现方面做得不错,所以我会使用它:

// System.String.GetHashCode(): http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4
// System.Web.Util.StringUtil.GetStringHashCode(System.String): http://referencesource.microsoft.com/#System.Web/Util/StringUtil.cs,c97063570b4e791a
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;

    int i = 0;
    foreach (var hashCode in hashCodes)
    {
        if (i % 2 == 0)
            hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hashCode;
        else
            hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ hashCode;

        ++i;
    }

    return hash1 + (hash2 * 1566083941);
}

Another implementation is from System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32)and System.Array.CombineHashCodes(System.Int32, System.Int32)methods. This one is simpler, but probably doesn't have such a good distribution as the method above:

另一个实现来自System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32)System.Array.CombineHashCodes(System.Int32, System.Int32)方法。这个更简单,但可能没有上述方法那么好的分布:

// System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,21fb74ad8bb43f6b
// System.Array.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#mscorlib/system/array.cs,87d117c8cc772cca
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash = 5381;

    foreach (var hashCode in hashCodes)
        hash = ((hash << 5) + hash) ^ hashCode;

    return hash;
}

回答by Yepeekai

Use the combination logic in tuple. The example is using c#7 tuples.

在元组中使用组合逻辑。该示例使用 c#7 元组。

(field1, field2).GetHashCode();

回答by chwarr

If you are using .NET Core 2.1or later, consider using the System.HashCodestruct to help with producing composite hash codes. It has two modes of operation: Add and Combine.

如果您使用 .NET Core 2.1或更高版本,请考虑使用System.HashCode结构来帮助生成复合哈希码。它有两种操作模式:添加和组合。

An example using Combine, which is usually simpler and works for up to eight items:

使用 的示例Combine,通常更简单,最多可用于八个项目:

public override int GetHashCode()
{
    return HashCode.Combine(object1, object2);
}

An example of using Add:

使用示例Add

public override int GetHashCode()
{
    var hash = new HashCode();
    hash.Add(this.object1);
    hash.Add(this.object2);
    return hash.ToHashCode();
}

Pros:

优点:

  • Part of .NET itself, as of .NET Core 2.1/.NET Standard 2.1 (though, see con below)
  • Looks to have good performance and mixing characteristics, based on the work the author and the reviewers did before merging this into the corefx repo
  • Handles nulls automatically
  • Overloads that take IEqualityComparerinstances
  • .NET 本身的一部分,从 .NET Core 2.1/.NET Standard 2.1 开始(不过,请参阅下面的 con)
  • 基于作者和审阅者在将其合并到 corefx 存储库之前所做的工作,看起来具有良好的性能和混合特性
  • 自动处理空值
  • IEqualityComparer实例的重载

Cons:

缺点:

回答by Thomas Hugel

Assuming you have a relevant toString() function (where your different fields shall appear), I would just return its hashcode:

假设你有一个相关的 toString() 函数(你的不同字段会出现在那里),我只会返回它的哈希码:

this.toString().hashCode();

This is not very fast, but it should avoid collisions quite well.

这不是很快,但它应该可以很好地避免碰撞。

回答by 3dGrabber

This is a repackaging of Special Sauce's brilliantly researched solution.
It makes use of Value Tuples (ITuple).
This allows defaults for the parameters seedand factor.

这是Special Sauce出色研究解决方案的重新包装。
它使用值元组 ( ITuple)。
这允许参数seed和 的默认值factor

public static int CombineHashes(this ITuple tupled, int seed=1009, int factor=9176)
{
    var hash = seed;

    for (var i = 0; i < tupled.Length; i++)
    {
        unchecked
        {
            hash = hash * factor + tupled[i].GetHashCode();
        }
    }

    return hash;
}

Usage:

用法:

var hash1 = ("Foo", "Bar", 42).CombineHashes();    
var hash2 = ("Jon", "Skeet", "Constants").CombineHashes(seed=17, factor=31);