Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1247442/

Date: 2020-08-06 14:12:07  Source: igfitidea

When should I use the HashSet<T> type?

Tags: c#, .net, data-structures, hashset

Asked by Joan Venge

I am exploring the HashSet<T> type, but I don't understand where it stands among the collections.

Can one use it to replace a List<T>? I imagine the performance of a HashSet<T> to be better, but I couldn't see a way to access its elements individually.

Is it only for enumeration?

Accepted answer by Robert Rossney

The important thing about HashSet<T> is right there in the name: it's a set. The only things you can do with a single set are to establish what its members are and to check whether an item is a member.

Asking if you can retrieve a single element (e.g. set[45]) is misunderstanding the concept of the set. There's no such thing as the 45th element of a set. Items in a set have no ordering. The sets {1, 2, 3} and {2, 3, 1} are identical in every respect because they have the same membership, and membership is all that matters.
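
As a small illustration of the point about membership, the sketch below builds the two sets from the paragraph in different insertion orders and compares them with SetEquals. The variable names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Two sets built in different insertion orders.
var a = new HashSet<int> { 1, 2, 3 };
var b = new HashSet<int> { 2, 3, 1 };

// SetEquals compares membership only; insertion order plays no part.
bool sameMembers = a.SetEquals(b);
Console.WriteLine(sameMembers);     // True

// There is no indexer, so a[45] would not compile.
// The question a set can answer is whether a value is a member.
bool hasTwo = a.Contains(2);
bool hasFortyFive = a.Contains(45);
Console.WriteLine(hasTwo);          // True
Console.WriteLine(hasFortyFive);    // False
```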

It's somewhat dangerous to iterate over a HashSet<T> because doing so imposes an order on the items in the set. That order is not really a property of the set. You should not rely on it. If ordering of the items in a collection is important to you, that collection isn't a set.
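
If some output does need a stable order, one option is to impose it explicitly at the point of use rather than relying on the enumeration order. This is only a sketch; the set contents are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var tags = new HashSet<string> { "net", "hashset", "csharp" };

// The enumeration order of a HashSet<T> is unspecified,
// so sort explicitly whenever the order matters.
List<string> sorted = tags.OrderBy(t => t, StringComparer.Ordinal).ToList();

Console.WriteLine(string.Join(", ", sorted)); // csharp, hashset, net
```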

Sets are really limited, and their members are unique. On the other hand, they're really fast.

Answer by earl

HashSet is a set implemented by hashing. A set is a collection of values containing no duplicate elements. The values in a set are also typically unordered. So no, a set cannot be used to replace a list (unless you should have used a set in the first place).

If you're wondering what a set might be good for: anywhere you want to get rid of duplicates, obviously. As a slightly contrived example, let's say you have a list of 10,000 revisions of a software project, and you want to find out how many people contributed to that project. You could use a Set<string> (in .NET, a HashSet<string>), iterate over the list of revisions, and add each revision's author to the set. Once you're done iterating, the size of the set is the answer you were looking for.
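
The contributor count could be sketched roughly as follows. The revision data and the tuple shape (Number, Author) are hypothetical stand-ins for whatever a real revision log provides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical revision log: (revision number, author) pairs.
var revisions = new List<(int Number, string Author)>
{
    (1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob"),
};

var authors = new HashSet<string>();
foreach (var revision in revisions)
{
    // Adding an author who is already present leaves the set unchanged.
    authors.Add(revision.Author);
}

Console.WriteLine(authors.Count); // 3
```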

Answer by Carl Manaster

Performance would be a bad reason to choose HashSet over List. Instead, what better captures your intent? If order is important, then Set (or HashSet) is out. If duplicates are permitted, likewise. But there are plenty of circumstances when we don't care about order, and we'd rather not have duplicates - and that's when you want a Set.

Answer by Noldorin

HashSet<T> is a data structure in the .NET Framework that is capable of representing a mathematical set as an object. In this case, it uses hash codes (the GetHashCode result of each item), together with Equals, to compare set elements for equality.

A set differs from a list in that it only allows one occurrence of the same element contained within it. HashSet<T> will just return false if you try to add a second identical element. Indeed, lookup of elements is very quick (O(1) time), since the internal data structure is simply a hashtable.
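
A minimal sketch of the duplicate behavior described above: Add reports through its return value whether the element was actually inserted.

```csharp
using System;
using System.Collections.Generic;

var seen = new HashSet<string>();

bool firstAdd = seen.Add("abc");   // true: the element was inserted
bool secondAdd = seen.Add("abc");  // false: already present, set unchanged

Console.WriteLine($"{firstAdd} {secondAdd} {seen.Count}"); // True False 1
```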

If you're wondering which to use, note that using a List<T> where a HashSet<T> is appropriate is not the biggest mistake, though it may allow problems where you have undesirable duplicate items in your collection. What is more, lookup (item retrieval) is vastly more efficient with a HashSet<T> - ideally O(1) (for perfect bucketing) instead of O(n) time - which is quite important in many scenarios.

Answer by Steve Guidi

List<T> is used to store ordered sets of information. If you know the position (index) of an element in the list, you can access it in constant time. However, to determine where an element lies in the list or to check if it exists in the list, the lookup time is linear. On the other hand, HashSet<T> makes no guarantees about the order of the stored data and consequently provides constant access time for its elements.

As the name implies, HashSet<T> is a data structure that implements set semantics. The data structure is optimized to implement set operations (i.e. Union, Difference, Intersect), which cannot be done as efficiently with the traditional List implementation.
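
On HashSet<T>, the operations named above correspond to the UnionWith, ExceptWith and IntersectWith methods. The sketch below uses invented sample sets. Since these methods modify the set they are called on, it works on copies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var evens = new HashSet<int> { 2, 4, 6, 8 };
var small = new HashSet<int> { 1, 2, 3, 4 };

// The *With methods mutate the receiver, so start from a copy each time.
var union = new HashSet<int>(evens);
union.UnionWith(small);               // members of either set

var intersection = new HashSet<int>(evens);
intersection.IntersectWith(small);    // members of both sets

var difference = new HashSet<int>(evens);
difference.ExceptWith(small);         // members of evens that are not in small

// Sort only for display; the sets themselves carry no order.
Console.WriteLine(string.Join(", ", union.OrderBy(x => x)));        // 1, 2, 3, 4, 6, 8
Console.WriteLine(string.Join(", ", intersection.OrderBy(x => x))); // 2, 4
Console.WriteLine(string.Join(", ", difference.OrderBy(x => x)));   // 6, 8
```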

So, which data type to use really depends on what you are attempting to do in your application. If you don't care about how your elements are ordered in a collection, and only want to enumerate or check for existence, use HashSet<T>. Otherwise, consider using List<T> or another suitable data structure.

Answer by sepp2k

Probably the most common use for hashsets is to see whether they contain a certain element, which is close to an O(1) operation for them (assuming a sufficiently strong hashing function), as opposed to lists, for which the check for inclusion is O(n) (and sorted sets, for which it is O(log n)). So if you do a lot of checks for whether an item is contained in some list, hashsets might be a performance improvement. If you only ever iterate over them, there won't be much difference (iterating over the whole set is O(n), same as with lists, and hashsets have somewhat more overhead when adding items).
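
For comparison, here is the same membership check written against the three collection types mentioned. The data is arbitrary and the sketch makes no timing claims. It only shows that the call looks the same while the cost behind it differs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> list = Enumerable.Range(0, 100000).ToList();
var hashSet = new HashSet<int>(list);
var sortedSet = new SortedSet<int>(list);

int wanted = 99999;

bool inList = list.Contains(wanted);        // linear scan: O(n)
bool inHashSet = hashSet.Contains(wanted);  // hash lookup: close to O(1)
bool inSorted = sortedSet.Contains(wanted); // tree lookup: O(log n)

Console.WriteLine($"{inList} {inHashSet} {inSorted}"); // True True True
```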

And no, you can't index a set, which would not make sense anyway, because sets aren't ordered. If you add some items, the set won't remember which one was first, and which second etc.

Answer by Addys

In short: anytime you are tempted to use a Dictionary (or a Dictionary<S, T> where S is a property of T), you should consider a HashSet<T> (or a HashSet<T> plus implementing IEquatable<T> on T so that equality is based on S).

Answer by Sam Harwell

Here's a real example of where I use a HashSet<string>:

Part of my syntax highlighter for UnrealScript files is a new feature that highlights Doxygen-style comments. I need to be able to tell if a @ or \ command is valid to determine whether to show it in gray (valid) or red (invalid). I have a HashSet<string> of all the valid commands, so whenever I hit a @xxx token in the lexer, I use validCommands.Contains(tokenText) as my O(1) validity check. I really don't care about anything except existence of the command in the set of valid commands. Let's look at the alternatives I faced:

  • Dictionary<string, ?>: What type do I use for the value? The value is meaningless since I'm just going to use ContainsKey. Note: Before .NET 3.5 this was the only choice for O(1) lookups - HashSet<T> was added in 3.5 and extended to implement ISet<T> in 4.0.
  • List<string>: If I keep the list sorted, I can use BinarySearch, which is O(log n) (didn't see this fact mentioned above). However, since my list of valid commands is a fixed list that never changes, this will never be more appropriate than simply...
  • string[]: Again, Array.BinarySearch gives O(log n) performance. If the list is short, this could be the best performing option. It always has less space overhead than HashSet, Dictionary, or List. Even with BinarySearch, it's not faster for large sets, but for small sets it'd be worth experimenting. Mine has several hundred items though, so I passed on this.
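
The alternatives above can be put side by side in a short sketch. The command list here is a small illustrative sample, not the answer's actual list, and the bool value type for the dictionary is an arbitrary placeholder, which is exactly the awkwardness the first bullet points out.

```csharp
using System;
using System.Collections.Generic;

string tokenText = "param";

// HashSet<string>: only membership is stored.
var validCommands = new HashSet<string>(StringComparer.Ordinal)
{
    "brief", "param", "return", "see",
};
bool validViaSet = validCommands.Contains(tokenText);

// Dictionary<string, bool>: the value carries no meaning here.
var commandTable = new Dictionary<string, bool>(StringComparer.Ordinal)
{
    ["brief"] = true, ["param"] = true, ["return"] = true, ["see"] = true,
};
bool validViaDictionary = commandTable.ContainsKey(tokenText);

// string[]: must already be sorted with the same comparer used for the search.
string[] sortedCommands = { "brief", "param", "return", "see" };
bool validViaArray =
    Array.BinarySearch(sortedCommands, tokenText, StringComparer.Ordinal) >= 0;

Console.WriteLine($"{validViaSet} {validViaDictionary} {validViaArray}"); // True True True
```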

Answer by Kenan E. K.

A HashSet<T> implements the ICollection<T> interface:

public interface ICollection<T> : IEnumerable<T>, IEnumerable
{
    // Methods
    void Add(T item);
    void Clear();
    bool Contains(T item);
    void CopyTo(T[] array, int arrayIndex);
    bool Remove(T item);

    // Properties
    int Count { get; }
    bool IsReadOnly { get; }
}

A List<T> implements IList<T>, which extends ICollection<T>:

public interface IList<T> : ICollection<T>
{
    // Methods
    int IndexOf(T item);
    void Insert(int index, T item);
    void RemoveAt(int index);

    // Properties
    T this[int index] { get; set; }
}

A HashSet has set semantics, implemented via a hashtable internally:

A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

What does the HashSet gain, if it loses index/position/list behavior?

Adding and retrieving items from the HashSet is always by the object itself, not via an indexer, and close to an O(1) operation (List is O(1) add, O(1) retrieve by index, O(n) find/remove).

A HashSet's behavior could be compared to using a Dictionary<TKey,TValue> where you only add and remove keys and ignore the dictionary values themselves. You would expect keys in a dictionary not to have duplicate values, and that's the point of the "Set" part.

Answer by Thomas.Benz

A HashSet can be used to remove duplicate elements from an IEnumerable collection. For example,

List<string> duplicatedEnumerableStrings = new List<string> {"abc", "ghjr", "abc", "abc", "yre", "obm", "ghir", "qwrt", "abc", "vyeu"};
// The constructor copies the elements and silently drops repeats.
HashSet<string> uniqueStrings = new HashSet<string>(duplicatedEnumerableStrings);

After this code runs, uniqueStrings holds {"abc", "ghjr", "yre", "obm", "ghir", "qwrt", "vyeu"}.