C#: When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?

Notice: this page reproduces a popular StackOverflow question and its answers (originally published as a Chinese-English parallel translation). The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1376965/

Date: 2020-08-06 15:50:09  Source: igfitidea

When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?

Tags: c#, .net, sortedlist, sorteddictionary

Asked by Scott Dorman

This may appear to be a duplicate of this question, which asks "What's the difference between SortedList and SortedDictionary?" Unfortunately, the answers do nothing more than quote the MSDN documentation (which clearly states that there are performance and memory use differences between the two) but don't actually answer the question.

In fact (and so this question doesn't get the same answers), according to MSDN:

The SortedList<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

  • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.

  • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList<TKey, TValue>.

  • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

So, clearly this would indicate that SortedList<TKey, TValue> is the better choice unless you need faster insert and remove operations for unsorted data.

The question still remains: given the information above, what are the practical (real-world, business case, etc.) reasons for using a SortedDictionary<TKey, TValue>? Based on the performance information, it would seem that there really is no need to have SortedDictionary<TKey, TValue> at all.

Answer by David Rutten

That's all there is to it. Retrieval of keys is comparable, but addition is much faster with Dictionaries.

I try to use SortedList as much as possible because it allows me to access the keys and values collections by index. This is not possible with SortedDictionary as far as I know (it can still be enumerated in key order, but its Keys and Values collections have no indexer).
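
As a rough illustration of the difference in access patterns, here is a minimal sketch written with top-level statements (C# 9 or later). The fruit names and numbers are made up for the example; only the collection members used (Keys, Values, IndexOfKey) come from the framework.

```csharp
using System;
using System.Collections.Generic;

// Illustrative data only; the keys and values are invented for this sketch.
var list = new SortedList<string, int>
{
    ["pear"] = 3,
    ["apple"] = 1,
    ["fig"] = 2,
};

// SortedList exposes Keys and Values as IList<T>, so positions can be used directly.
string firstKey = list.Keys[0];               // smallest key
int lastValue = list.Values[list.Count - 1];  // value stored under the largest key
int position = list.IndexOfKey("fig");        // zero-based position of a key

Console.WriteLine($"{firstKey} {lastValue} {position}"); // apple 3 1

// SortedDictionary can be enumerated in key order as well,
// but its Keys and Values collections have no indexer:
// dict.Keys[0] would not compile.
var dict = new SortedDictionary<string, int>(list);
foreach (KeyValuePair<string, int> pair in dict)
{
    Console.WriteLine($"{pair.Key} = {pair.Value}");
}
```

If positional access is needed on a SortedDictionary, the keys have to be copied out or enumerated first, which costs O(n).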

I'm not sure about this, but as far as I know Dictionaries store data in tree structures, whereas Lists store data in linear arrays. That explains why insertion and removal are much faster with dictionaries, since less memory has to be shifted around. It also explains why you can index into a SortedList but not a SortedDictionary.

Answer by Ash

I'm not sure how accurate the MSDN documentation is on SortedList and SortedDictionary. It seems to be saying both are implemented using a binary search tree. But if SortedList uses a binary search tree, why would it be much slower on additions than SortedDictionary?

Anyway, here are some performance test results.

Each test operates on a SortedList / SortedDictionary containing 10,000 int32 keys. Each test is repeated 1,000 times (Release build, Start without Debugging).

The first group of tests adds keys in sequence from 0 to 9,999. The second group of tests adds randomly shuffled keys from 0 to 9,999 (every number is added exactly once).

***** Tests.PerformanceTests.SortedTest

SortedDictionary Add sorted: 4411 ms
SortedDictionary Get sorted: 2374 ms


SortedList Add sorted: 1422 ms
SortedList Get sorted: 1843 ms

***** Tests.PerformanceTests.UnsortedTest

SortedDictionary Add unsorted: 4640 ms
SortedDictionary Get unsorted: 2903 ms


SortedList Add unsorted: 36559 ms
SortedList Get unsorted: 2243 ms

As with any profiling, the important thing is the relative performance, not the actual numbers.
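
The original test code is not shown, so the following is only a sketch of how a comparable measurement could be set up. The helper TimeAdds, the fixed random seed, and the reduced repetition count are assumptions made for this example; it uses top-level statements (C# 9 or later) and times only the Add operations.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

const int KeyCount = 10_000;  // same key count as the tests above
const int Repetitions = 20;   // assumption: far fewer than the 1,000 runs above

// Keys 0..KeyCount-1 in order, plus a shuffled copy (Fisher-Yates, fixed seed).
int[] sortedKeys = new int[KeyCount];
for (int i = 0; i < KeyCount; i++) sortedKeys[i] = i;

int[] shuffledKeys = (int[])sortedKeys.Clone();
var random = new Random(12345);
for (int i = KeyCount - 1; i > 0; i--)
{
    int j = random.Next(i + 1);
    (shuffledKeys[i], shuffledKeys[j]) = (shuffledKeys[j], shuffledKeys[i]);
}

// Hypothetical helper: times filling a fresh collection with the given keys.
static long TimeAdds(Func<IDictionary<int, int>> create, int[] keys, int repetitions)
{
    var watch = Stopwatch.StartNew();
    for (int r = 0; r < repetitions; r++)
    {
        IDictionary<int, int> target = create();
        foreach (int key in keys) target.Add(key, key);
    }
    return watch.ElapsedMilliseconds;
}

Console.WriteLine($"SortedDictionary add sorted:   {TimeAdds(() => new SortedDictionary<int, int>(), sortedKeys, Repetitions)} ms");
Console.WriteLine($"SortedList add sorted:         {TimeAdds(() => new SortedList<int, int>(), sortedKeys, Repetitions)} ms");
Console.WriteLine($"SortedDictionary add unsorted: {TimeAdds(() => new SortedDictionary<int, int>(), shuffledKeys, Repetitions)} ms");
Console.WriteLine($"SortedList add unsorted:       {TimeAdds(() => new SortedList<int, int>(), shuffledKeys, Repetitions)} ms");
```

The absolute numbers will differ by machine and runtime, and calling through the IDictionary interface adds a small overhead to both collections, so only the ratios between the four lines are worth comparing.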

As you can see, on sorted data the SortedList is faster than the SortedDictionary. On unsorted data the SortedList is slightly quicker on retrieval, but about 8 times slower on adding (36559 ms versus 4640 ms).

If both are using binary trees internally, it is quite surprising that the Add operation on unsorted data is so much slower for SortedList. It is possible that SortedList is also adding items to a sorted linear data structure at the same time, which would slow it down.

However, in that case you would expect the memory usage of a SortedList to be greater than or at least equal to that of a SortedDictionary. But this contradicts what the MSDN documentation says.

Answer by tigrou

I don't know why MSDN says that SortedList<TKey, TValue> uses a binary tree for its implementation, because if you look at the code with a decompiler like Reflector you realize it's not true.

SortedList<TKey, TValue> is simply an array that grows over time.

Every time you insert an element, it first checks whether the array has enough capacity. If not, a bigger array is allocated and the old elements are copied into it (like List<T>).

After that, it searches for where to insert the element, using a binary search (this is possible since the array is indexable and already sorted).

To keep the array sorted, it moves (or pushes) all the elements located after the insertion position along by one position (using Array.Copy()).

E.g.:

// we want to insert "3" 

2  
4  <= 3
5
8
9
.      
.      
.  

// we have to move some elements first

2
.  <= 3
4 
5  |
8  v
9
.
.

That explains why the performance of SortedList is so bad when you insert unsorted elements. It has to re-copy some elements on almost every insertion. The only case where this is not needed is when the element has to be inserted at the end of the array.
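
The steps described above (grow if full, binary search, shift, store) can be sketched on a plain int array. This is not the actual SortedList source: InsertSorted is a hypothetical helper written for this illustration, it keeps only keys, and unlike SortedList it does not reject duplicates. It uses top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper: inserts value into the first `count` sorted slots of items.
static int InsertSorted(ref int[] items, int count, int value)
{
    // 1. Grow the backing array when it is full (doubling, as List<T> does).
    if (count == items.Length)
        Array.Resize(ref items, Math.Max(4, items.Length * 2));

    // 2. Binary search for the position; a negative result is the
    //    bitwise complement of the index where the value belongs.
    int index = Array.BinarySearch(items, 0, count, value);
    if (index < 0) index = ~index;

    // 3. Shift everything from index onward one slot to the right: O(n) worst case.
    Array.Copy(items, index, items, index + 1, count - index);

    // 4. Store the new element in the gap.
    items[index] = value;
    return count + 1;
}

int[] data = { 2, 4, 5, 8, 9, 0, 0, 0 };  // five used slots, three spare
int count = 5;
count = InsertSorted(ref data, count, 3);

Console.WriteLine(string.Join(" ", data[..count])); // 2 3 4 5 8 9
```

When the new value is larger than everything stored so far, step 3 copies zero elements, which is why adding already sorted data is cheap.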

SortedDictionary<TKey, TValue> is different and uses a binary tree to insert and retrieve elements. It also has some cost on insert because sometimes the tree needs to be re-balanced (but not on every insertion).

Performance is quite similar when searching for an element with SortedList or SortedDictionary because they both use a binary search.



In my opinion, you should never use SortedList just to sort an array. Unless you have very few elements, it will always be faster to insert values into a list (or array) and then call the Sort() method.
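
A minimal sketch of that "append, then sort once" pattern, with made-up input values and top-level statements (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

// Made-up input values for illustration.
int[] input = { 42, 7, 19, 3, 25 };

// Append in arrival order: amortized O(1) per Add, nothing is shifted.
var values = new List<int>(input.Length);
foreach (int item in input) values.Add(item);

// Sort once at the end: O(n log n) for the whole batch.
values.Sort();

Console.WriteLine(string.Join(", ", values)); // 3, 7, 19, 25, 42

// Once sorted, the list supports binary search as well.
int index = values.BinarySearch(19);
Console.WriteLine(index); // 2
```

This only fits when the data arrives as a batch; if lookups have to be interleaved with insertions, the list is not sorted until Sort() is called again.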

SortedList is mostly useful when you have a list of values that is already sorted (e.g. from a database), you want to keep it sorted, and you want to perform operations that take advantage of the ordering (e.g. the ContainsKey() method of SortedList performs a binary search instead of a linear search).

SortedDictionary offers the same advantages as SortedList but performs better if the values to insert are not already sorted.



EDIT: If you are using .NET Framework 4.0 or later, an alternative to SortedDictionary<TKey, TValue> is SortedSet<T>. It works the same way as SortedDictionary, using a binary tree, but keys and values are the same here.
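
For illustration, a short SortedSet<T> sketch with arbitrary sample numbers (top-level statements, C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

// Arbitrary sample values; the set keeps them in sorted order.
var set = new SortedSet<int> { 8, 2, 9, 4, 5 };

bool added = set.Add(3);       // true: 3 was not present yet
bool addedAgain = set.Add(3);  // false: duplicates are ignored, no exception

Console.WriteLine(string.Join(" ", set));   // 2 3 4 5 8 9
Console.WriteLine($"{set.Min} {set.Max}");  // 2 9
```

Note the difference from the dictionary types: Add on a SortedSet returns false for an existing element, whereas Add on SortedList or SortedDictionary throws for an existing key.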

Answer by nawfal

Are they meant for two different purposes?

There is not much semantic difference between these two collection types in .NET. They both offer keyed lookup and keep the entries in the sort order of the keys. In most cases you will be OK with either of them. Perhaps the only differentiator is the indexed retrieval that SortedList permits.

But performance?

However, there is a performance difference which might be a stronger factor when choosing between them. Here is a tabular view of their asymptotic complexity.

+------------------+---------+----------+--------+----------+----------+---------+
| Collection       | Indexed | Keyed    | Value  | Addition |  Removal | Memory  |
|                  | lookup  | lookup   | lookup |          |          |         |
+------------------+---------+----------+--------+----------+----------+---------+
| SortedList       | O(1)    | O(log n) | O(n)   | O(n)*    | O(n)     | Lesser  |
| SortedDictionary | n/a     | O(log n) | O(n)   | O(log n) | O(log n) | Greater |
+------------------+---------+----------+--------+----------+----------+---------+

* Insertion is O(1) for data that are already in sort order, so that each 
  element is added to the end of the list (assuming no resize is required).

Summary

To roughly summarize, you want a SortedList<K, V> when:

  1. you require indexed look-up.
  2. it's desirable to have lesser memory overhead.
  3. your input data is already sorted (say you get it already ordered from db).

You would instead prefer a SortedDictionary<K, V> when:

  1. relative overall performance matters (with respect to scaling).
  2. your input data is unordered.

Writing code

Both SortedList<K, V> and SortedDictionary<K, V> implement IDictionary<K, V>, so in your code you can return IDictionary<K, V> from a method or declare a variable as IDictionary<K, V>. Basically, hide the implementation detail and code against the interface.

IDictionary<K, V> x = new SortedDictionary<K, V>(); // for example

In the future, it's easier to switch between the two if you're not happy with the performance characteristics of one collection.
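
One way this could look in practice is sketched below. CreateIndex and its inputIsSorted flag are hypothetical names invented for this example, and the string/int type arguments are arbitrary; it uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical factory: the only place that names a concrete collection type.
static IDictionary<string, int> CreateIndex(bool inputIsSorted)
{
    if (inputIsSorted)
        return new SortedList<string, int>();
    return new SortedDictionary<string, int>();
}

// Callers see only the interface, so the choice can change later.
IDictionary<string, int> index = CreateIndex(inputIsSorted: false);
index["b"] = 2;
index["a"] = 1;

// Both implementations enumerate in key order.
foreach (KeyValuePair<string, int> pair in index)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```

The trade-off is that IDictionary<K, V> exposes Keys and Values only as ICollection<T>, so code written this way cannot rely on the indexed access that SortedList offers.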



For more info on the two collection types, see the original question linked.

Answer by Lev

Visual representation of performance differences.

[Image not available]

Answer by user3290232

An important consideration for us is that we often have small dictionaries (<100 elements), and current processors are much faster at accessing sequential memory while executing few hard-to-predict branches (i.e. iterating over a linear array rather than traversing a tree). So when you have fewer than about 60 elements in your dictionary, SortedList<> is often the fastest and most memory-efficient dictionary in many use cases.