C# 使用 Random 和 OrderBy 是一种好的洗牌算法吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1287567/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is using Random and OrderBy a good shuffle algorithm?
提问by Svish
I have read an articleabout various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:
我在Coding Horror 上读过一篇关于各种 shuffle 算法的文章。我看到有人在某个地方这样做来洗牌:
var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());
Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?
这是一个好的洗牌算法吗?它是如何工作的?这是一种可以接受的方式吗?
采纳答案by Jon Skeet
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
这不是我喜欢的一种洗牌方式,主要是因为它是 O(n log n) 没有充分的理由,当它很容易实现 O(n) 洗牌时。问题中的代码通过基本上为每个元素提供一个随机(希望是唯一的!)数字,然后根据该数字对元素进行排序来“工作”。
I prefer Durstenfield's variant of the Fisher-Yates shufflewhich swaps elements.
我更喜欢 Durstenfield 的Fisher-Yates shuffle变体,它可以交换元素。
Implementing a simple Shuffle
extension method would basically consist of calling ToList
or ToArray
on the input then using an existing implementation of Fisher-Yates. (Pass in the Random
as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.
实现一个简单的Shuffle
扩展方法基本上包括调用ToList
或ToArray
输入,然后使用 Fisher-Yates 的现有实现。(Random
作为参数传入以使生活更美好。)周围有很多实现......我可能在某个地方得到了一个答案。
The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.
这种扩展方法的好处在于,读者可以非常清楚您实际要做什么。
EDIT: Here's a simple implementation (no error checking!):
编辑:这是一个简单的实现(没有错误检查!):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length-1; i > 0; i--)
{
// Swap element "i" with a random earlier element it (or itself)
int swapIndex = rng.Next(i + 1);
T tmp = elements[i];
elements[i] = elements[swapIndex];
elements[swapIndex] = tmp;
}
// Lazily yield (avoiding aliasing issues etc)
foreach (T element in elements)
{
yield return element;
}
}
EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:
编辑:下面对性能的评论提醒我,我们实际上可以在洗牌时返回元素:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
? ? T[] elements = source.ToArray();
? ? for (int i = elements.Length - 1; i >= 0; i--)
? ? {
? ? ? ? // Swap element "i" with a random earlier element it (or itself)
// ... except we don't really need to swap it fully, as we can
// return it immediately, and afterwards it's irrelevant.
? ? ? ? int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
? ? ? ? elements[swapIndex] = elements[i];
? ? }
}
This will now only do as much work as it needs to.
这现在只会做它需要的工作。
Note that in both cases, you need to be careful about the instance of Random
you use as:
请注意,在这两种情况下,您都需要小心Random
使用的实例:
- Creating two instances of
Random
at roughly the same time will yield the same sequence of random numbers (when used in the same way) Random
isn't thread-safe.
Random
大致同时创建两个实例将产生相同的随机数序列(以相同方式使用时)Random
不是线程安全的。
I have an article on Random
which goes into more detail on these issues and provides solutions.
我有一篇文章Random
详细介绍了这些问题并提供了解决方案。
回答by ripper234
It's probablly ok for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).
对于大多数用途来说可能没问题,并且几乎总是会生成真正的随机分布(除非 Random.Next() 生成两个相同的随机整数)。
It works by assigning each element of the series a random integer, then ordering the sequence by these integers.
它的工作原理是为系列的每个元素分配一个随机整数,然后按这些整数对序列进行排序。
It's totally acceptable for 99.9% of the applications (unless you absolutely need to handle the edge case above). Also, skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.
对于 99.9% 的应用程序来说,这是完全可以接受的(除非您绝对需要处理上述边缘情况)。此外,飞碟对其运行时的反对是有效的,因此如果您正在调整一个很长的列表,您可能不想使用它。
回答by Dave Swersky
This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)
该算法通过为列表中的每个值生成一个新的随机值,然后按这些随机值对列表进行排序来进行混洗。将其视为向内存表中添加一个新列,然后用 GUID 填充它,然后按该列排序。对我来说似乎是一种有效的方法(尤其是使用 lambda 糖!)
回答by Samuel Carrijo
Seems like a good shuffling algorithm, if you're not too worried on the performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.
如果您不太担心性能,这似乎是一个很好的改组算法。我要指出的唯一问题是它的行为是不可控的,所以你可能很难测试它。
One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.
一种可能的选择是将种子作为参数传递给随机数生成器(或将随机生成器作为参数传递),这样您就可以更轻松地进行控制和测试。
回答by hughdbrown
This has come up many times before. Search for Fisher-Yates on StackOverflow.
这在以前已经出现过很多次了。在 StackOverflow 上搜索 Fisher-Yates。
Here is a C# code sampleI wrote for this algorithm. You can parameterize it on some other type, if you prefer.
这是我为此算法编写的C# 代码示例。如果您愿意,您可以将其参数化为其他类型。
static public class FisherYates
{
// Based on Java code from wikipedia:
// http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
static public void Shuffle(int[] deck)
{
Random r = new Random();
for (int n = deck.Length - 1; n > 0; --n)
{
int k = r.Next(n+1);
int temp = deck[n];
deck[n] = deck[k];
deck[k] = temp;
}
}
}
回答by Irfy
Slightly unrelated, but here is an interesting method (that even though it is really excessibe, has REALLY been implemented) for truly random generation of dice rolls!
有点不相关,但这里有一个有趣的方法(尽管它真的很多余,但已经真正实现了)用于真正随机生成骰子!
The reason I'm posting this here, is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle, over actual dice. Of course, in the real world, such a solution is only for the really extreme ends of the spectrum where randomness has such an big impact and perhaps the impact affects money ;).
我在这里发帖的原因是,他提出了一些有趣的观点,即他的用户对使用算法在实际骰子上洗牌的想法有何反应。当然,在现实世界中,这样的解决方案仅适用于随机性具有如此大影响并且可能影响金钱的真正极端情况;)。
回答by configurator
This is based on Jon Skeet's answer.
这是基于 Jon Skeet 的回答。
In that answer, the array is shuffled, then returned using yield
. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.
在那个答案中,数组被打乱,然后使用返回yield
。最终的结果是数组在 foreach 的持续时间内保留在内存中,以及迭代所需的对象,但成本都在开始 - 产量基本上是一个空循环。
This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield
the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray()
it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3)
and only three swaps will take place (although the entire array would have to be copied first).
这个算法在游戏中被大量使用,其中前三个项目被选中,其他的只有在以后才会需要。我的建议是yield
在数字交换后立即使用。这将降低启动成本,同时将迭代成本保持在 O(1)(基本上每次迭代 5 个操作)。总成本将保持不变,但洗牌本身会更快。在调用collection.Shuffle().ToArray()
它的情况下,理论上它没有区别,但在上述用例中,它会加快启动速度。此外,这将使该算法适用于您只需要几个唯一项目的情况。例如,如果您需要从一副 52 的牌中取出三张牌,您可以跟注,deck.Shuffle().Take(3)
并且只会进行三次交换(尽管必须先复制整个阵列)。
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length - 1; i > 0; i--)
{
// Swap element "i" with a random earlier element it (or itself)
int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
// we don't actually perform the swap, we can forget about the
// swapped element because we already returned it.
}
// there is one item remaining that was not returned - we return it now
yield return elements[0];
}
回答by Christian
I would say that many answers here like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values" might be very wrong!
我想说这里的许多答案,例如“此算法通过为列表中的每个值生成一个新的随机值来进行随机播放,然后按这些随机值对列表进行排序”可能是非常错误的!
I'd think that this DOES NOT assign a random value to each element of the source collection. Instead there might be a sort algorithm running like Quicksort which would call a compare-function approximately n log n times. Some sort algortihm really expect this compare-function to be stable and always return the same result!
我认为这不会为源集合的每个元素分配一个随机值。取而代之的是可能有一个排序算法像 Quicksort 一样运行,它会调用大约 n log n 次的比较函数。某种算法真的希望这个比较函数是稳定的并且总是返回相同的结果!
Couldn't it be that the IEnumerableSorter calls a compare function for each algorithm step of e.g. quicksort and each time calls the function x => r.Next()
for both parameters without caching these!
难道 IEnumerableSorter 会为例如快速排序的每个算法步骤调用一个比较函数,并且每次都x => r.Next()
为两个参数调用该函数而不缓存它们!
In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is build up on. Of course, it eventually will become stable and return something.
在这种情况下,你可能真的搞砸了排序算法,使它比算法建立的预期更糟糕。当然,它最终会变得稳定并返回一些东西。
I might check it later by putting debugging output inside a new "Next" function so see what happens. In Reflector I could not immediately find out how it works.
稍后我可能会通过将调试输出放在新的“Next”函数中来检查它,看看会发生什么。在 Reflector 中,我无法立即了解它是如何工作的。
回答by GMzo
Startup time to run on code with clear all threads and cache every new test,
在清除所有线程并缓存每个新测试的代码上运行的启动时间,
First unsuccessful code. It runs on LINQPad. If you follow to test this code.
第一个不成功的代码。它在 LINQPad 上运行。如果您按照此代码进行测试。
Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new String[] {"1","X"});
list.Add(new String[] {"2","A"});
list.Add(new String[] {"3","B"});
list.Add(new String[] {"4","C"});
list.Add(new String[] {"5","D"});
list.Add(new String[] {"6","E"});
//list.OrderBy (l => r.Next()).Dump();
list.OrderBy (l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);
list.OrderBy(x => r.Next()) uses 38.6528 ms
list.OrderBy(x => r.Next()) 使用 38.6528 毫秒
list.OrderBy(x => Guid.NewGuid()) uses 36.7634 ms (It's recommended from MSDN.)
list.OrderBy(x => Guid.NewGuid()) 使用 36.7634 毫秒(建议来自 MSDN。)
the after second time both of them use in the same time.
第二次后两者同时使用。
EDIT:TEST CODE on Intel Core i7 [email protected], Ram 8 GB DDR3 @1600, HDD SATA 5200 rpm with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]
编辑:英特尔酷睿 i7 [email protected]、Ram 8 GB DDR3 @1600、HDD SATA 5200 rpm 上的测试代码 [数据:www.dropbox.com/s/pbtmh5s9lw285kp/data]
using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;
namespace Algorithm
{
class Program
{
public static void Main(string[] args)
{
try {
int i = 0;
int limit = 10;
var result = GetTestRandomSort(limit);
foreach (var element in result) {
Console.WriteLine();
Console.WriteLine("time {0}: {1} ms", ++i, element);
}
} catch (Exception e) {
Console.WriteLine(e.Message);
} finally {
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
public static IEnumerable<double> GetTestRandomSort(int limit)
{
for (int i = 0; i < 5; i++) {
string path = null, temp = null;
Stopwatch st = null;
StreamReader sr = null;
int? count = null;
List<string> list = null;
Random r = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Thread.Sleep(5000);
st = Stopwatch.StartNew();
#region Import Input Data
path = Environment.CurrentDirectory + "\data";
list = new List<string>();
sr = new StreamReader(path);
count = 0;
while (count < limit && (temp = sr.ReadLine()) != null) {
// Console.WriteLine(temp);
list.Add(temp);
count++;
}
sr.Close();
#endregion
// Console.WriteLine("--------------Random--------------");
// #region Sort by Random with OrderBy(random.Next())
// r = new Random();
// list = list.OrderBy(l => r.Next()).ToList();
// #endregion
// #region Sort by Random with OrderBy(Guid)
// list = list.OrderBy(l => Guid.NewGuid()).ToList();
// #endregion
// #region Sort by Random with Parallel and OrderBy(random.Next())
// r = new Random();
// list = list.AsParallel().OrderBy(l => r.Next()).ToList();
// #endregion
// #region Sort by Random with Parallel OrderBy(Guid)
// list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
// #endregion
// #region Sort by Random with User-Defined Shuffle Method
// r = new Random();
// list = list.Shuffle(r).ToList();
// #endregion
// #region Sort by Random with Parallel User-Defined Shuffle Method
// r = new Random();
// list = list.AsParallel().Shuffle(r).ToList();
// #endregion
// Result
//
st.Stop();
yield return st.Elapsed.TotalMilliseconds;
foreach (var element in list) {
Console.WriteLine(element);
}
}
}
}
}
Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
结果描述:https: //www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
结果统计:https: //www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
Conclusion:
Assume: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are not worse than User-Defined Shuffle Method in First Solution.
结论:
假设:LINQ OrderBy(r.Next()) 和 OrderBy(Guid.NewGuid()) 并不比 First Solution 中的 User-Defined Shuffle Method 差。
Answer: They are contradiction.
答:它们是矛盾的。
回答by xanatos
Starting from this quote of Skeet:
从 Skeet 的这句话开始:
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
这不是我喜欢的一种洗牌方式,主要是因为它是 O(n log n) 没有充分的理由,当它很容易实现 O(n) 洗牌时。问题中的代码通过基本上为每个元素提供一个随机(希望是唯一的!)数字,然后根据该数字对元素进行排序来“工作” 。
I'll go on a little explaining the reason for the hopefully unique!
我将继续解释希望独一无二的原因!
Now, from the Enumerable.OrderBy:
现在,从Enumerable.OrderBy:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved
此方法执行稳定排序;也就是说,如果两个元素的键相等,则保留元素的顺序
This is very important! What happens if two elements "receive" the same random number? It happens that they remain in the same order they are in the array. Now, what is the possibility for this to happen? It is difficult to calculate exactly, but there is the Birthday Problemthat is exactly this problem.
这是非常重要的!如果两个元素“接收”相同的随机数会发生什么?碰巧它们保持在数组中的相同顺序。现在,这种情况发生的可能性有多大?很难准确计算,但生日问题正是这个问题。
Now, is it real? Is it true?
现在,这是真的吗?这是真的吗?
As always, when in doubt, write some lines of program: http://pastebin.com/5CDnUxPG
与往常一样,如有疑问,请编写一些程序行:http: //pastebin.com/5CDnUxPG
This little block of code shuffles an array of 3 elements a certain number of times using the Fisher-Yates algorithm done backward, the Fisher-Yates algorithm done forward (in the wikipage there are two pseudo-code algorithms... They produce equivalent results, but one is done from first to last element, while the other is done from last to first element), the naive wrong algorithm of http://blog.codinghorror.com/the-danger-of-naivete/and using the .OrderBy(x => r.Next())
and the .OrderBy(x => r.Next(someValue))
.
这个小代码块使用向后完成的 Fisher-Yates 算法和向前完成的 Fisher-Yates 算法将 3 个元素的数组混洗一定次数(在wiki页面中有两种伪代码算法......它们产生等效的结果,但一个是从第一个元素到最后一个元素完成的,而另一个是从最后一个元素到第一个元素完成的),http://blog.codinghorror.com/the-danger-of-naivete/的幼稚错误算法并使用.OrderBy(x => r.Next())
和.OrderBy(x => r.Next(someValue))
。
Now, Random.Nextis
现在,Random.Next是
A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.
大于或等于 0 且小于 MaxValue 的 32 位有符号整数。
so it's equivalent to
所以它相当于
OrderBy(x => r.Next(int.MaxValue))
To test if this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue
isn't a "special" number... It is simply a very big number). In the end, if the algorithm isn't biased by the stableness of the OrderBy
, then any range of values should give the same result.
为了测试这个问题是否存在,我们可以扩大数组(非常慢的东西)或简单地减少随机数生成器的最大值(int.MaxValue
不是“特殊”数字......它只是一个非常大的数字)。最后,如果算法不受 稳定性的影响OrderBy
,那么任何范围的值都应该给出相同的结果。
The program then tests some values, in the range 1...4096. Looking at the result, it's quite clear that for low values (< 128), the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024)
. If you make the array bigger (4 or 5), then even r.Next(1024)
isn't enough. I'm not an expert in shuffling and in math, but I think that for each extra bit of length of the array, you need 2 extra bits of maximum value (because the birthday paradox is connected to the sqrt(numvalues)), so that if the maximum value is 2^31, I'll say that you should be able to sort arrays up to 2^12/2^13 bits (4096-8192 elements)
然后程序测试一些值,范围为 1...4096。查看结果,很明显,对于低值(< 128),算法非常有偏差(4-8%)。使用 3 个值,您至少需要r.Next(1024)
. 如果您使数组更大(4 或 5),那么甚至r.Next(1024)
还不够。我不是改组和数学方面的专家,但我认为对于数组长度的每个额外位,您需要 2 个额外位的最大值(因为生日悖论与 sqrt(numvalues) 相关),所以如果最大值是 2^31,我会说你应该能够对最多 2^12/2^13 位(4096-8192 个元素)的数组进行排序