C#: What is the fastest way I can compare two equal-size bitmaps to determine whether they are identical?
Original question: http://stackoverflow.com/questions/2031217/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow and keep the same license.
Asked by Erik Forbes
I am trying to write a function to determine whether two equal-size bitmaps are identical or not. The function I have right now simply compares a pixel at a time in each bitmap, returning false at the first non-equal pixel.
While this works, and works well for small bitmaps, in production I'm going to be using this in a tight loop and on larger images, so I need a better way. Does anyone have any recommendations?
The language I'm using is C# by the way - and yes, I am already using the .LockBits method. =)
Edit: I've coded up implementations of some of the suggestions given, and here are the benchmarks. The setup: two identical (worst-case) bitmaps, 100x100 in size, with 10,000 iterations each. Here are the results:
CompareByInts (Marc Gravell) : 1107ms
CompareByMD5 (Skilldrick) : 4222ms
CompareByMask (GrayWizardX) : 949ms
In CompareByInts and CompareByMask I'm using pointers to access the memory directly; in the MD5 method I'm using Marshal.Copy to retrieve a byte array and pass that as an argument to MD5.ComputeHash. CompareByMask is only slightly faster, but given the context I think any improvement is useful.
Thanks everyone. =)
Edit 2: Forgot to turn optimizations on - doing that gives GrayWizardX's answer even more of a boost:
CompareByInts (Marc Gravell) : 944ms
CompareByMD5 (Skilldrick) : 4275ms
CompareByMask (GrayWizardX) : 630ms
CompareByMemCmp (Erik) : 105ms
Interesting that the MD5 method didn't improve at all.
Edit 3: Posted my answer (MemCmp) which blew the other methods out of the water. o.O
Accepted answer by Erik Forbes
Edit 8-31-12: per Joey's comment below, be mindful of the format of the bitmaps you compare. They may contain padding on the strides that renders the bitmaps unequal, despite being equivalent pixel-wise. See this question for more details.
Reading this answer to a question regarding comparing byte arrays has yielded a MUCH FASTER method: using P/Invoke and the memcmp API call in msvcrt. Here's the code:
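// Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.Runtime.InteropServices;
// memcmp is a cdecl function and takes a size_t count, hence Cdecl and UIntPtr.
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
private static extern int memcmp(IntPtr b1, IntPtr b2, UIntPtr count);

public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
{
    if ((b1 == null) != (b2 == null)) return false;
    if (b1 == null) return true; // both null: treated as equal here to avoid a NullReferenceException below
    if (b1.Size != b2.Size) return false;

    var bd1 = b1.LockBits(new Rectangle(new Point(0, 0), b1.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    var bd2 = b2.LockBits(new Rectangle(new Point(0, 0), b2.Size), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

    try
    {
        IntPtr bd1scan0 = bd1.Scan0;
        IntPtr bd2scan0 = bd2.Scan0;

        int stride = bd1.Stride;
        int len = stride * b1.Height;

        // Compare the whole locked buffer in a single native call.
        return memcmp(bd1scan0, bd2scan0, new UIntPtr((uint)len)) == 0;
    }
    finally
    {
        b1.UnlockBits(bd1);
        b2.UnlockBits(bd2);
    }
}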
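On the stride padding mentioned in the edit at the top of this answer: when bitmaps are locked in a format such as 24bpp, each row can be padded up to a 4-byte boundary, and those padding bytes are not guaranteed to match. A rough sketch of a padding-safe comparison is below. It assumes the pixel data of both bitmaps has already been copied into byte arrays (for example with Marshal.Copy), that both use the same positive stride, and that Span support is available (.NET Core 2.1+ or the System.Memory package). RowCompareSketch, CompareRows, and rowBytes are made-up names for illustration; rowBytes stands for width times bytes per pixel.

```csharp
using System;

public static class RowCompareSketch
{
    // Compares only the first rowBytes bytes of every row, skipping any stride padding.
    public static bool CompareRows(byte[] a, byte[] b, int stride, int rowBytes, int height)
    {
        for (int y = 0; y < height; y++)
        {
            int offset = y * stride; // start of row y in both buffers (assumes positive stride)
            if (!a.AsSpan(offset, rowBytes).SequenceEqual(b.AsSpan(offset, rowBytes)))
                return false; // a real pixel byte differs
        }
        return true;
    }
}
```

This is slower than one memcmp over the whole buffer, so it is probably only worth it when you cannot lock both bitmaps in a padding-free format like Format32bppArgb.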
Answer by Marc Gravell
Well, you're using .LockBits, so presumably you're using unsafe code. Rather than treating each row origin (Scan0 + y * Stride) as a byte*, consider treating it as an int*; int arithmetic is pretty quick, and you only have to do 1/4 as much work. And for images in ARGB you might still be talking in pixels, making the math simple.
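A minimal sketch of that idea, assuming both bitmaps were locked as 32bpp so that each pixel is exactly one int. IntCompareSketch and the parameter names are hypothetical; scan0A/scan0B, stride, width, and height stand for the Scan0, Stride, Width, and Height values of the two BitmapData objects, which are assumed to share the same stride. It has to be compiled with unsafe code enabled.

```csharp
using System;

public static class IntCompareSketch
{
    // Walks both locked buffers row by row, reading one int (one 32bpp pixel) at a time.
    public static unsafe bool CompareByInts(IntPtr scan0A, IntPtr scan0B, int stride, int width, int height)
    {
        byte* baseA = (byte*)scan0A.ToPointer();
        byte* baseB = (byte*)scan0B.ToPointer();

        for (int y = 0; y < height; y++)
        {
            // Row origin is Scan0 + y * Stride; reinterpret it as int* instead of byte*.
            int* rowA = (int*)(baseA + (long)y * stride);
            int* rowB = (int*)(baseB + (long)y * stride);

            for (int x = 0; x < width; x++)
            {
                if (rowA[x] != rowB[x]) return false; // first differing pixel ends the comparison
            }
        }
        return true;
    }
}
```

Because the inner loop indexes by pixel rather than by byte, x runs over the width directly and any stride padding is never read.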
Answer by Skilldrick
Could you take a hash of each and compare? It would be slightly probabilistic, but practically not.
Thanks to Ram, here's a sample implementation of this technique.
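The linked sample is not reproduced here, so as a stand-in, here is a rough sketch of the hash comparison. It assumes the pixel data of each bitmap has already been copied into a byte array (the question mentions doing this with Marshal.Copy); HashCompareSketch and CompareByMD5 are illustrative names, not the code that was benchmarked.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class HashCompareSketch
{
    // Hashes both pixel buffers with MD5 and compares the two 16-byte digests.
    public static bool CompareByMD5(byte[] pixelsA, byte[] pixelsB)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hashA = md5.ComputeHash(pixelsA);
            byte[] hashB = md5.ComputeHash(pixelsB);
            return hashA.SequenceEqual(hashB);
        }
    }
}
```

Note that hashing has to read every byte of both images, so for a one-off comparison it cannot beat a direct compare (the benchmarks in the question bear this out). It is more likely to pay off when the hashes can be computed once and reused across many comparisons.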
回答by GrayWizardx
If you are trying to determine if they are 100% equal, you can negate one and add it to the other; if the result is zero, they are identical. Extending this using unsafe code, take 64 bits at a time as a long and do the math that way; any difference can cause an immediate fail.
If the images are not 100% identical (comparing png to jpeg), or if you are not looking for a 100% match then you have some more work ahead of you.
Good luck.
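A sketch of the 64-bits-at-a-time idea, under a couple of assumptions: the pixel data is already in two byte arrays, and XOR is used as the "is the difference zero" test instead of the add-and-check arithmetic described above (the check against zero works the same way). LongCompareSketch and CompareByLongs are hypothetical names, and the code needs unsafe compilation enabled.

```csharp
using System;

public static class LongCompareSketch
{
    // Compares two buffers 8 bytes at a time, then any leftover tail bytes one by one.
    public static unsafe bool CompareByLongs(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;

        int longCount = a.Length / sizeof(long);

        fixed (byte* pa = a, pb = b)
        {
            long* la = (long*)pa;
            long* lb = (long*)pb;

            for (int i = 0; i < longCount; i++)
            {
                // XOR is zero only when all 64 bits match; fail on the first mismatch.
                if ((la[i] ^ lb[i]) != 0) return false;
            }

            // Tail: bytes that do not fill a whole long.
            for (int i = longCount * sizeof(long); i < a.Length; i++)
            {
                if (pa[i] != pb[i]) return false;
            }
        }
        return true;
    }
}
```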
Answer by rmeador
If you can implement something like Duff's Device in your language, that might give you a significant speed boost over a simple loop. Usually it's used for copying data, but there's no reason it can't be used for comparing data instead.
Or, for that matter, you may just want to use some equivalent to memcmp().
Answer by Drew
You could try to add them to a database "blob", then use the database engine to compare their binaries. This would only give you a yes or no answer to whether the binary data is the same. It would be very easy to make 2 images that produce the same graphic but have different binary data, though.
You could also select a few random pixels and compare them, then if they are the same, continue with more until you've checked all the pixels. This would only return a negative match faster, though; it would still take as long to find 100% positive matches.
Answer by Jeff Kubina
If the original problem is just to find exact duplicates between two bitmaps, then a bit-level comparison will have to do. I don't know C#, but in C I would use the following function:
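/* size is the number of long elements in each buffer */
int areEqual (long size, long *a, long *b)
{
    long start = size / 2;
    long i;
    /* compare the second half first, then the first half */
    for (i = start; i != size; i++) { if (a[i] != b[i]) return 0; }
    for (i = 0; i != start; i++) { if (a[i] != b[i]) return 0; }
    return 1;
}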
I would start looking in the middle because I suspect there is a much better chance of finding unequal bits near the middle of the image than at the beginning. Of course, this really depends on the images you are deduping; selecting a random place to start may be best.
If you are trying to find the exact duplicates among hundreds of images, then comparing all pairs of them is unnecessary. First compute the MD5 hash of each image and place it in a list of pairs (md5Hash, imageId); then sort the list by the md5Hash. Next, only do pairwise comparisons on the images that have the same md5Hash.
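Since the question is about C#, here is a rough C# sketch of that dedupe pass. It assumes each image's pixel data is already available as a byte array and uses the list index as the imageId. It groups by hash with LINQ's GroupBy instead of sorting, which ends up with the same buckets. DedupeSketch and FindDuplicates are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class DedupeSketch
{
    // Returns pairs of image ids (list indexes) whose bytes are exactly equal.
    public static List<Tuple<int, int>> FindDuplicates(IReadOnlyList<byte[]> images)
    {
        var duplicates = new List<Tuple<int, int>>();

        using (var md5 = MD5.Create())
        {
            // Build (md5Hash, imageId) pairs and bucket them by hash.
            var groups = images
                .Select((bytes, id) => new { Id = id, Hash = BitConverter.ToString(md5.ComputeHash(bytes)) })
                .GroupBy(x => x.Hash);

            foreach (var group in groups)
            {
                var ids = group.Select(x => x.Id).ToList();

                // Only images that share a hash get the full pairwise byte comparison.
                for (int i = 0; i < ids.Count; i++)
                    for (int j = i + 1; j < ids.Count; j++)
                        if (images[ids[i]].SequenceEqual(images[ids[j]]))
                            duplicates.Add(Tuple.Create(ids[i], ids[j]));
            }
        }
        return duplicates;
    }
}
```

The final byte comparison is kept so that a hash collision cannot be reported as a duplicate.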
Answer by rampion
If these bitmaps are already on your graphics card then you can parallelize such a check by doing it on the graphics card using a language like CUDA or OpenCL.
I'll explain in terms of CUDA, since that's the one I know. Basically CUDA lets you write general purpose code to run in parallel across each node of your graphics card. You can access bitmaps that are in shared memory. Each invocation of the function is also given an index within the set of parallel runs. So, for a problem like this, you'd just run one of the above comparison functions for some subset of the bitmap - using parallelization to cover the entire bitmap. Then, just write a 1 to a certain memory location if the comparison fails (and write nothing if it succeeds).
If you don't already have the bitmaps on your graphics card, this probably isn't the way to go, since the costs for loading the two bitmaps on your card will easily eclipse the savings such parallelization will gain you.
Here's some (pretty bad) example code (it's been a little while since I programmed CUDA). There are better ways to access bitmaps that are already loaded as textures, but I didn't bother here.
// kernel to run on GPU, once per thread
__global__ void compare_bitmaps(long const * const A, long const * const B, char * const retValue, size_t const len)
{
    // divide the work equally among the threads (each thread is in a block, each block is in a grid)
    size_t const threads_per_block = blockDim.x * blockDim.y * blockDim.z;
    size_t const len_to_compare = len / (gridDim.x * gridDim.y * gridDim.z * threads_per_block);
#   define offset3(idx3,dim3) (idx3.x + dim3.x * (idx3.y + dim3.y * idx3.z))
    size_t const start_offset = len_to_compare * (offset3(threadIdx,blockDim) + threads_per_block * offset3(blockIdx,gridDim));
    size_t const stop_offset = start_offset + len_to_compare;
#   undef offset3

    size_t i;
    for (i = start_offset; i < stop_offset; i++)
    {
        if (A[i] != B[i])
        {
            *retValue = 1;
            break;
        }
    }
    return;
}
Answer by nathanchere
Based on the approach of comparing hashes instead of comparing every single pixel, this is what I use:
using System.Drawing;
using System.Linq;
using System.Security.Cryptography;

public static class Utils
{
    public static byte[] ShaHash(this Image image)
    {
        // Serialize the image to a byte array, then hash those bytes.
        var bytes = (byte[])new ImageConverter().ConvertTo(image, typeof(byte[]));
        using (var sha = SHA256.Create())
        {
            return sha.ComputeHash(bytes);
        }
    }

    public static bool AreEqual(Image imageA, Image imageB)
    {
        if (imageA.Width != imageB.Width) return false;
        if (imageA.Height != imageB.Height) return false;

        var hashA = imageA.ShaHash();
        var hashB = imageB.ShaHash();

        // Equal only if every byte of the two hashes matches.
        return hashA.SequenceEqual(hashB);
    }
}
Usage is straightforward:
bool isMatch = Utils.AreEqual(bitmapOne, bitmapTwo);