Original question: http://stackoverflow.com/questions/1246899/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
File.Copy vs. Manual FileStream.Write For Copying File
Asked by jakejgordon
My problem concerns file-copying performance. We have a media management system that moves a lot of files around the file system to different locations, including Windows shares on the same network, FTP sites, Amazon S3, etc. When we were all on one Windows network, we could get away with using System.IO.File.Copy(source, destination) to copy a file. Since often all we have is an input Stream (like a MemoryStream), we tried abstracting the Copy operation to take an input Stream and an output Stream, but we are seeing a massive performance decrease. Below is some code for copying a file to use as a discussion point.
public void Copy(System.IO.Stream inStream, string outputFilePath)
{
    int bufferSize = 1024 * 64;
    using (FileStream fileStream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
    {
        int bytesRead = -1;
        byte[] bytes = new byte[bufferSize];
        while ((bytesRead = inStream.Read(bytes, 0, bufferSize)) > 0)
        {
            fileStream.Write(bytes, 0, bytesRead);
            fileStream.Flush();
        }
    }
}
Does anyone know why this performs so much slower than File.Copy? Is there anything I can do to improve performance? Am I just going to have to put special logic in to see if I'm copying from one windows location to another--in which case I would just use File.Copy and in the other cases I'll use the streams?
Please let me know what you think and whether you need additional information. I have tried different buffer sizes and it seems like a 64k buffer size is optimal for our "small" files and 256k+ is a better buffer size for our "large" files--but in either case it performs much worse than File.Copy(). Thanks in advance!
Accepted answer by arbiter
File.Copy is built around the Win32 CopyFile function, and this function gets a lot of attention from the Microsoft team (remember the Vista-era threads about slow copy performance).
Several tips to improve the performance of your method:
- As many said earlier, remove the Flush call from your loop. You do not need it at all.
- Increasing the buffer size may help, but only for file-to-file operations; for network shares or FTP servers it will slow things down instead. 60 * 1024 is ideal for network shares, at least before Vista. For FTP, 32 KB will be enough in most cases.
- Help the OS by stating your caching strategy (in your case, sequential reading and writing): use the FileStream constructor overload that takes a FileOptions parameter (SequentialScan).
- You can speed up copying by using the asynchronous pattern (especially useful for network-to-file cases), but do not use threads for this. Instead, use overlapped I/O (BeginRead, EndRead, BeginWrite, EndWrite in .NET), and do not forget to set the Asynchronous option in the FileStream constructor (see FileOptions).
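The first three tips can be sketched together as follows. This is a minimal sketch, not the answerer's code: the class and method names (SequentialCopy, CopyToFile) are hypothetical, the 64 KB default is only a starting point to experiment with, and FileMode.Create is used on the assumption that an existing destination file should be replaced.

```csharp
using System;
using System.IO;

public static class SequentialCopy
{
    // Hypothetical helper: copies inStream to a file, passing
    // FileOptions.SequentialScan as a caching hint to the OS.
    // Returns the number of bytes copied.
    public static long CopyToFile(Stream inStream, string outputFilePath, int bufferSize = 64 * 1024)
    {
        long total = 0;
        using (var output = new FileStream(
            outputFilePath,
            FileMode.Create,            // assumption: overwrite any existing file
            FileAccess.Write,
            FileShare.None,
            bufferSize,
            FileOptions.SequentialScan))
        {
            var buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = inStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // No Flush here: the stream is flushed once when it is disposed.
                output.Write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```

A call such as SequentialCopy.CopyToFile(someStream, path, 60 * 1024) would apply the buffer size suggested above for network shares.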
Example of asynchronous copy pattern:
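using System;
using System.IO;
using System.Threading;

public static void CopyOverlapped(Stream sourceStream, Stream destStream, int bufferSize)
{
    // Two buffers: one is written out while the other is filled by the next read.
    byte[] activeBuffer = new byte[bufferSize];
    byte[] backBuffer = new byte[bufferSize];

    int bytesRead = 0;
    IAsyncResult readResult = sourceStream.BeginRead(activeBuffer, 0, activeBuffer.Length, null, null);
    do
    {
        bytesRead = sourceStream.EndRead(readResult);
        IAsyncResult writeResult = destStream.BeginWrite(activeBuffer, 0, bytesRead, null, null);
        if (bytesRead > 0)
        {
            // Start the next read into the spare buffer while the write is in flight,
            // then swap the buffers so activeBuffer is the one being filled.
            readResult = sourceStream.BeginRead(backBuffer, 0, backBuffer.Length, null, null);
            backBuffer = Interlocked.Exchange(ref activeBuffer, backBuffer);
        }
        destStream.EndWrite(writeResult);
    }
    while (bytesRead > 0);
}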
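For comparison, here is a minimal sketch of the same idea using the task-based API instead of Begin/End calls. It is a swapped-in technique, not what the answer shows: it assumes .NET Framework 4.5 or later (for Stream.CopyToAsync), and the names AsyncCopy and CopyFileAsync are hypothetical. Both streams are opened with FileOptions.Asynchronous, as the answer recommends. Note that CopyToAsync frees the calling thread but does not necessarily overlap reads with writes the way the double-buffered loop does.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncCopy
{
    // Hypothetical helper: copies one file to another using asynchronous I/O.
    public static async Task CopyFileAsync(string sourcePath, string destinationPath, int bufferSize = 64 * 1024)
    {
        const FileOptions options = FileOptions.Asynchronous | FileOptions.SequentialScan;

        using (var source = new FileStream(
            sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, options))
        using (var destination = new FileStream(
            destinationPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize, options))
        {
            // Reads and writes are issued asynchronously; no thread blocks while waiting on I/O.
            await source.CopyToAsync(destination, bufferSize);
        }
    }
}
```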
Answer by Eric J.
One thing that stands out is that you are reading a chunk, writing that chunk, reading another chunk and so on.
Streaming operations are great candidates for multithreading. My guess is that File.Copy implements multithreading.
Try reading in one thread and writing in another thread. You will need to coordinate the threads so that the write thread doesn't start writing away a buffer until the read thread is done filling it up. You can solve this by having two buffers, one that is being read while the other is being written, and a flag that says which buffer is currently being used for which purpose.
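One way to sketch the two-buffer, two-thread idea is shown below. It is only an illustration under assumptions: instead of a flag, it hands the two buffers back and forth through BlockingCollection queues (available since .NET Framework 4), and the names TwoThreadCopy, free, and filled are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class TwoThreadCopy
{
    public static void Copy(Stream input, Stream output, int bufferSize = 64 * 1024)
    {
        // Two buffers circulate between the queues, so one can be
        // filled by the reader while the other is being written.
        var free = new BlockingCollection<byte[]>();
        var filled = new BlockingCollection<ArraySegment<byte>>();
        free.Add(new byte[bufferSize]);
        free.Add(new byte[bufferSize]);

        Task reader = Task.Run(() =>
        {
            try
            {
                while (true)
                {
                    byte[] buffer = free.Take(); // waits until a buffer is free
                    int bytesRead = input.Read(buffer, 0, buffer.Length);
                    if (bytesRead <= 0) break;
                    filled.Add(new ArraySegment<byte>(buffer, 0, bytesRead));
                }
            }
            finally
            {
                filled.CompleteAdding(); // lets the writer loop end, even after a read error
            }
        });

        try
        {
            // The calling thread acts as the writer.
            foreach (ArraySegment<byte> chunk in filled.GetConsumingEnumerable())
            {
                output.Write(chunk.Array, chunk.Offset, chunk.Count);
                free.Add(chunk.Array); // hand the buffer back to the reader
            }
        }
        finally
        {
            free.CompleteAdding(); // unblocks the reader if the writer stops early
        }

        reader.Wait(); // rethrows any exception from the reader
    }
}
```

Whether this beats a single-threaded loop depends on the source and destination; it is worth measuring before adopting it.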
Answer by Aviad Ben Dov
Try removing the Flush call from the loop and moving it outside the loop.
Sometimes the OS knows best when to flush the I/O. Leaving it alone lets the OS make better use of its internal buffers.
Answer by sylvanaar
Here's a similar answer: How do I copy the contents of one stream to another?
Your main problem is the call to Flush(), which ties your performance to the speed of the I/O.
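On .NET Framework 4 and later, the built-in Stream.CopyTo method performs this kind of read/write loop without flushing after every chunk. A minimal sketch, assuming that framework version; the names StreamCopy and CopyToFile are hypothetical, and FileMode.Create assumes an existing destination should be replaced:

```csharp
using System.IO;

public static class StreamCopy
{
    // Hypothetical helper: writes everything remaining in inStream to a file.
    public static void CopyToFile(Stream inStream, string outputFilePath, int bufferSize = 64 * 1024)
    {
        using (var fileStream = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write))
        {
            // CopyTo reads until the end of inStream; the file is flushed on dispose.
            inStream.CopyTo(fileStream, bufferSize);
        }
    }
}
```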
Answer by Rob Levine
Three changes will dramatically improve performance:
- Increase your buffer size; try 1 MB (well, just experiment).
- After you open your fileStream, call fileStream.SetLength(inStream.Length) to allocate the entire block on disk up front (this only works if inStream is seekable).
- Remove fileStream.Flush(). It is redundant and probably has the single biggest impact on performance, as it blocks until the flush is complete. The stream will be flushed anyway on dispose.
This seemed about 3-4 times faster in the experiments I tried:
public static void Copy(System.IO.Stream inStream, string outputFilePath)
{
    int bufferSize = 1024 * 1024;
    using (FileStream fileStream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
    {
        // Preallocate the file on disk; Length is only available on seekable streams.
        if (inStream.CanSeek)
        {
            fileStream.SetLength(inStream.Length);
        }
        int bytesRead = -1;
        byte[] bytes = new byte[bufferSize];
        while ((bytesRead = inStream.Read(bytes, 0, bufferSize)) > 0)
        {
            fileStream.Write(bytes, 0, bytesRead);
        }
    }
}
Answer by lavinio
Mark Russinovich would be the authority on this.
He wrote a blog entry, Inside Vista SP1 File Copy Improvements, which sums up the Windows state of the art through Vista SP1.
My semi-educated guess is that File.Copy would be the most robust over the greatest number of situations. Of course, that doesn't mean your own code couldn't beat it in some specific corner case...
Answer by Ed S.
Dusting off Reflector, we can see that File.Copy actually calls the Win32 API:
if (!Win32Native.CopyFile(fullPathInternal, dst, !overwrite))
which resolves to:
[DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
internal static extern bool CopyFile(string src, string dst, bool failIfExists);
Answer by AnthonyWJones
You're never going to be able to beat the operating system at doing something so fundamental with your own code, not even if you crafted it carefully in assembler.
If you need to make sure that your operations run with the best performance AND you want to mix and match various sources, then you will need to create a type that describes the resource locations. You then create an API with functions such as Copy that take two such types and, having examined the descriptions of both, choose the best-performing copy mechanism. E.g., having determined that both locations are Windows file locations, it would choose File.Copy, or if the source is a Windows file but the destination is an HTTP POST, it would use a WebRequest.
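A minimal sketch of that design, under stated assumptions: every type name here (ResourceLocation, LocalFileLocation, StreamLocation, ResourceCopier) is hypothetical, only local files and read-only streams are modeled, and Copy returns the name of the mechanism it chose purely so the choice is visible. An FTP or HTTP location would be one more subclass with its own branch.

```csharp
using System;
using System.IO;

// Describes where a resource lives and how to open it.
public abstract class ResourceLocation
{
    public abstract Stream OpenRead();
    public abstract Stream OpenWrite();
}

public sealed class LocalFileLocation : ResourceLocation
{
    public string FilePath { get; }
    public LocalFileLocation(string filePath) { FilePath = filePath; }
    public override Stream OpenRead() { return File.OpenRead(FilePath); }
    public override Stream OpenWrite() { return File.Create(FilePath); }
}

// Wraps an input stream we already hold (for example a MemoryStream).
public sealed class StreamLocation : ResourceLocation
{
    private readonly Stream stream;
    public StreamLocation(Stream stream) { this.stream = stream; }
    public override Stream OpenRead() { return stream; }
    public override Stream OpenWrite() { throw new NotSupportedException("Read-only location."); }
}

public static class ResourceCopier
{
    // Picks a copy mechanism based on the two location types.
    public static string Copy(ResourceLocation source, ResourceLocation destination)
    {
        var sourceFile = source as LocalFileLocation;
        var destinationFile = destination as LocalFileLocation;
        if (sourceFile != null && destinationFile != null)
        {
            // Both ends are files: let the OS do the copy.
            File.Copy(sourceFile.FilePath, destinationFile.FilePath, true);
            return "File.Copy";
        }

        // Fallback: generic stream-to-stream copy.
        using (Stream input = source.OpenRead())
        using (Stream output = destination.OpenWrite())
        {
            input.CopyTo(output);
        }
        return "Stream.CopyTo";
    }
}
```

This keeps the special-casing the question asks about in one place: callers pass locations, and only ResourceCopier knows which pairs get the fast path.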