XML vs Binary performance for Serialization/Deserialization
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1092020/
Asked by Charlie
I'm working on a .NET Compact Framework application and need to boost performance. The app currently works offline by serializing objects to XML and storing them in a database. Using a profiling tool I could see this was quite a big overhead, slowing the app. I thought that if I switched to binary serialization the performance would increase, but because this is not supported in the Compact Framework I looked at protobuf-net. The serialization seems quicker, but deserialization is much slower, and the app does more deserializing than serializing.
Should binary serialization be faster, and if so, what can I do to speed it up? Here's a snippet of how I'm using both XML and binary:
XML serialization:
public string Serialize(T obj)
{
    UTF8Encoding encoding = new UTF8Encoding();
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    MemoryStream stream = new MemoryStream();
    XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(stream, obj);
    stream = (MemoryStream)writer.BaseStream;
    return encoding.GetString(stream.ToArray(), 0, Convert.ToInt32(stream.Length));
}

public T Deserialize(string xml)
{
    UTF8Encoding encoding = new UTF8Encoding();
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    MemoryStream stream = new MemoryStream(encoding.GetBytes(xml));
    return (T)serializer.Deserialize(stream);
}
Protobuf-net Binary serialization:
public byte[] Serialize(T obj)
{
    byte[] raw;
    using (MemoryStream memoryStream = new MemoryStream())
    {
        Serializer.Serialize(memoryStream, obj);
        raw = memoryStream.ToArray();
    }
    return raw;
}

public T Deserialize(byte[] serializedType)
{
    T obj;
    using (MemoryStream memoryStream = new MemoryStream(serializedType))
    {
        obj = Serializer.Deserialize<T>(memoryStream);
    }
    return obj;
}
Answered by Marc Gravell
Interesting... thoughts:
- what version of CF is this; 2.0? 3.5? In particular, CF 3.5 has Delegate.CreateDelegate, which allows protobuf-net to access properties much faster than it can in CF 2.0
- are you annotating fields or properties? Again, in CF the reflection optimisations are limited; you can get better performance in CF 3.5 with properties, as with a field the only option I have available is FieldInfo.SetValue
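To make the difference between the two access paths concrete, here is a minimal sketch written against the full .NET 3.5 API. It is not protobuf-net's actual implementation; the `Sample` class and the `AccessDemo` helper are hypothetical names invented for illustration. A property setter can be bound once into a strongly typed delegate, while a field has to go through `FieldInfo.SetValue`, which takes `object` arguments (boxing value types) on every call.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Sample
{
    public int Field;
    public int Property { get; set; }
}

public static class AccessDemo
{
    // Binds the property's set method to an open-instance delegate once.
    // Later calls are ordinary delegate invocations with typed arguments.
    public static Action<Sample, int> BuildPropertySetter()
    {
        MethodInfo setter = typeof(Sample).GetProperty("Property").GetSetMethod();
        return (Action<Sample, int>)Delegate.CreateDelegate(
            typeof(Action<Sample, int>), setter);
    }

    // Fields have no accessor method to bind, so this path uses reflection
    // on every call and boxes the int into an object.
    public static void SetFieldViaReflection(Sample target, int value)
    {
        FieldInfo field = typeof(Sample).GetField("Field");
        field.SetValue(target, value);
    }
}
```

In a real serializer the delegate would be built once per member and reused for every object, which is where the saving over per-call reflection would come from.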
There are a number of other things that simply don't exist in CF, so protobuf-net has to make compromises in a few places. For overly complex models there is also a known issue with the generics limitations of CF. A fix is underway, but it is a big change, and is taking "a while".
For info, some metrics on regular (full) .NET comparing various formats (including XmlSerializer and protobuf-net) are here.
Answered by Tundey
Have you tried creating custom serialization classes for your classes, instead of using XmlSerializer, which is a general-purpose serializer (it creates a bunch of classes at runtime)? There's a tool for doing this (sgen). You run it during your build process and it generates a custom assembly that can be used in place of XmlSerializer.
If you have Visual Studio, the option is available under the Build tab of your project's properties.
Answered by kyoryu
Is the performance hit in serializing the objects, or writing them to the database? Since writing them is likely hitting some kind of slow storage, I'd imagine it to be a much bigger perf hit than the serialization step.
Keep in mind that the perf measurements posted by Marc Gravell are testing the performance over 1,000,000 iterations.
What kind of database are you storing them in? Are the objects serialized in memory or straight to storage? How are they being sent to the db? How big are the objects? When one is updated, do you send all of the objects to the database, or just the one that has changed? Are you caching anything in memory at all, or re-reading from storage each time?
Answered by Charlie
I'm going to correct myself on this. Marc Gravell pointed out that the first iteration has the overhead of building the model, so I've done some tests taking the average of 1000 iterations of serialization and deserialization for both XML and binary. I tried my tests with the v2 Compact Framework DLL first, and then with the v3.5 DLL. Here's what I got; times are in ms:
.NET 2.0
================================ XML ====== Binary ===
Serialization 1st Iteration      3236       5508
Deserialization 1st Iteration    1501       318
Serialization Average            9.826      5.525
Deserialization Average          5.525      0.771
.NET 3.5
================================ XML ====== Binary ===
Serialization 1st Iteration      3307       5598
Deserialization 1st Iteration    1386       200
Serialization Average            10.923     5.605
Deserialization Average          5.605      0.279
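For anyone wanting to repeat this kind of measurement, the sketch below times the first call separately from the steady-state average. It is not the harness used for the numbers above: `SerializationTimer`, `Timing` and `Measure` are hypothetical names, and it assumes `System.Diagnostics.Stopwatch` and the non-generic `Action` delegate are available (on older runtimes `Environment.TickCount` could stand in for the stopwatch). The post does not say whether its averages include the first iteration; this sketch excludes it.

```csharp
using System;
using System.Diagnostics;

public static class SerializationTimer
{
    // First-call cost and steady-state average, both in milliseconds.
    public struct Timing
    {
        public double FirstMs;
        public double AverageMs;
    }

    public static Timing Measure(Action work, int iterations)
    {
        if (work == null) throw new ArgumentNullException("work");
        if (iterations < 1) throw new ArgumentOutOfRangeException("iterations");

        Timing timing = new Timing();

        // The first call pays any one-off setup cost (building the model,
        // generating serializer code), so it is timed on its own.
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        timing.FirstMs = watch.Elapsed.TotalMilliseconds;

        // The remaining calls are timed as a batch and averaged.
        watch.Reset();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        watch.Stop();
        timing.AverageMs = watch.Elapsed.TotalMilliseconds / iterations;
        return timing;
    }
}
```

A call might look like `SerializationTimer.Measure(() => Serialize(obj), 1000)`, run once for each serializer and once each for the serialize and deserialize directions.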
Answered by IanGilham
XML is often slow to process and takes up a lot of space. There have been a number of different attempts to tackle this, and the most popular today seems to be to just drop the lot in a gzip file, like with the Open Packaging Convention.
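As a rough illustration of the "drop it in gzip" approach, the sketch below compresses an XML string with `System.IO.Compression.GZipStream` and reads it back. `GzipXml` is a hypothetical helper name, and whether `GZipStream` is available on a given Compact Framework version is an assumption to check. Compression reduces storage size only; it does not remove the cost of XML parsing and adds its own CPU cost.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipXml
{
    // Encodes the XML as UTF-8 and gzips it into a byte array.
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final compressed block.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray is still valid after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Reverses Compress: gunzips the bytes and decodes them as UTF-8.
    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            byte[] bytes = output.ToArray();
            return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
        }
    }
}
```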
The W3C has shown the gzip approach to be less than optimal, and they and various other groups have been working on a better binary serialisation suitable for fast processing and compression, for transmission.
Answered by Cr1spy
The main expense in your method is the actual generation of the XmlSerializer class. Creating the serialiser is a time consuming process which you should only do once for each object type. Try caching the serialisers and see if that improves performance at all.
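One way to apply this to the code in the question is sketched below: a small cache that builds one XmlSerializer per type and reuses it. `CachedXml` and `Reading` are hypothetical names, and a plain `Dictionary` guarded by a lock is used on the assumption that concurrent collections are not available on the target runtime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical payload type used only to exercise the sketch.
public class Reading
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CachedXml
{
    private static readonly Dictionary<Type, XmlSerializer> Cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object Gate = new object();

    // Returns the serializer for a type, constructing it only on first use.
    private static XmlSerializer For(Type type)
    {
        lock (Gate)
        {
            XmlSerializer serializer;
            if (!Cache.TryGetValue(type, out serializer))
            {
                serializer = new XmlSerializer(type);
                Cache[type] = serializer;
            }
            return serializer;
        }
    }

    public static string Serialize<T>(T obj)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            For(typeof(T)).Serialize(stream, obj);
            byte[] bytes = stream.ToArray();
            return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
        }
    }

    public static T Deserialize<T>(string xml)
    {
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (T)For(typeof(T)).Deserialize(stream);
        }
    }
}
```

With this in place the construction cost is paid on the first call for each type, and later calls such as `CachedXml.Deserialize<Reading>(xml)` reuse the cached instance.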
Following this advice I saw a large performance improvement in my app, which allowed me to continue to use XML serialisation.
Hope this helps.