Using StringWriter for XML Serialization (C#)

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1564718/

Time: 2020-08-06 18:55:32  Source: igfitidea


c# sql-server xml utf-8 xml-serialization

Asked by StampedeXV

I'm currently searching for an easy way to serialize objects (in C# 3).

I googled some examples and came up with something like:

MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(typeof(MyObject));
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, myObject);
string result = Encoding.UTF8.GetString(memoryStream.ToArray());
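A minimal sketch for inspecting what the first snippet actually produces. Assumptions: a plain string is serialized in place of MyObject so no custom type is needed, and the code is written as C# top-level statements (C# 9 or later). Because Encoding.UTF8 has a byte-order-mark preamble, the decoded string may start with U+FEFF, and its declaration advertises utf-8 even though the result is now a UTF-16 .NET string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Same steps as the first snippet; a plain string stands in for MyObject.
var memoryStream = new MemoryStream();
var xs = new XmlSerializer(typeof(string));
var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, "Test");
xmlTextWriter.Flush(); // ensure everything buffered has reached the stream

string result = Encoding.UTF8.GetString(memoryStream.ToArray());

// Encoding.UTF8 has a BOM preamble; if the writer emitted it, GetString keeps it as U+FEFF.
bool hasBom = result.Length > 0 && result[0] == '\uFEFF';
Console.WriteLine(hasBom ? "starts with a BOM (U+FEFF)" : "no BOM");

// The declaration says utf-8, although result is now held as a UTF-16 .NET string.
Console.WriteLine(result.TrimStart('\uFEFF'));
```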

After reading this question, I asked myself: why not use StringWriter? It seems much easier.

XmlSerializer ser = new XmlSerializer(typeof(MyObject));
StringWriter writer = new StringWriter();
ser.Serialize(writer, myObject);
string serializedValue = writer.ToString();
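For comparison, a minimal sketch of the StringWriter version, again assuming a plain string in place of MyObject and C# top-level statements. StringWriter reports UTF-16 as its encoding, so the declaration it receives says encoding="utf-16", which matches how the string is really held in memory, and no byte-order mark is added.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Same steps as the StringWriter snippet; a plain string stands in for MyObject.
var ser = new XmlSerializer(typeof(string));
var writer = new StringWriter();
ser.Serialize(writer, "Test");
string serializedValue = writer.ToString();

// StringWriter.Encoding is UTF-16, and XmlSerializer copies that into the declaration.
Console.WriteLine(writer.Encoding.WebName); // utf-16
Console.WriteLine(serializedValue);
```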

Another problem was that the XML generated by the first example could not simply be written into an XML column of a SQL Server 2005 database.

The first question is: is there a reason why I shouldn't use StringWriter to serialize an object when I need it as a string afterwards? None of the examples I found when googling used StringWriter.

The second is, of course: if I should not use StringWriter (for whatever reason), what would be a good and correct way?

Addition:

As both answers already mentioned, I'll go further into the XML-to-DB problem.

When writing to the Database I got the following exception:

System.Data.SqlClient.SqlException: XML parsing: line 1, character 38, unable to switch the encoding

For string

<?xml version="1.0" encoding="utf-8"?><test/>

I took the string created by the XmlTextWriter and simply passed it in as XML. This did not work (nor did manual insertion into the DB).

Afterwards I tried manual insertion (just writing INSERT INTO ...) with encoding="utf-16", which also failed. Removing the encoding entirely then worked. After that result I switched back to the StringWriter code and voilà, it worked.

Problem: I don't really understand why.

@Christian Hayter: Given those tests, I'm not sure that I have to use UTF-16 to write to the DB. Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?

Accepted answer by Solomon Rutzky

<TL;DR> The problem is rather simple, actually: you are not matching the declared encoding (in the XML declaration) with the datatype of the input parameter. If you manually added <?xml version="1.0" encoding="utf-8"?><test/> to the string, then declaring the SqlParameter to be of type SqlDbType.Xml or SqlDbType.NVarChar would give you the "unable to switch the encoding" error. Then, when inserting manually via T-SQL, since you switched the declared encoding to be utf-16, you were clearly inserting a VARCHAR string (not prefixed with an upper-case "N", hence an 8-bit encoding, such as UTF-8) and not an NVARCHAR string (prefixed with an upper-case "N", hence the 16-bit UTF-16 LE encoding).

The fix should have been as simple as:

  1. In the first case, when adding the declaration stating encoding="utf-8": simply don't add the XML declaration.
  2. In the second case, when adding the declaration stating encoding="utf-16": either
    1. simply don't add the XML declaration, OR
    2. simply add an "N" to the input parameter type: SqlDbType.NVarChar instead of SqlDbType.VarChar :-) (or possibly even switch to using SqlDbType.Xml)
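The first fix option, not adding the XML declaration at all, can be sketched by routing the StringWriter through an XmlWriter whose XmlWriterSettings.OmitXmlDeclaration is true. This is a minimal sketch assuming a plain string in place of MyObject and C# top-level statements; it is one way to drop the declaration, not necessarily how the asker removed it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

var serializer = new XmlSerializer(typeof(string)); // stand-in for MyObject
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

string xml;
using (var stringWriter = new StringWriter())
{
    using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
    {
        serializer.Serialize(xmlWriter, "Test");
    } // disposing the XmlWriter flushes its output into the StringWriter
    xml = stringWriter.ToString();
}

// No <?xml ...?> line, so there is no declared encoding for SQL Server to reconcile.
Console.WriteLine(xml);
```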

(Detailed response is below)

All of the answers here are over-complicated and unnecessary (regardless of the 121 and 184 up-votes for Christian's and Jon's answers, respectively). They might provide working code, but none of them actually answer the question. The issue is that nobody truly understood the question, which ultimately is about how the XML datatype in SQL Server works. Nothing against those two clearly intelligent people, but this question has little to nothing to do with serializing to XML. Saving XML data into SQL Server is much easier than what is being implied here.

It doesn't really matter how the XML is produced as long as you follow the rules of how to create XML data in SQL Server. I have a more thorough explanation (including working example code to illustrate the points outlined below) in an answer on this question: How to solve “unable to switch the encoding” error when inserting XML into SQL Server, but the basics are:

  1. The XML declaration is optional
  2. The XML datatype stores strings always as UCS-2 / UTF-16 LE
  3. If your XML is UCS-2 / UTF-16 LE, then you:
    1. pass in the data as either NVARCHAR(MAX) or XML / SqlDbType.NVarChar (maxsize = -1) or SqlDbType.Xml, or, if using a string literal, it must be prefixed with an upper-case "N".
    2. if specifying the XML declaration, it must be either "UCS-2" or "UTF-16" (no real difference here)
  4. If your XML is 8-bit encoded (e.g. "UTF-8" / "iso-8859-1" / "Windows-1252"), then you:
    1. need to specify the XML declaration IF the encoding is different from the code page specified by the default Collation of the database
    2. must pass in the data as VARCHAR(MAX) / SqlDbType.VarChar (maxsize = -1), or, if using a string literal, it must not be prefixed with an upper-case "N".
    3. Whatever 8-bit encoding is used, the "encoding" noted in the XML declaration must match the actual encoding of the bytes.
    4. The 8-bit encoding will be converted into UTF-16 LE by the XML datatype
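Rules 3 and 4 above can be condensed into a small decision function. This is a minimal sketch: ChooseSqlDbType is a hypothetical helper (not part of ADO.NET), it reads the declaration with a simple regular expression rather than a full XML parser, and it treats only "utf-16" and "ucs-2" as 16-bit declarations. SqlDbType.Xml would be equally valid wherever it returns NVarChar, and it does not address rule 4.1 (whether an 8-bit declaration can be left out).

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

// Hypothetical helper: pick the parameter type for an XML string held in .NET.
static SqlDbType ChooseSqlDbType(string xml)
{
    // Look for encoding="..." inside a leading <?xml ...?> declaration.
    Match m = Regex.Match(xml, @"^\s*<\?xml[^>]*?\bencoding\s*=\s*[""']([^""']+)[""']",
                          RegexOptions.IgnoreCase);

    // No declared encoding: the .NET string is UTF-16 LE, so a 16-bit type avoids conversion.
    if (!m.Success) return SqlDbType.NVarChar;

    string declared = m.Groups[1].Value.ToLowerInvariant();

    // Rule 3: a UTF-16 / UCS-2 declaration needs NVARCHAR (or XML).
    if (declared == "utf-16" || declared == "ucs-2") return SqlDbType.NVarChar;

    // Rule 4: any other (8-bit) declaration needs VARCHAR.
    return SqlDbType.VarChar;
}

Console.WriteLine(ChooseSqlDbType("<?xml version=\"1.0\" encoding=\"utf-16\"?><test/>")); // NVarChar
Console.WriteLine(ChooseSqlDbType("<?xml version=\"1.0\" encoding=\"utf-8\"?><test/>"));  // VarChar
Console.WriteLine(ChooseSqlDbType("<test/>"));                                            // NVarChar
```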

With the points outlined above in mind, and given that strings in .NET are always UTF-16 LE / UCS-2 LE (there is no difference between those in terms of encoding), we can answer your questions:
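To make the "strings in .NET are always UTF-16 LE" point concrete, a minimal sketch using the same two characters as the T-SQL examples in this answer (U+1234 and U+1F638), written as C# top-level statements. Encoding.Unicode is .NET's little-endian UTF-16 encoding; a BMP character is one code unit (low byte first), while a supplementary character is a surrogate pair of two code units.

```csharp
using System;
using System.Text;

string bmp = "\u1234";               // U+1234, one UTF-16 code unit
string supplementary = "\U0001F638"; // U+1F638, stored as a surrogate pair

byte[] bmpBytes = Encoding.Unicode.GetBytes(bmp);
Console.WriteLine(BitConverter.ToString(bmpBytes)); // 34-12 (little-endian: low byte first)

// Supplementary characters take two UTF-16 code units, i.e. four bytes.
Console.WriteLine(supplementary.Length);                            // 2
Console.WriteLine(Encoding.Unicode.GetBytes(supplementary).Length); // 4
```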

Is there a reason why I shouldn't use StringWriter to serialize an Object when I need it as a string afterwards?

No, your StringWriter code appears to be just fine (at least I see no issues in my limited testing using the 2nd code block from the question).

Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?

It isn't necessary to provide the XML declaration. When it is missing, the encoding is assumed to be UTF-16 LE if you pass the string into SQL Server as NVARCHAR (i.e. SqlDbType.NVarChar) or XML (i.e. SqlDbType.Xml). The encoding is assumed to be the default 8-bit Code Page if passing in as VARCHAR (i.e. SqlDbType.VarChar). If you have any non-standard-ASCII characters (i.e. values 128 and above) and are passing in as VARCHAR, then you will likely see "?" for BMP characters and "??" for Supplementary Characters, as SQL Server will convert the UTF-16 string from .NET into an 8-bit string of the current Database's Code Page before converting it back into UTF-16 / UCS-2. But you shouldn't get any errors.
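The silent data loss described above can be imitated on the .NET side. This is a minimal sketch, not what SQL Server literally runs: Encoding.ASCII stands in for the database's 8-bit code page (an assumption; the real code page comes from the database's default collation), and neither it nor a typical code page such as Windows-1252 contains U+1234 or U+1F638. It only checks that those characters are replaced and lost, not the exact number of '?' characters produced.

```csharp
using System;
using System.Text;

// A UTF-16 .NET string containing one BMP and one supplementary character.
string original = "<string>Test \u1234\U0001F638</string>";

// Squeeze it into an 8-bit encoding and read it back, as a VARCHAR round trip would.
byte[] eightBit = Encoding.ASCII.GetBytes(original); // characters outside ASCII become '?'
string roundTripped = Encoding.ASCII.GetString(eightBit);

Console.WriteLine(roundTripped);
Console.WriteLine(roundTripped == original ? "no loss" : "data loss"); // no exception either way
```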

On the other hand, if you do specify the XML declaration, then you must pass into SQL Server using the matching 8-bit or 16-bit datatype. So if you have a declaration stating that the encoding is either UCS-2 or UTF-16, then you must pass in as SqlDbType.NVarChar or SqlDbType.Xml. Or, if you have a declaration stating that the encoding is one of the 8-bit options (i.e. UTF-8, Windows-1252, iso-8859-1, etc.), then you must pass in as SqlDbType.VarChar. Failure to match the declared encoding with the proper 8-bit or 16-bit SQL Server datatype will result in the "unable to switch the encoding" error that you were getting.

For example, using your StringWriter-based serialization code, I simply printed the resulting XML string and used it in SSMS. As you can see below, the XML declaration is included (because StringWriter does not have an option to OmitXmlDeclaration like XmlWriter does via XmlWriterSettings), which poses no problem so long as you pass the string in as the correct SQL Server datatype:

-- Upper-case "N" prefix == NVARCHAR, hence no error:
DECLARE @Xml XML = N'<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';
SELECT @Xml;
-- <string>Test ሴ😸</string>

As you can see, it even handles characters beyond standard ASCII, given that ሴ is BMP Code Point U+1234, and 😸 is Supplementary Character Code Point U+1F638. However, the following:

-- No upper-case "N" prefix on the string literal, hence VARCHAR:
DECLARE @Xml XML = '<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';

results in the following error:

Msg 9402, Level 16, State 1, Line XXXXX
XML parsing: line 1, character 39, unable to switch the encoding


Ergo, all of that explanation aside, the full solution to your original question is:

You were clearly passing the string in as SqlDbType.VarChar. Switch to SqlDbType.NVarChar and it will work without needing to go through the extra step of removing the XML declaration. This is preferred over keeping SqlDbType.VarChar and removing the XML declaration because this solution will prevent data loss when the XML includes non-standard-ASCII characters. For example:

-- No upper-case "N" prefix on the string literal == VARCHAR, and no XML declaration:
DECLARE @Xml2 XML = '<string>Test ሴ😸</string>';
SELECT @Xml2;
-- <string>Test ???</string>

As you can see, there is no error this time, but now there is data loss.

© 2020 All rights reserved