C#: Convert a Unicode string to an escaped ASCII string
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1615559/
Asked by Ali
How can I convert this string:
This string contains the Unicode character Pi(π)
into an escaped ASCII string:
This string contains the Unicode character Pi(\u03c0)
and vice versa?
The Encoding classes currently available in C# convert the π character to "?". I need to preserve that character.
Accepted answer by Adam Sills
This converts back and forth between a string and the \uXXXX format.
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
class Program {
    static void Main( string[] args ) {
        string unicodeString = "This function contains a unicode character pi (\u03a0)";
        Console.WriteLine( unicodeString );
        string encoded = EncodeNonAsciiCharacters(unicodeString);
        Console.WriteLine( encoded );
        string decoded = DecodeEncodedNonAsciiCharacters( encoded );
        Console.WriteLine( decoded );
    }
    static string EncodeNonAsciiCharacters( string value ) {
        StringBuilder sb = new StringBuilder();
        foreach( char c in value ) {
            if( c > 127 ) {
                // This character is too big for ASCII: emit a literal backslash, 'u', and four hex digits
                string encodedValue = "\\u" + ((int) c).ToString( "x4" );
                sb.Append( encodedValue );
            }
            else {
                sb.Append( c );
            }
        }
        return sb.ToString();
    }
    static string DecodeEncodedNonAsciiCharacters( string value ) {
        // The pattern matches a literal backslash followed by 'u' and four characters
        return Regex.Replace(
            value,
            @"\\u(?<Value>[a-zA-Z0-9]{4})",
            m => {
                return ((char) int.Parse( m.Groups["Value"].Value, NumberStyles.HexNumber )).ToString();
            } );
    }
}
Output:
This function contains a unicode character pi (Π)
This function contains a unicode character pi (\u03a0)
This function contains a unicode character pi (Π)
Answer by JeffFerguson
You need to use the Convert() method in the Encoding class:
- Create an Encoding object that represents ASCII encoding
- Create an Encoding object that represents Unicode encoding
- Call Encoding.Convert() with the source encoding, the destination encoding, and the bytes of the string to be converted
Here is an example:
using System;
using System.Text;
namespace ConvertExample
{
    class ConvertExampleClass
    {
        static void Main()
        {
            string unicodeString = "This string contains the unicode character Pi(\u03a0)";
            // Create two different encodings.
            Encoding ascii = Encoding.ASCII;
            Encoding unicode = Encoding.Unicode;
            // Convert the string into a byte[].
            byte[] unicodeBytes = unicode.GetBytes(unicodeString);
            // Perform the conversion from one encoding to the other.
            byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
            // Convert the new byte[] into a char[] and then into a string.
            // This is a slightly different approach to converting to illustrate
            // the use of GetCharCount/GetChars.
            char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
            ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
            string asciiString = new string(asciiChars);
            // Display the strings created before and after the conversion.
            // Characters that ASCII cannot represent (here, Pi) come out as "?".
            Console.WriteLine("Original string: {0}", unicodeString);
            Console.WriteLine("Ascii converted string: {0}", asciiString);
        }
    }
}
Answer by leppie
// Requires: using System; using System.Linq;
string StringFold(string input, Func<char, string> proc)
{
    return string.Concat(input.Select(proc).ToArray());
}
string FoldProc(char input)
{
    if (input >= 128)
    {
        // Non-ASCII: emit \uXXXX (the verbatim string keeps the backslash literal)
        return string.Format(@"\u{0:x4}", (int)input);
    }
    return input.ToString();
}
string EscapeToAscii(string input)
{
    return StringFold(input, FoldProc);
}
Answer by jdecuyper
using System;
using System.Text;
class Program
{
    static void Main(string[] args)
    {
        char[] originalString = "This string contains the unicode character Pi(π)".ToCharArray();
        StringBuilder asAscii = new StringBuilder(); // store final ascii string and Unicode points
        foreach (char c in originalString)
        {
            // test if char is ascii, otherwise convert to Unicode Code Point
            int cint = Convert.ToInt32(c);
            if (cint <= 127 && cint >= 0)
                asAscii.Append(c);
            else
                asAscii.Append(String.Format("\\u{0:x4} ", cint).Trim());
        }
        Console.WriteLine("Final string: {0}", asAscii);
        Console.ReadKey();
    }
}
All non-ASCII chars are converted to their Unicode Code Point representation and appended to the final string.
Answer by Remy Lebeau
To store actual Unicode codepoints, you first have to decode the String's UTF-16 code units to UTF-32 code units (which are currently the same as the Unicode codepoints). Use System.Text.Encoding.UTF32.GetBytes() for that, and then write the resulting bytes to the StringBuilder as needed, i.e.
static void Main(string[] args)
{
    String originalString = "This string contains the unicode character Pi(π)";
    // Each code point becomes one 4-byte UTF-32 value
    Byte[] bytes = Encoding.UTF32.GetBytes(originalString);
    StringBuilder asAscii = new StringBuilder();
    for (int idx = 0; idx < bytes.Length; idx += 4)
    {
        uint codepoint = BitConverter.ToUInt32(bytes, idx);
        if (codepoint <= 127)
            asAscii.Append(Convert.ToChar(codepoint));
        else
            asAscii.AppendFormat("\\u{0:x4}", codepoint);
    }
    Console.WriteLine("Final string: {0}", asAscii);
    Console.ReadKey();
}
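The same idea can also be sketched without going through a byte array, by reading code points directly from the string. This is only an illustration under stated assumptions, not part of the answer above: the class name CodePointEscaper and the choice to write code points above U+FFFF as \UXXXXXXXX (the eight-digit form C# uses in string literals) are mine, picked so that five-digit values are not confused with a four-digit \uXXXX escape.

```csharp
using System;
using System.Text;

public static class CodePointEscaper
{
    // Hypothetical helper: escapes by Unicode code point rather than by UTF-16 code unit.
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            int codepoint;
            if (char.IsSurrogatePair(value, i))
            {
                // Combine the high and low surrogate into a single code point.
                codepoint = char.ConvertToUtf32(value[i], value[i + 1]);
                i++; // skip the low surrogate
            }
            else
            {
                codepoint = value[i];
            }

            if (codepoint <= 127)
                sb.Append((char)codepoint);               // plain ASCII
            else if (codepoint <= 0xFFFF)
                sb.AppendFormat("\\u{0:x4}", codepoint);  // four hex digits
            else
                sb.AppendFormat("\\U{0:x8}", codepoint);  // assumed eight-digit form
        }
        return sb.ToString();
    }
}
```

With this sketch, CodePointEscaper.Escape("\U00020000") yields the single escape \U00020000 instead of two surrogate escapes; a matching decoder would need to recognize both forms.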
Answer by vovafeldman
A small patch to @Adam Sills's answer. It fixes the FormatException thrown for input strings such as "c:\u00ab\otherdirectory\", and it adds RegexOptions.Compiled, which makes repeated matching with the Regex faster:
// Only hex digits are accepted after the literal backslash and 'u'
private static Regex DECODING_REGEX = new Regex(@"\\u(?<Value>[a-fA-F0-9]{4})", RegexOptions.Compiled);
private const string PLACEHOLDER = @"#!#";
public static string DecodeEncodedNonAsciiCharacters(this string value)
{
    // Hide escaped (doubled) backslashes behind a placeholder so they are not decoded
    return DECODING_REGEX.Replace(
        value.Replace(@"\\", PLACEHOLDER),
        m => {
            return ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString(); })
        .Replace(PLACEHOLDER, @"\\");
}
Answer by Douglas
As a one-liner:
var result = Regex.Replace(input, @"[^\x00-\x7F]", c =>
string.Format(@"\u{0:x4}", (int)c.Value[0]));
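Since the question also asks for the reverse direction, here is a possible companion sketch. The Escape method wraps the one-liner above unchanged; the Unescape method and the class name OneLineEscapes are my own additions for illustration, not part of the original answer. It assumes every \uXXXX sequence in the input was produced by the escaper, so a literal backslash-u that was already in the source text would be decoded as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OneLineEscapes
{
    // The one-liner from the answer, wrapped in a method.
    public static string Escape(string input) =>
        Regex.Replace(input, @"[^\x00-\x7F]", c =>
            string.Format(@"\u{0:x4}", (int)c.Value[0]));

    // Assumed inverse (hypothetical): turns each \uXXXX back into one UTF-16 char.
    public static string Unescape(string input) =>
        Regex.Replace(input, @"\\u([0-9a-fA-F]{4})", m =>
            ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
}
```

Because both methods work one UTF-16 code unit at a time, a surrogate pair is escaped as two \uXXXX sequences and restored as the same pair.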
Answer by MrRolling
For unescaping, you can simply use one of these functions:
System.Text.RegularExpressions.Regex.Unescape(string)
System.Uri.UnescapeDataString(string)
I suggest using this method (it works better with UTF-8):
UnescapeDataString(string)
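A small sketch of how the two calls differ, as far as I understand them: Regex.Unescape understands backslash escapes such as \uXXXX, while Uri.UnescapeDataString decodes percent-encoded (URL-style) UTF-8 bytes. The wrapper class and method names below are hypothetical and exist only to make the comparison easy to run.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Decodes backslash escapes, including \uXXXX (and others such as \t or \n).
    public static string FromBackslashU(string escaped) =>
        Regex.Unescape(escaped);

    // Decodes percent-encoding, e.g. %CF%80 is the UTF-8 encoding of U+03C0.
    public static string FromPercentEncoding(string escaped) =>
        Uri.UnescapeDataString(escaped);
}
```

For example, UnescapeDemo.FromBackslashU(@"Pi(\u03c0)") and UnescapeDemo.FromPercentEncoding("Pi(%CF%80)") should both give back the string containing π. Which one fits depends on how the input was escaped in the first place.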
Answer by Bill Barry
Here is my current implementation:
// Requires: using System.Globalization; using System.Linq; using System.Text; using System.Text.RegularExpressions;
public static class UnicodeStringExtensions
{
    public static string EncodeNonAsciiCharacters(this string value) {
        var bytes = Encoding.Unicode.GetBytes(value);
        var sb = StringBuilderCache.Acquire(value.Length);
        bool encodedsomething = false;
        // Walk the UTF-16 code units two bytes at a time
        for (int i = 0; i < bytes.Length; i += 2) {
            var c = BitConverter.ToUInt16(bytes, i);
            if ((c >= 0x20 && c <= 0x7f) || c == 0x0A || c == 0x0D) {
                sb.Append((char) c);
            } else {
                sb.Append($"\\u{c:x4}");
                encodedsomething = true;
            }
        }
        if (!encodedsomething) {
            // Nothing needed escaping: return the original string unchanged
            StringBuilderCache.Release(sb);
            return value;
        }
        return StringBuilderCache.GetStringAndRelease(sb);
    }
    // Decode whole runs of \uXXXX at once so surrogate pairs are reassembled together
    public static string DecodeEncodedNonAsciiCharacters(this string value)
        => Regex.Replace(value,/*language=regexp*/@"(?:\\u[a-fA-F0-9]{4})+", Decode);
    static readonly string[] Splitsequence = new [] { "\\u" };
    private static string Decode(Match m) {
        var bytes = m.Value.Split(Splitsequence, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => ushort.Parse(s, NumberStyles.HexNumber)).SelectMany(BitConverter.GetBytes).ToArray();
        return Encoding.Unicode.GetString(bytes);
    }
}
This passes a test:
public void TestBigUnicode() {
    var s = "\U00020000";
    var encoded = s.EncodeNonAsciiCharacters();
    var decoded = encoded.DecodeEncodedNonAsciiCharacters();
    Assert.AreEqual(s, decoded);
}
with the encoded value: "\ud840\udc00"
This implementation makes use of a StringBuilderCache (reference source link).