Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/1522884/

Remove all non-ASCII characters from string

Tags: c#, ascii

Asked by user135498

I have a C# routine that imports data from a CSV file, matches it against a database and then rewrites it to a file. The source file seems to have a few non-ASCII characters that are fouling up the processing routine.

I already have a static method that I run each input field through but it performs basic checks like removing commas and quotes. Does anybody know how I could add functionality that removes non-ASCII characters too?

Accepted answer by EToreo

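// Requires: using System.Text; note that Encoding.ASCII replaces each non-ASCII char with '?' instead of dropping it.
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));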
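To make the behaviour of the one-liner above concrete, here is a minimal sketch. The class and method names (`AsciiRoundTrip`, `ToAsciiWithPlaceholders`) are made up for illustration, and it assumes the default fallback of `Encoding.ASCII`, which as far as I know substitutes a question mark for anything it cannot encode.

```csharp
using System.Text;

// Hypothetical wrapper around the accepted answer's round trip.
public static class AsciiRoundTrip
{
    // Encodes to ASCII bytes and decodes again. With the default fallback,
    // each char that has no ASCII equivalent comes back as '?'.
    public static string ToAsciiWithPlaceholders(string s)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    }
}
```

For example, `AsciiRoundTrip.ToAsciiWithPlaceholders("caf\u00E9")` should give `caf?`: the result is ASCII-only, but the offending character is replaced rather than removed.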

Answer by Jonas Elfström

It sounds kind of strange that it's acceptable to just drop the non-ASCII characters.

Also, I always recommend the excellent FileHelpers library for parsing CSV files.

Answer by Eric J.

If you wanted to test a specific character, you could use

if ((int)myChar <= 127)

Just getting the ASCII encoding of the string will not tell you that a specific character was non-ASCII to begin with (if you care about that). See MSDN.
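If you do care which characters were non-ASCII, the per-character test can be turned into a small scan. This is only a sketch; `AsciiInspector` and `FindNonAsciiIndexes` are names invented for the example.

```csharp
using System.Collections.Generic;

// Hypothetical helper built on the (int)myChar <= 127 test.
public static class AsciiInspector
{
    // Returns the positions of all chars whose code is above 127,
    // i.e. outside the 7-bit ASCII range.
    public static List<int> FindNonAsciiIndexes(string s)
    {
        var indexes = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 127)
                indexes.Add(i);
        }
        return indexes;
    }
}
```

An empty list means the string was pure ASCII; otherwise the indexes tell you where the input needs attention before anything gets replaced or dropped.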

Answer by Jaider

Here is a simple solution:

public static bool IsASCII(this string value)
{
    // ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there
    return Encoding.UTF8.GetByteCount(value) == value.Length;
}

source: http://snipplr.com/view/35806/
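Since extension methods have to live in a static class, here is one way the snippet might be wrapped and used on a row of CSV fields. `StringAsciiExtensions` and `FindNonAsciiFields` are names I made up; only `IsASCII` comes from the answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class StringAsciiExtensions
{
    // Pure ASCII text takes exactly one UTF-8 byte per char, so any
    // difference between byte count and length means non-ASCII content.
    public static bool IsASCII(this string value)
    {
        return Encoding.UTF8.GetByteCount(value) == value.Length;
    }

    // Hypothetical helper: returns the fields that contain non-ASCII characters.
    public static List<string> FindNonAsciiFields(IEnumerable<string> fields)
    {
        return fields.Where(f => !f.IsASCII()).ToList();
    }
}
```

This only detects the problem; it leaves the decision about replacing or removing the characters to the caller.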

Answer by Ross Kelly

    public string RunCharacterCheckASCII(string s)
    {
        // Nothing to check for a null input; return it unchanged.
        if (s == null)
            return s;

        char[] chars = s.ToCharArray();
        bool found = false;
        for (int i = 0; i < chars.Length; i++)
        {
            if (chars[i] > 127) // outside 7-bit ASCII (this includes "extended ASCII" 128-255)
            {
                found = true;
                chars[i] = '?';
            }
        }

        // Only build a new string when something was actually replaced.
        return found ? new string(chars) : s;
    }

Answer by paparazzo

Do it all at once

public string ReturnCleanASCII(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    foreach(char c in s)
    {
       if((int)c > 127) // you probably don't want 127 either
          continue;
       if((int)c < 32)  // I bet you don't want control characters 
          continue;
       if(c == ',')
          continue;
       if(c == '"')
          continue;
       sb.Append(c);
    }
    return sb.ToString();
}

Answer by rookie1024

Here's an improvement upon the accepted answer:

string fallbackStr = "";

Encoding enc = Encoding.GetEncoding(Encoding.ASCII.CodePage,
  new EncoderReplacementFallback(fallbackStr),
  new DecoderReplacementFallback(fallbackStr));

string cleanStr = enc.GetString(enc.GetBytes(inputStr));

This method will replace unknown characters with the value of fallbackStr, or if fallbackStr is empty, leave them out entirely. (Note that enc can be defined outside the scope of a function.)
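As a sketch of how this could be packaged for reuse, the encoding can be built once in a static field and shared by every call. `AsciiCleaner`, `StripEncoding` and `StripNonAscii` are names invented for the example; the `Encoding.GetEncoding` call is the same one as in the snippet above.

```csharp
using System.Text;

// Hypothetical wrapper around the fallback-based approach.
public static class AsciiCleaner
{
    // Built once and reused. The empty replacement string makes the
    // encoder emit nothing for chars it cannot represent in ASCII.
    private static readonly Encoding StripEncoding = Encoding.GetEncoding(
        Encoding.ASCII.CodePage,
        new EncoderReplacementFallback(string.Empty),
        new DecoderReplacementFallback(string.Empty));

    // Returns the input with every non-ASCII char removed.
    public static string StripNonAscii(string input)
    {
        return StripEncoding.GetString(StripEncoding.GetBytes(input));
    }
}
```

With this in place, `AsciiCleaner.StripNonAscii("caf\u00E9")` should give `caf`. Passing `"?"` instead of `string.Empty` to both fallbacks would bring back the placeholder behaviour of the accepted answer.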
