C#: Removing common invalid characters from a string: improve this algorithm

Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1329961/

Time: 2020-08-06 15:13:36  Source: igfitidea

Tags: c#, .net, algorithm

Asked by p.campbell

Consider the requirement to strip invalid characters from a string. The characters just need to be removed, i.e. replaced with blank or string.Empty.

char[] BAD_CHARS = new char[] { '!', '@', '#', '$', '%', '_' }; //simple example

foreach (char bad in BAD_CHARS)
{
    if (someString.Contains(bad))
      someString = someString.Replace(bad.ToString(), string.Empty);
}

I'd really have liked to do this:

if (BAD_CHARS.Any(bc => someString.Contains(bc)))
    someString.Replace(bc,string.Empty); // bc is out of scope

Question: Do you have any suggestions on refactoring this algorithm, or any simpler, easier-to-read, performant, maintainable alternatives?

Accepted answer by Rune FS

char[] BAD_CHARS = new char[] { '!', '@', '#', '$', '%', '_' }; //simple example
someString = string.Concat(someString.Split(BAD_CHARS,StringSplitOptions.RemoveEmptyEntries));

should do the trick (sorry for any minor syntax errors; I'm on my phone).
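
For anyone who wants to try the split-and-concatenate idea, here is a self-contained sketch. The class and method names are made up for illustration. StringSplitOptions.RemoveEmptyEntries is not strictly required, because empty pieces add nothing when they are concatenated. Dropping them mainly keeps the intermediate array smaller.

```csharp
using System;

public static class SplitConcatCleaner
{
    // Example set of characters to strip, as in the question.
    private static readonly char[] BadChars = { '!', '@', '#', '$', '%', '_' };

    // Split at every bad character, then join the remaining pieces,
    // so the bad characters themselves are dropped.
    public static string Clean(string input)
    {
        string[] pieces = input.Split(BadChars, StringSplitOptions.RemoveEmptyEntries);
        return string.Concat(pieces);
    }
}
```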

Answer by Adam Robinson

Why would you have REALLY LIKED to do that? The code is absolutely no simpler; you're just forcing a query extension method into your code.

As an aside, the Contains check seems redundant, both conceptually and from a performance perspective. Contains has to run through the whole string anyway, so you may as well just call Replace(bad.ToString(), string.Empty) for every character and forget about whether or not it's actually present.
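
Here is a sketch of the question's loop with the Contains check removed, as suggested above. The class name is hypothetical. This version still produces a new string for each bad character it removes, so it keeps the O(n·m) shape of the original loop.

```csharp
public static class ReplaceLoopCleaner
{
    private static readonly char[] BadChars = { '!', '@', '#', '$', '%', '_' };

    public static string Clean(string someString)
    {
        foreach (char bad in BadChars)
        {
            // No Contains() first: Replace scans the string itself, and a
            // character that is not present simply has nothing to replace.
            someString = someString.Replace(bad.ToString(), string.Empty);
        }
        return someString;
    }
}
```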

Of course, a regular expression is always an option, and may be more performant (though perhaps less clear) in a situation like this.

Answer by Noldorin

The string class is immutable (although it is a reference type), so its methods that appear to modify a string, such as Replace, actually return a new string. Calling someString.Replace without assigning the result to anything has no effect on your program. (It seems you have since fixed this problem.)
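
The following illustrates the immutability point. Replace leaves the original string untouched and returns a new one, so only the call whose result is assigned has any effect. The class and method names are illustrative only.

```csharp
public static class ImmutabilityDemo
{
    public static (string Before, string After) Run()
    {
        string someString = "a!b";

        // The returned string is discarded, so someString is still "a!b".
        someString.Replace("!", string.Empty);
        string before = someString;

        // Assigning the result makes someString refer to the new string "ab".
        someString = someString.Replace("!", string.Empty);

        return (before, someString);
    }
}
```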

The main issue with your suggested algorithm is that it repeatedly allocates new string instances, which can cause a significant performance hit. LINQ doesn't really help here. (In my opinion, it doesn't make the code significantly shorter, and certainly not any more readable.)

Try the following extension method. The key is the use of StringBuilder: because its capacity is set to the input length up front, only one buffer needs to be allocated for building the result.

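using System.Collections.Generic;
using System.Text;

// Extension methods must be declared in a non-generic static class.
public static class StringExtensions
{
    private static readonly HashSet<char> badChars = 
        new HashSet<char> { '!', '@', '#', '$', '%', '_' };

    public static string CleanString(this string str)
    {
        var result = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            if (!badChars.Contains(str[i]))
                result.Append(str[i]);
        }
        return result.ToString();
    }
}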

This algorithm also makes use of the .NET 3.5 HashSet class to give O(1) lookup time for detecting a bad char. This makes the overall algorithm O(n) rather than the O(nm) of your posted one (m being the number of bad chars). As explained above, it is also a lot better in terms of memory usage.
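
If different call sites need different character sets, you could pass the set in rather than hard-coding it. This is a hypothetical variant and not part of the original answer. It keeps the single StringBuilder pass. When the argument is a HashSet<char>, each lookup still takes expected O(1) time.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharFilterExtensions
{
    // Hypothetical generalization: the caller supplies the characters to drop.
    public static string RemoveChars(this string str, ISet<char> badChars)
    {
        var result = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (!badChars.Contains(c)) // expected O(1) for a HashSet<char>
                result.Append(c);
        }
        return result.ToString();
    }
}
```

Usage: "a!b_c".RemoveChars(new HashSet<char> { '!', '_' }) should give "abc".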

Answer by CAbbott

I don't know about the readability of it, but a regular expression could do what you need it to:

someString = Regex.Replace(someString, @"[!@#$%_]", "");
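
If the set of characters may change, you can keep a single source of truth by building the character class from the same char[] used elsewhere. This sketch assumes such an array and uses illustrative names. It writes each character as a \uXXXX escape, so characters that are special inside [...] (such as ], \ or ^) cannot break the pattern.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // The single list of characters to strip (the question's example set).
    private static readonly char[] BadChars = { '!', '@', '#', '$', '%', '_' };

    // Builds a character class such as [\u0021\u0040...] from BadChars.
    // Writing every char as \uXXXX means characters that are special
    // inside [...] cannot change the meaning of the pattern.
    private static readonly Regex BadCharPattern = new Regex(
        "[" + string.Concat(BadChars.Select(c => "\\u" + ((int)c).ToString("X4"))) + "]");

    public static string Clean(string input) => BadCharPattern.Replace(input, string.Empty);
}
```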

Answer by Jeff Mitchell

Something to consider: if this is for passwords (say), you want to scan for and keep good characters, and assume everything else is bad. It's easier to correctly filter for good things than to try to guess all the bad things.

For each character:
    if the character is good -> keep it (copy it to the out buffer, whatever).

jeff
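
Below is a minimal whitelist version of Jeff's pseudocode. What counts as "good" depends on the application. The ASCII-letters-and-digits rule used here is only an example, and the names are hypothetical.

```csharp
using System.Text;

public static class WhitelistCleaner
{
    // Assumed whitelist for illustration: ASCII letters and digits only.
    // Anything not explicitly allowed is treated as bad and dropped.
    private static bool IsAllowed(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    public static string KeepGood(string input)
    {
        var output = new StringBuilder(input.Length); // the "out buffer"
        foreach (char c in input)
        {
            if (IsAllowed(c))
                output.Append(c);
        }
        return output.ToString();
    }
}
```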

Answer by Mike Jacobs

If you still want to do it in a LINQy way:

public static string CleanUp(this string orig)
{
    var badchars = new HashSet<char>() { '!', '@', '#', '$', '%', '_' };

    return new string(orig.Where(c => !badchars.Contains(c)).ToArray());
}
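
One small tweak to the LINQ version: the HashSet<char> above is rebuilt on every call. Moving it into a static readonly field, as Noldorin's answer does, means it is built only once. The class name below is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqCleaner
{
    // Built once when the class is initialized, instead of on every call.
    private static readonly HashSet<char> BadChars =
        new HashSet<char> { '!', '@', '#', '$', '%', '_' };

    public static string CleanUp(string orig) =>
        new string(orig.Where(c => !BadChars.Contains(c)).ToArray());
}
```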

Answer by Sam Harwell

This one is faster than HashSet<T>. Also, if you have to perform this action often, please consider the foundations for this question I asked here.

using System.Text;

public static class StringCleaner
{
    private static readonly bool[] BadCharValues;

    // A static constructor must have the same name as its class.
    static StringCleaner()
    {
        BadCharValues = new bool[char.MaxValue + 1];
        char[] badChars = { '!', '@', '#', '$', '%', '_' };
        foreach (char c in badChars)
            BadCharValues[c] = true;
    }

    public static string CleanString(string str)
    {
        var result = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            // Direct array index: one lookup per character, no hashing.
            if (!BadCharValues[str[i]])
                result.Append(str[i]);
        }
        return result.ToString();
    }
}
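
The lookup table above holds one bool for every possible char value: 65,536 entries, or about 64 KB. If every bad character is ASCII, a 128-entry table plus a bounds check gives the same constant-time test with a much smaller table. This is a hypothetical variant, not Sam Harwell's code.

```csharp
using System.Text;

public static class AsciiTableCleaner
{
    // Assumes all bad characters are ASCII, so 128 entries are enough.
    private static readonly bool[] BadAscii = BuildTable('!', '@', '#', '$', '%', '_');

    private static bool[] BuildTable(params char[] badChars)
    {
        var table = new bool[128];
        foreach (char c in badChars)
            table[c] = true;
        return table;
    }

    public static string CleanString(string str)
    {
        var result = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            // Characters beyond the table can never be bad, so keep them.
            if (c >= BadAscii.Length || !BadAscii[c])
                result.Append(c);
        }
        return result.ToString();
    }
}
```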

Answer by Jason Kleban

This is pretty clean: it restricts the string to valid characters instead of removing invalid ones. You should probably extract the strings into constants:

// Requires: using System.Linq;
string clean = new string(@"Sour!ce Str&*(@ing".Where(c =>
    @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.".Contains(c)).ToArray());
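
Following the suggestion to move the allowed characters into a constant, a sketch might look like this. The constant and method names are hypothetical.

```csharp
using System.Linq;

public static class ValidCharsFilter
{
    // The allowed set from the answer, held in one named constant.
    private const string ValidChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.";

    // Keeps only characters that appear in ValidChars.
    public static string KeepValid(string source) =>
        new string(source.Where(c => ValidChars.Contains(c)).ToArray());
}
```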

Answer by slimburrok

Extra tip: if you don't want to maintain your own array of chars that are invalid in file names, you can use Path.GetInvalidFileNameChars(). For paths, use Path.GetInvalidPathChars().

private static string RemoveInvalidChars(string str)
{
    return string.Concat(str.Split(Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries));
}