C#: Removing common invalid characters from a string: improve this algorithm
Original question: http://stackoverflow.com/questions/1329961/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverFlow.
Asked by p.campbell
Consider the requirement to strip invalid characters from a string. The characters just need to be removed, that is, replaced with blank or string.Empty.
char[] BAD_CHARS = new char[] { '!', '@', '#', '$', '%', '_' }; //simple example
foreach (char bad in BAD_CHARS)
{
    if (someString.Contains(bad))
        someString = someString.Replace(bad.ToString(), string.Empty);
}
I'd have really liked to do this:
if (BAD_CHARS.Any(bc => someString.Contains(bc)))
someString.Replace(bc,string.Empty); // bc is out of scope
Question: Do you have any suggestions on refactoring this algorithm, or any simpler, easier to read, performant, maintainable algorithms?
Accepted answer by Rune FS
char[] BAD_CHARS = new char[] { '!', '@', '#', '$', '%', '_' }; //simple example
someString = string.Concat(someString.Split(BAD_CHARS,StringSplitOptions.RemoveEmptyEntries));
This should do the trick (sorry for any small syntax errors, I'm on my phone).
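As a minimal sketch of the Split/Concat idea above, the one-liner can be wrapped in a small helper. The class and method names here are hypothetical and the bad-character list is just the question's sample. Because string.Concat ignores empty pieces anyway, RemoveEmptyEntries mainly keeps the intermediate array small.

```csharp
using System;

public static class SplitConcatSketch
{
    // Sample list from the question; replace with your own set.
    private static readonly char[] BadChars = { '!', '@', '#', '$', '%', '_' };

    public static string Clean(string input)
    {
        // Split cuts the input at every bad character, dropping the separators.
        string[] pieces = input.Split(BadChars, StringSplitOptions.RemoveEmptyEntries);

        // Concat glues the surviving pieces back together.
        return string.Concat(pieces);
    }
}
```

Calling SplitConcatSketch.Clean("a!b@c") returns the string with the listed characters removed.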
Answer by Adam Robinson
Why would you have REALLY LIKED to do that? The code is absolutely no simpler; you're just forcing a query extension method into your code.
As an aside, the Contains check seems redundant, both conceptually and from a performance perspective. Contains has to run through the whole string anyway, so you may as well just call Replace(bad.ToString(), string.Empty) for every character and forget about whether or not it's actually present.
Of course, a regular expression is always an option, and may be more performant (if less clear) in a situation like this.
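The point about the redundant Contains check can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and the list of bad characters is the sample from the question.

```csharp
using System;

public static class ReplaceLoopSketch
{
    // Sample list from the question.
    private static readonly char[] BadChars = { '!', '@', '#', '$', '%', '_' };

    public static string StripBadChars(string input)
    {
        string result = input;
        foreach (char bad in BadChars)
        {
            // No Contains check: Replace leaves the text unchanged
            // when the character does not occur.
            result = result.Replace(bad.ToString(), string.Empty);
        }
        return result;
    }
}
```

This still scans the string once per bad character, so it is simpler than the original loop rather than asymptotically faster.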
Answer by Noldorin
The string class is immutable (although it is a reference type), hence all its methods are designed to return a new string. Calling someString.Replace without assigning the result to anything will not have any effect in your program. (It seems you have since fixed this problem.)
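A small sketch of the immutability point, with hypothetical class and method names: the first method discards the result of Replace, the second assigns it back.

```csharp
public static class ImmutabilitySketch
{
    public static string Discarded(string s)
    {
        // The new string returned by Replace is thrown away; s is untouched.
        s.Replace("!", string.Empty);
        return s;
    }

    public static string Assigned(string s)
    {
        // Assigning the result is what actually "changes" the variable.
        s = s.Replace("!", string.Empty);
        return s;
    }
}
```

With the input "a!b", Discarded returns the input unchanged while Assigned returns it without the exclamation mark.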
The main issue with your suggested algorithm is that it repeatedly allocates new string objects, one for each bad character that gets replaced, potentially causing a big performance hit. LINQ doesn't really help things here. (It doesn't make the code significantly shorter and certainly not any more readable, in my opinion.)
Try the following extension method. The key is the use of StringBuilder, which means the result is built up in a single pre-sized buffer instead of a new string being allocated for every replacement.
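using System.Collections.Generic;
using System.Text;

// Extension methods must live in a static class; the class name is a placeholder.
public static class StringExtensions
{
    private static readonly HashSet<char> badChars =
        new HashSet<char> { '!', '@', '#', '$', '%', '_' };

    public static string CleanString(this string str)
    {
        // Pre-size the builder to the input length so it never has to grow.
        var result = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            // Keep only characters that are not in the bad set.
            if (!badChars.Contains(str[i]))
                result.Append(str[i]);
        }
        return result.ToString();
    }
}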
This algorithm also makes use of the .NET 3.5 HashSet class to give O(1) lookup time for detecting a bad char. This makes the overall algorithm O(n) rather than the O(nm) of your posted one (m being the number of bad chars); it is also a lot better with memory usage, as explained above.
Answer by CAbbott
I don't know about the readability of it, but a regular expression could do what you need it to:
someString = Regex.Replace(someString, @"[!@#$%_]", "");
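If the same pattern is applied many times, one option is to build the Regex once and reuse it. The sketch below assumes that scenario; the class, field, and method names are hypothetical, and whether RegexOptions.Compiled pays off depends on how often the method is called.

```csharp
using System.Text.RegularExpressions;

public static class RegexCleanerSketch
{
    // Built once and reused. Inside a character class these symbols are literals.
    private static readonly Regex BadChars =
        new Regex(@"[!@#$%_]", RegexOptions.Compiled);

    public static string Clean(string input)
    {
        // Replace every match of the character class with nothing.
        return BadChars.Replace(input, string.Empty);
    }
}
```

Characters such as ], ^, - or \ would need escaping if they were added to the class.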
Answer by Jeff Mitchell
Something to consider: if this is for passwords (say), you want to scan for and keep good characters, and assume everything else is bad. It's easier to correctly filter for good things than to try to guess all the bad things.
For each character: if the character is good, keep it (copy it to an output buffer, or whatever).
jeff
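The pseudocode above could look roughly like this in C#. The allowed set (ASCII letters and digits) is an assumption made purely for illustration, as are the class and method names; pick whatever set your requirement actually permits.

```csharp
using System.Collections.Generic;
using System.Text;

public static class WhitelistSketch
{
    // Assumed allowed set for illustration only: ASCII letters and digits.
    private static readonly HashSet<char> Allowed = new HashSet<char>(
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

    public static string KeepAllowed(string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Keep the character only if it is known to be good;
            // everything else is treated as bad and dropped.
            if (Allowed.Contains(c))
                output.Append(c);
        }
        return output.ToString();
    }
}
```

Anything not explicitly listed, including spaces and punctuation, is dropped.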
Answer by Mike Jacobs
If you still want to do it in a LINQy way:
public static string CleanUp(this string orig)
{
    var badchars = new HashSet<char>() { '!', '@', '#', '$', '%', '_' };
    return new string(orig.Where(c => !badchars.Contains(c)).ToArray());
}
Answer by Sam Harwell
This one is faster than HashSet<T>. Also, if you have to perform this action often, please consider the foundations for this question I asked here.
using System.Text;

// Placeholder class name; a static constructor must share the name of its class.
public static class StringCleaner
{
    private static readonly bool[] BadCharValues;

    static StringCleaner()
    {
        // One flag per possible char value, so the lookup is a plain array index.
        BadCharValues = new bool[char.MaxValue + 1];
        char[] badChars = { '!', '@', '#', '$', '%', '_' };
        foreach (char c in badChars)
            BadCharValues[c] = true;
    }

    public static string CleanString(string str)
    {
        var result = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            if (!BadCharValues[str[i]])
                result.Append(str[i]);
        }
        return result.ToString();
    }
}
Answer by Jason Kleban
This is pretty clean. It restricts the string to valid characters instead of removing invalid ones. You should probably split the literals out into constants:
string clean = new string(@"Sour!ce Str&*(@ing".Where(c =>
    @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.".Contains(c)).ToArray());
Answer by slimburrok
Extra tip: if you don't want to remember the array of chars that are invalid for file names, you could use Path.GetInvalidFileNameChars(). If you want it for paths, it's Path.GetInvalidPathChars().
private static string RemoveInvalidChars(string str)
{
    return string.Concat(str.Split(Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries));
}