C#: Remove text in-between delimiters in a string (using a regex?)

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/1359412/

Date: 2020-08-06 15:36:34  Source: igfitidea

Tags: c#, .net, regex, algorithm, string

Asked by p.campbell

Consider the requirement to find a matching pair of delimiter characters and remove any characters between them, as well as the delimiters themselves.

Here are the sets of delimiters:

 []    square brackets
 ()    parentheses
 ""    double quotes
 ''    single quotes

Here are some examples of strings that should match:

 Given:                       Results In:
-------------------------------------------
 Hello "some" World           Hello World
 Give [Me Some] Purple        Give Purple
 Have Fifteen (Lunch Today)   Have Fifteen
 Have 'a good'day             Have day

And some examples of strings that should not match:

 Does Not Match:
------------------
 Hello "world
 Brown]co[w
 Cheese'factory

If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may have many matching pairs of delimiters. If two sets of delimiters overlap (e.g. he[llo "worl]d"), that's an edge case we can ignore here.

The algorithm would look something like this:

string myInput = "Give [Me Some] Purple (And More) Elephants";
string pattern; //some pattern
string output = Regex.Replace(myInput, pattern, string.Empty);

Question: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Are there easy ways of keeping those start and end delimiters in constants or in a list of some kind? I'm looking for a solution that makes the delimiters easy to change in case the business analysts come up with new sets of delimiters.

Accepted answer by Kelsey

Simple regex would be:

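using System.Text.RegularExpressions;

string input = "Give [Me Some] Purple (And More) Elephants";
// Verbatim string (@"...") so the regex escapes compile; ".*?" stops at the first closing delimiter
string regex = @"(\[.*?\])|("".*?"")|('.*?')|(\(.*?\))";
string output = Regex.Replace(input, regex, "");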
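The choice between greedy .* and lazy .*? matters as soon as an input contains two pairs of the same delimiter: the greedy form runs from the first opening bracket to the last closing one and swallows the text in between. A small sketch comparing both on square brackets (the class name GreedyVsLazy is illustrative only):

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy: ".*" extends to the LAST ']' in the input, so text between pairs is lost.
    public static string Greedy(string input) => Regex.Replace(input, @"\[.*\]", "");

    // Lazy: ".*?" stops at the FIRST ']' after each '[', so each pair is removed separately.
    public static string Lazy(string input) => Regex.Replace(input, @"\[.*?\]", "");
}
```

For "Give [Me] Some [Purple] Elephants", Greedy returns "Give  Elephants" while Lazy returns "Give  Some  Elephants" (the doubled spaces are left where the bracketed text was).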

If you want to build the regex up in a custom way, you just need to build up the individual parts:

('.*')  // example of the single quote check

Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once you have built the regex string, run it only once. The key is to get everything into a single regex, because running many regex matches on one item and then iterating over a lot of items will probably cause a significant drop in performance.

In my first example that would take the place of the following line:

string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string sOutput = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something similar.
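A minimal sketch of that LINQ approach is below. It was not written by the original answerer: the class name DelimiterStripper and the tuple-based delimiter list are hypothetical. Each alternative pairs an opening delimiter with its own closing delimiter, and Regex.Escape makes characters such as ( or [ match literally. The list is turned into one regex that is built once, so adding a new pair for the business analysts only means editing the array.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterStripper
{
    // Hypothetical delimiter list: add or remove pairs here (or load them from config).
    public static readonly (char Open, char Close)[] Delimiters =
    {
        ('[', ']'),
        ('(', ')'),
        ('"', '"'),
        ('\'', '\''),
    };

    // Builds one alternation, e.g. \[.*?]|\(.*?\)|".*?"|'.*?', so each opening
    // delimiter is tied to its own closing delimiter and all pairs are checked in one pass.
    public static Regex BuildRegex(IEnumerable<(char Open, char Close)> pairs) =>
        new Regex(string.Join("|", pairs.Select(p =>
            Regex.Escape(p.Open.ToString()) + ".*?" + Regex.Escape(p.Close.ToString()))));

    // Declared after Delimiters so the static initializer sees the populated array.
    static readonly Regex Combined = BuildRegex(Delimiters);

    public static string RemoveDelimited(string input) => Combined.Replace(input, string.Empty);
}
```

With this list, "Have 'a good'day" becomes "Have day", while the question's non-matching inputs such as "Brown]co[w" and "Cheese'factory" come back unchanged.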

Answer by Alexis Abril

I have to add the old adage, "You have a problem and you want to use regular expressions. Now you have two problems."

I've come up with a quick regex that will hopefully point you in the direction you're looking for:

[.]*(\(|\[|\"|').*(\]|\)|\"|')[.]*

The parentheses, brackets, and double quotes are escaped, while the single quote can be left as-is.

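To put the above expression into English: it matches an opening delimiter, then any characters, then a closing delimiter. (Note that [.]* is a character class that matches only literal periods, not arbitrary characters, and because the opening and closing groups are independent, a mismatched pair such as (text] also matches.)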

The opening-delimiter group is (\(|\[|\"|'), and it has a matching closing group. To make this a bit more extensible in the future, you could remove the hard-coded delimiters and store them in a config file, a database, or wherever you choose.

Answer by Bryan Menard

A simple way would be to do this:

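using System.Text.RegularExpressions;

string RemoveBetween(string s, char begin, char end)
{
    // Regex.Escape stops delimiters such as '(', '.' or '^' from being read as regex syntax
    Regex regex = new Regex(Regex.Escape(begin.ToString()) + ".*?" + Regex.Escape(end.ToString()));
    return regex.Replace(s, string.Empty);
}

// Verbatim string so the backslash before "Elephants" is a literal character
string s = @"Give [Me Some] Purple (And More) \Elephants/ and .hats^";
s = RemoveBetween(s, '(', ')');
s = RemoveBetween(s, '[', ']');
s = RemoveBetween(s, '\\', '/');
s = RemoveBetween(s, '.', '^');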

Changing the return statement to the following collapses repeated spaces into a single space:

return new Regex(" +").Replace(regex.Replace(s, string.Empty), " ");

The final result for this would be:

"Give Purple and "

Disclaimer: A single regex would probably be faster than this.

Answer by jaxxbo

Use the following Regex

(\{\S*\})

This regex matches any occurrence of {word} (braces with no whitespace between them), which you can then replace with whatever word you want to substitute.

Some sample C# code:

using System.Collections.Specialized;
using System.Text.RegularExpressions;

public static class PlaceholderExtensions
{
    static readonly Regex re = new Regex(@"(\{\S*\})", RegexOptions.Compiled);

    /// <summary>
    /// Pass text and a collection of key/value pairs. The text placeholders will be substituted with the collection values.
    /// </summary>
    /// <param name="text">Text that contains placeholders such as {fullname}</param>
    /// <param name="fields">A collection of key/value pairs. Pass <code>fullname</code> and the value <code>Sarah</code>.
    /// DO NOT PASS keys with curly brackets <code>{}</code> in the collection.</param>
    /// <returns>Substituted text</returns>
    public static string ReplaceMatch(this string text, StringDictionary fields)
    {
        // Group 1 includes the braces ("{fullname}"), so strip them before looking up the key
        return re.Replace(text, match => fields[match.Groups[1].Value.Trim('{', '}')]);
    }
}

In a sentence such as

Regex Hero is a real-time {online {Silverlight} Regular} Expression Tester.

It will replace only {Silverlight}, rather than everything from the first { to the last }.
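A quick way to confirm this behavior is to list the matches. The helper below is a hypothetical sketch (the class name PlaceholderDemo is illustrative) that uses the same pattern. Because \S* cannot cross a space, the text starting at "{online" never reaches a closing brace, so only "{Silverlight}" is found.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Same pattern as above: braces around a run of non-whitespace characters.
    static readonly Regex Placeholder = new Regex(@"(\{\S*\})");

    // Returns every matched placeholder, braces included, in order of appearance.
    public static string[] FindPlaceholders(string text) =>
        Placeholder.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```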

Answer by Håkon Seljåsen

Building on Bryan Menard's regular expression, I made an extension method which will also work for nested replacements like "[Test 1 [[Test2] Test3]] Hello World":

    using System;
    using System.Text.RegularExpressions;

    public static class StringExtensions
    {
        /// <summary>
        /// Removes the characters between (and including) the given delimiters, handling nested pairs.
        /// </summary>
        /// <param name="rawString">The string to clean.</param>
        /// <param name="enter">The opening delimiter, e.g. '['.</param>
        /// <param name="exit">The closing delimiter, e.g. ']'.</param>
        /// <returns>The string with the delimited fragments removed and spaces normalized.</returns>
        public static string RemoveFragmentsBetween(this string rawString, char enter, char exit)
        {
            if (rawString.IndexOf(enter) >= 0 && rawString.IndexOf(exit) >= 0)
            {
                int substringStartIndex = rawString.IndexOf(enter) + 1;
                int substringLength = rawString.LastIndexOf(exit) - substringStartIndex;

                if (substringLength > 0 && substringStartIndex > 0)
                {
                    // Recurse into the text between the first opener and the last closer
                    string substring = rawString.Substring(substringStartIndex, substringLength).RemoveFragmentsBetween(enter, exit);
                    if (substring.Length != substringLength) // This would mean that letters have been removed
                    {
                        rawString = rawString.Remove(substringStartIndex, substringLength).Insert(substringStartIndex, substring).Trim();
                    }
                }

                //Source: https://stackoverflow.com/a/1359521/3407324
                // Regex.Escape keeps delimiters such as '(' or '.' from being read as regex syntax
                Regex regex = new Regex(Regex.Escape(enter.ToString()) + ".*?" + Regex.Escape(exit.ToString()));
                return new Regex(" +").Replace(regex.Replace(rawString, string.Empty), " ").Trim(); // Removing duplicate and trailing/leading spaces
            }
            else
            {
                return rawString;
            }
        }
    }

For the example above, usage would look like this:

string testString = "[Test 1 [[Test2] Test3]] Hello World";
string result = testString.RemoveFragmentsBetween('[', ']');

The result is the string "Hello World".
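If you only need nesting for one pair whose opening and closing characters differ, a depth counter can replace the recursion and the regex. The following is a hypothetical alternative, not part of the answer above, and the names NestedDelimiterStripper and RemoveNested are illustrative. Unlike RemoveFragmentsBetween, it does not trim or collapse spaces, and it leaves an unclosed opening delimiter and everything after it unchanged. It does not work for quote-style pairs where the opening and closing characters are the same.

```csharp
using System.Text;

public static class NestedDelimiterStripper
{
    // Removes each outermost open...close span in a single left-to-right pass.
    // Nested pairs such as "[a [b] c]" are removed as one unit.
    public static string RemoveNested(string input, char open, char close)
    {
        var result = new StringBuilder();
        int depth = 0;
        int spanStart = -1; // index of the outermost opener that is still unclosed

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == open)
            {
                if (depth == 0) spanStart = i;
                depth++;
            }
            else if (c == close && depth > 0)
            {
                depth--;
            }
            else if (depth == 0)
            {
                // Outside any span (including a stray closer): keep the character.
                result.Append(c);
            }
        }

        // Unbalanced input: restore the text from the outermost unclosed opener onward.
        if (depth > 0) result.Append(input, spanStart, input.Length - spanStart);

        return result.ToString();
    }
}
```

For "[Test 1 [[Test2] Test3]] Hello World" this returns " Hello World" (with the leading space), and "Brown]co[w" is returned unchanged.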
