C#: Best way to split a string into lines

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1508203/

Date: 2020-08-06 18:16:02  Source: igfitidea

Best way to split string into lines

Tags: c#, string, syntax, multiline

Asked by Konstantin Spirin

How do you split a multi-line string into lines?

I know this way:

var result = input.Split("\n\r".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

It looks a bit ugly and loses empty lines. Is there a better solution?

Accepted answer by Konrad Rudolph

  • If it looks ugly, just remove the unnecessary ToCharArray call.

  • If you want to split by either \n or \r, you've got two options:

    • Use an array literal, but this will give you empty lines for Windows-style line endings \r\n:

      var result = text.Split(new [] { '\r', '\n' });
      
    • Use a regular expression, as indicated by Bart:

      var result = Regex.Split(text, "\r\n|\r|\n");
      
  • If you want to preserve empty lines, why do you explicitly tell C# to throw them away (the StringSplitOptions parameter)? Use StringSplitOptions.None instead.
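
To make the difference between the two options concrete, here is a minimal sketch that runs both on a small mixed-ending string. The class name, method names, and sample input are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Splits on each '\r' and '\n' separately, so "\r\n" yields an extra empty entry.
    public static string[] SplitOnChars(string text) =>
        text.Split(new[] { '\r', '\n' });

    // Treats "\r\n" as a single terminator; genuinely empty lines are kept.
    public static string[] SplitWithRegex(string text) =>
        Regex.Split(text, "\r\n|\r|\n");

    public static void Main()
    {
        const string sample = "one\r\ntwo\n\nthree"; // hypothetical sample input

        Console.WriteLine(SplitOnChars(sample).Length);   // 5: "one", "", "two", "", "three"
        Console.WriteLine(SplitWithRegex(sample).Length); // 4: "one", "two", "", "three"
    }
}
```

With the character array, the empty entry after "one" is an artifact of the Windows line ending, while the one after "two" is a real empty line; the regular expression keeps only the real one.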

Answer by Bart Kiers

You could use Regex.Split:

string[] tokens = Regex.Split(input, @"\r?\n|\r");

Edit: added |\r to account for (older) Mac line terminators.

Answer by Jonas Elfström

If you want to keep empty lines just remove the StringSplitOptions.

var result = input.Split(System.Environment.NewLine.ToCharArray());

Answer by JDunkerley

Slightly twisted, but here is an iterator block to do it:

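public static IEnumerable<string> Lines(this string text)
{
    string newLine = Environment.NewLine;
    int start = 0; // index where the current line begins
    int next;      // index of the next line terminator
    while ((next = text.IndexOf(newLine, start, StringComparison.Ordinal)) != -1)
    {
        yield return text.Substring(start, next - start);
        start = next + newLine.Length; // skip the whole terminator, e.g. both chars of "\r\n"
    }
    yield return text.Substring(start); // last line (empty if the text ends with a terminator)
}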

You can then call:

var result = input.Lines().ToArray();

Answer by Hyman

using (StringReader sr = new StringReader(text)) {
    string line;
    while ((line = sr.ReadLine()) != null) {
        // do something
    }
}

Answer by MAG TOR

string[] lines = input.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

Answer by orad

Update: See here for an alternative/async solution.

This works great and is faster than Regex:

input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)

It is important to have "\r\n" first in the array so that it's taken as one line break. The above gives the same results as either of these Regex solutions:

Regex.Split(input, "\r\n|\r|\n")

Regex.Split(input, "\r?\n|\r")
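
The point about separator order can be checked with a small sketch. It assumes the documented behaviour of String.Split, where the first separator in the array that matches at a given position is the one used; the class and method names are hypothetical.

```csharp
using System;

public static class SeparatorOrderDemo
{
    // "\r\n" is listed first, so a Windows line ending is consumed as one separator.
    public static string[] CrLfFirst(string input) =>
        input.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

    // "\r" is listed first, so "\r\n" is split twice and leaves an empty entry behind.
    public static string[] CrLfLast(string input) =>
        input.Split(new[] { "\r", "\n", "\r\n" }, StringSplitOptions.None);

    public static void Main()
    {
        const string sample = "a\r\nb"; // hypothetical sample input

        Console.WriteLine(CrLfFirst(sample).Length); // 2: "a", "b"
        Console.WriteLine(CrLfLast(sample).Length);  // 3: "a", "", "b"
    }
}
```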

Except that Regex turns out to be about 10 times slower. Here's my test:

Action<Action> measure = (Action func) => {
    var start = DateTime.Now;
    for (int i = 0; i < 100000; i++) {
        func();
    }
    var duration = DateTime.Now - start;
    Console.WriteLine(duration);
};

var input = "";
for (int i = 0; i < 100; i++)
{
    input += "1 \r2\r\n3\n4\n\r5 \r\n\r\n 6\r7\r 8\r\n";
}

measure(() =>
    input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
);

measure(() =>
    Regex.Split(input, "\r\n|\r|\n")
);

measure(() =>
    Regex.Split(input, "\r?\n|\r")
);

Output:

00:00:03.8527616

00:00:31.8017726

00:00:32.5557128

and here's the Extension Method:

public static class StringExtensionMethods
{
    public static IEnumerable<string> GetLines(this string str, bool removeEmptyLines = false)
    {
        return str.Split(new[] { "\r\n", "\r", "\n" },
            removeEmptyLines ? StringSplitOptions.RemoveEmptyEntries : StringSplitOptions.None);
    }
}

Usage:

input.GetLines()      // keeps empty lines

input.GetLines(true)  // removes empty lines

Answer by John Thompson

    private string[] GetLines(string text)
    {

        List<string> lines = new List<string>();
        using (MemoryStream ms = new MemoryStream())
        {
            StreamWriter sw = new StreamWriter(ms);
            sw.Write(text);
            sw.Flush();

            ms.Position = 0;

            string line;

            using (StreamReader sr = new StreamReader(ms))
            {
                while ((line = sr.ReadLine()) != null)
                {
                    lines.Add(line);
                }
            }
            sw.Close();
        }



        return lines.ToArray();
    }

Answer by orad

I had this other answer, but this one, based on Hyman's answer, might be preferred since it works asynchronously, although it is slightly slower.

public static class StringExtensionMethods
{
    public static IEnumerable<string> GetLines(this string str, bool removeEmptyLines = false)
    {
        using (var sr = new StringReader(str))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                if (removeEmptyLines && String.IsNullOrWhiteSpace(line))
                {
                    continue;
                }
                yield return line;
            }
        }
    }
}

Usage:

input.GetLines()      // keeps empty lines

input.GetLines(true)  // removes empty lines

Test:

Action<Action> measure = (Action func) =>
{
    var start = DateTime.Now;
    for (int i = 0; i < 100000; i++)
    {
        func();
    }
    var duration = DateTime.Now - start;
    Console.WriteLine(duration);
};

var input = "";
for (int i = 0; i < 100; i++)
{
    input += "1 \r2\r\n3\n4\n\r5 \r\n\r\n 6\r7\r 8\r\n";
}

measure(() =>
    input.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)
);

measure(() =>
    input.GetLines()
);

measure(() =>
    input.GetLines().ToList()
);

Output:

00:00:03.9603894

00:00:00.0029996

00:00:04.8221971
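
A note on the second timing in this test: GetLines is an iterator method, so a bare call such as input.GetLines() only creates the enumerator and reads nothing until the result is enumerated, which probably explains why that measurement is so small. The sketch below illustrates this; the class name and the LinesRead counter are hypothetical additions used only for observation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical counter, added only to observe when lines are actually read.
    public static int LinesRead;

    public static IEnumerable<string> GetLines(string str)
    {
        using (var sr = new StringReader(str))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                LinesRead++;
                yield return line;
            }
        }
    }

    public static void Main()
    {
        var lazy = GetLines("a\nb\nc"); // nothing has been read yet
        Console.WriteLine(LinesRead);    // 0

        var list = lazy.ToList();        // enumeration happens here
        Console.WriteLine(LinesRead);    // 3
    }
}
```

For a fair comparison with String.Split, the timing that forces enumeration (the ToList() call) is the relevant one.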

Answer by Glenn Slayden

It's tricky to handle mixed line endings properly. As we know, the line termination characters can be "Line Feed" (ASCII 10, \n, \x0A, \u000A), "Carriage Return" (ASCII 13, \r, \x0D, \u000D), or some combination of them. Going back to DOS, Windows uses the two-character sequence CR-LF \u000D\u000A, so this combination should only emit a single line. Unix uses a single \u000A, and very old Macs used a single \u000D character. The standard way to treat arbitrary mixtures of these characters within a single text file is as follows:

  • each and every CR or LF character should skip to the next line EXCEPT...
  • ...if a CR is immediately followed by LF (\u000D\u000A) then these two together skip just one line.
  • String.Empty is the only input that returns no lines (any character entails at least one line)
  • The last line must be returned even if it has neither CR nor LF.
The preceding rules describe the behavior of StringReader.ReadLine and related functions, and the function shown below produces identical results. It is an efficient C# line breaking function that dutifully implements these guidelines to correctly handle any arbitrary sequence or combination of CR/LF. The enumerated lines do not contain any CR/LF characters. Empty lines are preserved and returned as String.Empty.

/// <summary>
/// Enumerates the text lines from the string.
///   - Mixed CR-LF scenarios are handled correctly
///   - String.Empty is returned for each empty line
///   - No returned string ever contains CR or LF
/// </summary>
public static IEnumerable<String> Lines(this String s)
{
    int j = 0, c, i;
    char ch;
    if ((c = s.Length) > 0)
        do
        {
            for (i = j; (ch = s[j]) != '\r' && ch != '\n' && ++j < c;)
                ;

            yield return s.Substring(i, j - i);
        }
        while (++j < c && (ch != '\r' || s[j] != '\n' || ++j < c));
}

Note: If you don't mind the overhead of creating a StringReader instance on each call, you can use the following C# 7 code instead. As noted, while the example above may be slightly more efficient, both of these functions produce the exact same results.

public static IEnumerable<String> Lines(this String s)
{
    using (var tr = new StringReader(s))
        while (tr.ReadLine() is String L)
            yield return L;
}
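
The four rules listed in this answer can be checked against StringReader directly. The following is a minimal sketch rather than part of the original answer; the class name, the ReadAllLines helper, and the sample inputs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineRulesDemo
{
    // Collects every line that StringReader.ReadLine produces for the given text.
    public static List<string> ReadAllLines(string s)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(s))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        Console.WriteLine(ReadAllLines("").Count);             // 0: empty input has no lines
        Console.WriteLine(ReadAllLines("x").Count);            // 1: the last line needs no terminator
        Console.WriteLine(ReadAllLines("a\r\nb\rc\nd").Count); // 4: CR-LF counts as one break
        Console.WriteLine(ReadAllLines("a\n\rb").Count);       // 3: LF followed by CR is two breaks
    }
}
```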