Reading a file line by line in C#

Question

Question by Luca Spiller

I am trying to read some text files, where each line needs to be processed. At the moment I am just using a StreamReader, and then reading each line individually.

I am wondering whether there is a more efficient way (in terms of LoC and readability) to do this using LINQ without compromising operational efficiency. The examples I have seen involve loading the whole file into memory, and then processing it. In this case however I don't believe that would be very efficient. In the first example the files can get up to about 50k, and in the second example, not all lines of the file need to be read (sizes are typically < 10k).

You could argue that nowadays it doesn't really matter for these small files; however, I believe that sort of approach leads to inefficient code.

First example:

// Open file
using(var file = System.IO.File.OpenText(_LstFilename))
{
    // Read file
    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Ignore empty lines
        if (line.Length > 0)
        {
            // Create addon
            T addon = new T();
            addon.Load(line, _BaseDir);

            // Add to collection
            collection.Add(addon);
        }
    }
}

Second example:

// Open file
using (var file = System.IO.File.OpenText(datFile))
{
    // Compile regexs
    Regex nameRegex = new Regex("IDENTIFY (.*)");

    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Check name
        Match m = nameRegex.Match(line);
        if (m.Success)
        {
            _Name = m.Groups[1].Value;

            // Remove me when other values are read
            break;
        }
    }
}

Answer 1

Accepted answer by Marc Gravell

You can write a LINQ-based line reader pretty easily using an iterator block:

static IEnumerable<SomeType> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            SomeType newRecord = ParseLine(line); // parse line (ParseLine stands for your own parser)
            yield return newRecord;
        }
    }
}

or to make Jon happy:

static IEnumerable<string> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}
...
var typedSequence = from line in ReadFrom(path)
                    let record = ParseLine(line)
                    where record.Active // for example
                    select record.Key;

Then you have ReadFrom(...) as a lazily evaluated sequence without buffering, perfect for Where etc.
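
A small sketch to make the laziness visible. It assumes a variant of ReadFrom that takes an already-open TextReader instead of a file name, so it runs without a file on disk; CountingReader, LazyReadFrom and SecondNonEmpty are hypothetical names used only for illustration. The reader counts how many lines have actually been pulled when the query stops.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: a StringReader that counts how many lines it has handed out.
class CountingReader : StringReader
{
    public int LinesRead { get; private set; }

    public CountingReader(string text) : base(text) { }

    public override string ReadLine()
    {
        string line = base.ReadLine();
        if (line != null) LinesRead++;
        return line;
    }
}

static class LazyReadFrom
{
    // Same iterator shape as ReadFrom(string file), but over an existing TextReader.
    // Each line is read only when the consumer asks for the next element.
    public static IEnumerable<string> ReadFrom(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Returns the second non-empty line and how many lines had been read by then.
    public static (string Line, int LinesRead) SecondNonEmpty(string text)
    {
        var reader = new CountingReader(text);
        string line = ReadFrom(reader).Where(l => l.Length > 0).Skip(1).First();
        return (line, reader.LinesRead);
    }
}
```

With the input "a\n\nb\nc\nd", SecondNonEmpty returns "b" after only three ReadLine calls: the query stops as soon as First has its answer, so the last two lines are never read.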

Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory; if you need grouping and aggregation, "PushLINQ" has some fancy code to allow you to perform aggregations on the data but discard it (no buffering). Jon's explanation is here.
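
To illustrate the buffering point, the sketch below measures how many source lines are pulled before the first result becomes available. A counted in-memory sequence stands in for ReadFrom, and BufferingDemo and its members are hypothetical names. Where needs only one line, while OrderBy must consume the whole source before it can yield anything.

```csharp
using System.Collections.Generic;
using System.Linq;

static class BufferingDemo
{
    // Number of source lines pulled so far (static only to keep the sketch short).
    static int pulled;

    // Hypothetical stand-in for ReadFrom(...): yields lines one at a time and counts them.
    static IEnumerable<string> Lines(params string[] lines)
    {
        foreach (string line in lines)
        {
            pulled++;
            yield return line;
        }
    }

    // How many source lines had been pulled when the query produced its first result.
    static int PulledBeforeFirst(IEnumerable<string> query)
    {
        pulled = 0;
        using (IEnumerator<string> e = query.GetEnumerator())
        {
            e.MoveNext();
            return pulled;
        }
    }

    // Where streams: the first result is ready after one line.
    public static int Filtered() => PulledBeforeFirst(Lines("c", "a", "b").Where(l => l.Length > 0));

    // OrderBy buffers: every line is read before the first (smallest) result appears.
    public static int Ordered() => PulledBeforeFirst(Lines("c", "a", "b").OrderBy(l => l));
}
```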

Answer 2

Answer by Jon Skeet

It's simpler to read a line and check whether or not it's null than to check for EndOfStream all the time.
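
A minimal sketch of that pattern (NullCheckLoop is a hypothetical name). Because ReadLine returns null at the end of the input, the loop works with any TextReader, including StringReader, which has no EndOfStream property at all; that property belongs to StreamReader.

```csharp
using System.Collections.Generic;
using System.IO;

static class NullCheckLoop
{
    // Reads every line from any TextReader. ReadLine returns null at end of input,
    // so no separate EndOfStream test is needed.
    public static List<string> ReadAll(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```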

However, I also have a LineReader class in MiscUtil which makes all of this a lot simpler: basically it exposes a file (or a Func<TextReader>) as an IEnumerable<string>, which lets you do LINQ stuff over it. So you can do things like:

var query = from file in Directory.GetFiles(".", "*.log")
            from line in new LineReader(file)
            where line.Length > 0
            select new AddOn(line); // or whatever

The heart of LineReader is this implementation of IEnumerable<string>.GetEnumerator:

public IEnumerator<string> GetEnumerator()
{
    using (TextReader reader = dataSource())
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

Almost all the rest of the source is just giving flexible ways of setting up dataSource (which is a Func<TextReader>).
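
For readers without MiscUtil to hand, here is a rough sketch of the same idea. SimpleLineReader is a hypothetical class written for illustration, not MiscUtil's actual LineReader source or API. It only shows how storing a Func<TextReader> lets each enumeration open, and later dispose, its own reader.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical minimal re-creation of the idea; not MiscUtil's actual LineReader.
sealed class SimpleLineReader : IEnumerable<string>
{
    private readonly Func<TextReader> dataSource;

    public SimpleLineReader(Func<TextReader> dataSource)
    {
        this.dataSource = dataSource ?? throw new ArgumentNullException(nameof(dataSource));
    }

    // Convenience overload: open the named file each time the sequence is enumerated.
    public SimpleLineReader(string path) : this(() => File.OpenText(path)) { }

    public IEnumerator<string> GetEnumerator()
    {
        // A fresh reader per enumeration, disposed when enumeration ends or is abandoned.
        using (TextReader reader = dataSource())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the factory is invoked inside GetEnumerator, the sequence can be enumerated more than once, and each pass reopens the source from the start.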

Answer 3

Answer by kemp

NOTE: You need to watch out for the IEnumerable<T> solution, as it will result in the file being open for the duration of processing.

For example, with Marc Gravell's response:

foreach(var record in ReadFrom("myfile.csv")) {
    DoLongProcessOn(record);
}

the file will remain open for the whole of the processing.
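
One way to observe (and, if needed, avoid) this is sketched below. It assumes the reader is passed in so that its disposal can be watched; TrackingReader and OpenDurationDemo are hypothetical names. When the records are streamed, the reader is still open while each one is processed. Calling ToList() first reads everything and closes the reader before processing begins, at the cost of holding all lines in memory, which is exactly what the question hoped to avoid.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical reader that records whether it has been disposed ("closed").
class TrackingReader : StringReader
{
    public bool Closed { get; private set; }

    public TrackingReader(string text) : base(text) { }

    protected override void Dispose(bool disposing)
    {
        Closed = true;
        base.Dispose(disposing);
    }
}

static class OpenDurationDemo
{
    // Same shape as ReadFrom: the reader is disposed only when enumeration finishes.
    static IEnumerable<string> ReadFrom(TextReader reader)
    {
        using (reader)
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // For each record, reports whether the reader was still open while it was processed.
    public static List<bool> OpenWhileProcessing(bool bufferFirst)
    {
        var reader = new TrackingReader("r1\nr2");
        IEnumerable<string> records = ReadFrom(reader);
        if (bufferFirst)
            records = records.ToList(); // reads everything and closes the reader now

        var states = new List<bool>();
        foreach (string record in records)
            states.Add(!reader.Closed); // stand-in for DoLongProcessOn(record)
        return states;
    }
}
```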

Answer 4

Answer by Luca Spiller

Thanks all for your answers! I decided to go with a mixture, mainly focusing on Marc's, though, as I will only need to read lines from a file. I guess you could argue separation is needed everywhere, but heh, life is too short!

Regarding keeping the file open, that isn't going to be an issue in this case, as the code is part of a desktop application.

Lastly, I noticed you all used lowercase string. I know in Java there is a difference between capitalised and non-capitalised string, but I thought in C# lowercase string was just a reference to capitalised String?

public void Load(AddonCollection<T> collection)
{
    // read from file
    var query =
        from line in LineReader(_LstFilename)
        where line.Length > 0
        select CreateAddon(line);

    // add results to collection
    collection.AddRange(query);
}

protected T CreateAddon(String line)
{
    // create addon
    T addon = new T();
    addon.Load(line, _BaseDir);

    return addon;
}

protected static IEnumerable<String> LineReader(String fileName)
{
    String line;
    using (var file = System.IO.File.OpenText(fileName))
    {
        // read each line, ensuring not null (EOF)
        while ((line = file.ReadLine()) != null)
        {
            // return trimmed line
            yield return line.Trim();
        }
    }
}
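
On the string question above: in C#, the keyword string is an alias for System.String, so both spellings name the same type and can be mixed freely, as in the code above. A tiny check (AliasCheck is a hypothetical name):

```csharp
using System;

static class AliasCheck
{
    // `string` is the C# keyword alias for System.String: same type, two spellings.
    public static bool SameType() => typeof(string) == typeof(String);

    // A value declared with one spelling is assigned to the other with no conversion.
    public static string RoundTrip(String value)
    {
        string copy = value;
        return copy;
    }
}
```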

Answer 5

Answer by user7610

Since .NET 4.0, the File.ReadLines() method is available.

int count = File.ReadLines(filepath).Count(line => line.StartsWith(">"));
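
As a sketch of how this applies to the question's second example (the loop that stops at the first IDENTIFY line), File.ReadLines can replace the manual loop. Unlike File.ReadAllLines, it reads lazily, so FirstOrDefault stops reading at the first match, and disposing the enumerator closes the file. ReadLinesDemo and FindName are hypothetical names; the regex is the one from the question.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class ReadLinesDemo
{
    static readonly Regex NameRegex = new Regex("IDENTIFY (.*)");

    // LINQ version of the question's second loop: File.ReadLines streams lines,
    // so FirstOrDefault stops reading (and the file is closed) at the first match.
    public static string FindName(string path)
    {
        Match m = File.ReadLines(path)
                      .Select(line => NameRegex.Match(line))
                      .FirstOrDefault(match => match.Success);
        return m?.Groups[1].Value; // null when no line matches
    }
}
```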
