C#: Remove Duplicate Lines From Text File?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1245500/
Asked by Goober
Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.
Accepted answer by Jon Skeet
This should do (and will cope with large files).
Note that it only removes consecutive duplicate lines, i.e.
a
b
b
c
b
d
will end up as
a
b
c
b
d
If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.
using System;
using System.IO;

class DeDuper
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: DeDuper <input file> <output file>");
            return;
        }
        using (TextReader reader = File.OpenText(args[0]))
        using (TextWriter writer = File.CreateText(args[1]))
        {
            string currentLine;
            string lastLine = null;
            while ((currentLine = reader.ReadLine()) != null)
            {
                if (currentLine != lastLine)
                {
                    writer.WriteLine(currentLine);
                    lastLine = currentLine;
                }
            }
        }
    }
}
Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method though:
static void CopyLinesRemovingConsecutiveDupes
    (TextReader reader, TextWriter writer)
{
    string currentLine;
    string lastLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}
(Note that that doesn't close anything - the caller should do that.)
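Because the method only needs a TextReader and a TextWriter, it can be tried without touching the file system. The following is a minimal sketch, not part of the original answer: the class name ConsecutiveDedupeDemo and the helper Dedupe are hypothetical, and the writer's NewLine is set to "\n" only to make the output predictable across platforms.

```csharp
using System;
using System.IO;

static class ConsecutiveDedupeDemo
{
    // Same logic as the method in the answer above.
    static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }

    // Hypothetical helper: runs the method over an in-memory string.
    // The caller owns (and disposes) the reader and writer.
    public static string Dedupe(string text)
    {
        using (var reader = new StringReader(text))
        using (var writer = new StringWriter())
        {
            writer.NewLine = "\n"; // assumption: fixed line ending for the demo
            CopyLinesRemovingConsecutiveDupes(reader, writer);
            return writer.ToString();
        }
    }

    static void Main()
    {
        // Input is the a/b/b/c/b/d example from the answer.
        Console.Write(Dedupe("a\nb\nb\nc\nb\nd\n"));
    }
}
```

Only the second of the two adjacent "b" lines is dropped; the later "b" after "c" is kept, as the answer describes.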
Here's a version that will remove all duplicates, rather than just consecutive ones:
// Requires: using System.Collections.Generic; using System.IO;
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
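Since the earlier file-based version assumes UTF-8, one way to use the method with another encoding is to build the reader and writer explicitly. This is a sketch under assumptions: the class name EncodingAwareDeduper and the helper DedupeFile are hypothetical, and UTF-16 (Encoding.Unicode) in Main is just an example choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class EncodingAwareDeduper
{
    // Same logic as the method in the answer above.
    static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        var previousLines = new HashSet<string>();
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }

    // Hypothetical helper: opens both files with the given encoding
    // and disposes them when done. outputPath is overwritten.
    public static void DedupeFile(string inputPath, string outputPath, Encoding encoding)
    {
        using (var reader = new StreamReader(inputPath, encoding))
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingAllDupes(reader, writer);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: EncodingAwareDeduper <input file> <output file>");
            return;
        }
        DedupeFile(args[0], args[1], Encoding.Unicode); // example encoding only
    }
}
```

Note that the set holds every distinct line in memory, so memory use grows with the number of unique lines rather than with the file size.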
Answer by Darin Dimitrov
For small files:
string[] lines = File.ReadAllLines("filename.txt");
File.WriteAllLines("filename.txt", lines.Distinct().ToArray());
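A possible variation on the same idea, assuming .NET 4 or later, reads lines lazily with File.ReadLines and writes to a separate output file. The class and method names here are hypothetical. Distinct still remembers every unique line, so this remains best suited to files whose unique lines fit in memory, and the input and output paths must differ because the input is still open while the output is written.

```csharp
using System;
using System.IO;
using System.Linq;

static class LinqDeduper
{
    // Hypothetical helper: streams lines in, writes each distinct line once.
    // inputPath and outputPath must be different files.
    public static void DedupeSmallFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Distinct());
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: LinqDeduper <input file> <output file>");
            return;
        }
        DedupeSmallFile(args[0], args[1]);
    }
}
```

In current implementations Distinct yields elements in first-occurrence order, although the documentation describes the result as unordered, so code that depends on the order may prefer the explicit HashSet loop.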
Answer by Kelly Gendron
For a long file (with non-consecutive duplicates), I'd copy the file line by line, building a hash-to-position lookup table as I went.
As each line is copied, check for its hashed value. If the hash is already in the table, double-check that the line at the recorded position really is the same; if it is, skip the line and move to the next.
Only worth it for fairly large files though.
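The description above could be sketched roughly as follows. This is one possible reading of the idea, not the answerer's code: all names are hypothetical, the hash function (FNV-1a) is an arbitrary choice, positions are byte offsets into the input file, and lines are compared as raw bytes split on '\n', so a "\r" before the newline or a byte-order mark on the first line counts as part of the line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class HashPositionDeduper
{
    // Reads bytes up to (not including) the next '\n'; null at end of stream.
    static byte[] ReadLineBytes(Stream stream)
    {
        var buffer = new MemoryStream();
        bool readAnything = false;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            readAnything = true;
            if (b == '\n') break;
            buffer.WriteByte((byte)b);
        }
        return readAnything ? buffer.ToArray() : null;
    }

    // FNV-1a over the raw bytes (an arbitrary hash for this sketch).
    static int Hash(byte[] data)
    {
        unchecked
        {
            uint h = 2166136261;
            foreach (byte b in data) h = (h ^ b) * 16777619;
            return (int)h;
        }
    }

    static bool SameBytes(byte[] x, byte[] y)
    {
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (x[i] != y[i]) return false;
        return true;
    }

    public static void Dedupe(string inputPath, string outputPath)
    {
        // hash -> input-file offsets of the kept lines that have that hash
        var seen = new Dictionary<int, List<long>>();
        using (Stream input = new BufferedStream(File.OpenRead(inputPath)))
        using (Stream lookup = File.OpenRead(inputPath)) // second handle for re-reading
        using (Stream output = new BufferedStream(File.Create(outputPath)))
        {
            while (true)
            {
                long lineStart = input.Position;
                byte[] line = ReadLineBytes(input);
                if (line == null) break;

                int hash = Hash(line);
                bool duplicate = false;
                List<long> offsets;
                if (seen.TryGetValue(hash, out offsets))
                {
                    // Hash seen before: re-read the earlier line(s) to confirm.
                    foreach (long earlier in offsets)
                    {
                        lookup.Position = earlier;
                        if (SameBytes(line, ReadLineBytes(lookup))) { duplicate = true; break; }
                    }
                }
                else
                {
                    offsets = new List<long>();
                    seen[hash] = offsets;
                }
                if (duplicate) continue;

                offsets.Add(lineStart);
                output.Write(line, 0, line.Length);
                output.WriteByte((byte)'\n');
            }
        }
    }
}
```

The table stores only a hash and an offset per unique line instead of the line text, at the cost of an extra seek and read whenever a hash has been seen before. That trade is why the approach is only attractive for fairly large files.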
Answer by Steve
Here's a streaming approach that should incur less overhead than reading all unique strings into memory.
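// Requires: using System.Collections.Generic; using System.IO;
// Caveat: only hash codes are stored, so two different lines whose hash
// codes collide are treated as duplicates and the later one is dropped.
using (var sr = new StreamReader(File.OpenRead(@"C:\Temp\in.txt")))
// File.Create truncates an existing file; File.OpenWrite would not.
using (var sw = new StreamWriter(File.Create(@"C:\Temp\out.txt")))
{
    var lines = new HashSet<int>();
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        // Add returns false when the hash code was already present
        if (lines.Add(line.GetHashCode()))
            sw.WriteLine(line);
    }
}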
Answer by DeepakTheGeek
I am new to .NET and have written something simpler; it may not be very efficient. Please feel free to share your thoughts.
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Verbatim string (@"...") so the backslash is not read as an escape
        string[] emp_names = File.ReadAllLines(@"D:\Employee Names.txt");
        List<string> newemp1 = new List<string>();
        for (int i = 0; i < emp_names.Length; i++)
        {
            newemp1.Add(emp_names[i]); //passing data to newemp1 from emp_names
        }
        for (int i = 0; i < emp_names.Length; i++)
        {
            List<string> temp = new List<string>();
            int duplicate_count = 0;
            for (int j = newemp1.Count - 1; j >= 0; j--)
            {
                if (emp_names[i] != newemp1[j]) //checking for duplicate records
                    temp.Add(newemp1[j]);
                else
                {
                    duplicate_count++;
                    if (duplicate_count == 1)
                        temp.Add(emp_names[i]);
                }
            }
            newemp1 = temp;
        }
        string[] newemp = newemp1.ToArray(); //assigning into a string array
        Array.Sort(newemp);
        File.WriteAllLines(@"D:\Employee Names.txt", newemp); //now writing the data to a text file
        Console.ReadLine();
    }
}
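For comparison, the nested loops above compare every line against every remaining line, so the work grows quadratically with the number of lines. A shorter sketch of the same outcome (unique lines, sorted) using LINQ might look like this. The names are hypothetical, and the ordinal sort order is an assumption chosen for predictability; Array.Sort on strings uses a culture-sensitive comparison by default, so the ordering can differ from the code above.

```csharp
using System;
using System.IO;
using System.Linq;

static class SortedDeduper
{
    // Hypothetical helper: returns the unique lines in ordinal sort order.
    public static string[] DistinctSorted(string[] lines)
    {
        return lines
            .Distinct()                                      // hash-based duplicate removal
            .OrderBy(line => line, StringComparer.Ordinal)   // assumption: ordinal ordering
            .ToArray();
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: SortedDeduper <file>");
            return;
        }
        // ReadAllLines finishes reading before the file is rewritten in place.
        File.WriteAllLines(args[0], DistinctSorted(File.ReadAllLines(args[0])));
    }
}
```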