C# 查找字符串的公共前缀

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2070356/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-06 23:14:38  来源:igfitidea点击:

Find common prefix of strings

c#stringpattern-matching

提问by user251334

I am having 4 strings:

我有 4 个字符串:

"h:/a/b/c"
"h:/a/b/d"
"h:/a/b/e"
"h:/a/c"

I want to find the common prefix for those strings, i.e. "h:/a". How to find that?

我想找到这些字符串的公共前缀,即"h:/a". 怎么找呢?

Usually I'd split the string with delimiter '/'and put it in another list, and so on.
Is there any better way to do it?

通常我会用分隔符分割字符串'/'并将其放在另一个列表中,依此类推。
有没有更好的方法来做到这一点?

回答by John Feminella

This is the longest common substringproblem (although it's a slightly specialized case since you appear to only care about the prefix). There's no library implementation of the algorithm in the .NET platform that you can call directly, but the article linked here is chock-full of steps on how you'd do it yourself.

这是最长的常见子串问题(尽管这是一个稍微特殊的情况,因为您似乎只关心前缀)。在 .NET 平台中没有您可以直接调用的算法库实现,但此处链接的文章充满了有关您如何自己完成的步骤。

回答by Sam Holder

Just loop round the characters of the shortest string and compare each character to the character in the same position in the other strings. Whilst they all match keep going. As soon as one doesn't match then the string up to the current position -1 is the answer.

只需循环最短字符串的字符,并将每个字符与其他字符串中相同位置的字符进行比较。虽然他们都匹配继续进行。只要一个不匹配,那么直到当前位置 -1 的字符串就是答案。

Something like (pseudo code)

类似(伪代码)

int count=0;
foreach(char c in shortestString)
{
    foreach(string s in otherStrings)
    {
        if (s[count]!=c)
        {
             return shortestString.SubString(0,count-1); //need to check count is not 0 
        }
    }
    count+=1;
 }
 return shortestString;

回答by dtb

string[] xs = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" };

string x = string.Join("/", xs.Select(s => s.Split('/').AsEnumerable())
                              .Transpose()
                              .TakeWhile(s => s.All(d => d == s.First()))
                              .Select(s => s.First()));

with

public static IEnumerable<IEnumerable<T>> Transpose<T>(
    this IEnumerable<IEnumerable<T>> source)
{
    var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
    try
    {
        while (enumerators.All(e => e.MoveNext()))
        {
            yield return enumerators.Select(e => e.Current).ToArray();
        }
    }
    finally
    {
        Array.ForEach(enumerators, e => e.Dispose());
    }
}

回答by Simon D

Working code based on Sam Holder's solution (note it gives h:/a/ not h:/a as the longest common initial substring in the question):

基于 Sam Holder 解决方案的工作代码(注意它给出 h:/a/ 而不是 h:/a 作为问题中最长的公共初始子串):

using System;

namespace CommonPrefix
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(CommonPrefix(new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/c" })); // "h:/a/"
            Console.WriteLine(CommonPrefix(new[] { "abc", "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new[] { "abc" })); // "abc"
            Console.WriteLine(CommonPrefix(new string[] { })); // ""
            Console.WriteLine(CommonPrefix(new[] { "a", "abc" })); // "a"
            Console.WriteLine(CommonPrefix(new[] { "abc", "a" })); // "a"

            Console.ReadKey();
        }

        private static string CommonPrefix(string[] ss)
        {
            if (ss.Length == 0)
            {
                return "";
            }

            if (ss.Length == 1)
            {
                return ss[0];
            }

            int prefixLength = 0;

            foreach (char c in ss[0])
            {
                foreach (string s in ss)
                {
                    if (s.Length <= prefixLength || s[prefixLength] != c)
                    {
                        return ss[0].Substring(0, prefixLength);
                    }
                }
                prefixLength++;
            }

            return ss[0]; // all strings identical up to length of ss[0]
        }
    }
}

回答by Frank Schwieterman

I wanted a common string prefix, except I wanted to include any character (like /) and I did not want something performant/fancy just something I can read with tests. So I have this: https://github.com/fschwiet/DreamNJasmine/commit/ad802611ceacc673f2d03c30f7c8199f552b586f

我想要一个通用的字符串前缀,除了我想包含任何字符(如 /),而且我不想要一些性能/花哨的东西,只是我可以通过测试读取的东西。所以我有这个:https: //github.com/fschwiet/DreamNJasmine/commit/ad802611ceacc673f2d03c30f7c8199f552b586f

public class CommonStringPrefix
{
    public static string Of(IEnumerable<string> strings)
    {
        var commonPrefix = strings.FirstOrDefault() ?? "";

        foreach(var s in strings)
        {
            var potentialMatchLength = Math.Min(s.Length, commonPrefix.Length);

            if (potentialMatchLength < commonPrefix.Length)
                commonPrefix = commonPrefix.Substring(0, potentialMatchLength);

            for(var i = 0; i < potentialMatchLength; i++)
            {
                if (s[i] != commonPrefix[i])
                {
                    commonPrefix = commonPrefix.Substring(0, i);
                    break;
                }
            }
        }

        return commonPrefix;
    }
}

回答by user896993

Here is a custom implementation of the trie algorithm in c# (http://en.wikipedia.org/wiki/Trie). It is used to perform an indexed string via prefixes. This class has O(1) write and reads for leaf nodes. For prefix searches, the performance is O(log n), however the count of results for prefix is O(1).

这是在 c# ( http://en.wikipedia.org/wiki/Trie) 中trie 算法的自定义实现。它用于通过前缀执行索引字符串。这个类对叶节点有 O(1) 次写入和读取。对于前缀搜索,性能是 O(log n),但是前缀的结果计数是 O(1)。

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public class StringIndex
{
    private Dictionary<char, Item> _rootChars;

    public StringIndex()
    {
        _rootChars = new Dictionary<char, Item>();
    }

    public void Add(string value, string data)
    {
        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
            }
            else
            {
                currentItem = new Item() { Level = level, Letter = c };
                currentChars.Add(c, currentItem);                
            }

            currentChars = currentItem.Items;

            level++;
        }

        if (!currentItem.Values.Contains(data))
        {
            currentItem.Values.Add(data);
            IncrementCount(value);
        }
    }

    private void IncrementCount(string value)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            currentItem = currentChars[c];
            currentItem.Total++;
            currentChars = currentItem.Items;
        }
    }

    public void Remove(string value, string data)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Dictionary<char, Item> parentChars = null;
        Item currentItem = null;

        foreach (char c in value)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                parentChars = currentChars;
                currentChars = currentItem.Items;
            }
            else
            {
                return; // no matches found
            }
        }

        if (currentItem.Values.Contains(data))
        {
            currentItem.Values.Remove(data);
            DecrementCount(value);
            if (currentItem.Total == 0)
            {
                parentChars.Remove(currentItem.Letter);
            }
        }
    }

    private void DecrementCount(string value)
    {
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in value)
        {
            currentItem = currentChars[c];
            currentItem.Total--;
            currentChars = currentItem.Items;
        }
    }

    public void Clear()
    {
        _rootChars.Clear();
    }

    public int GetValuesByPrefixCount(string prefix)
    {
        int valuescount = 0;

        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in prefix)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                currentChars = currentItem.Items;
            }
            else
            {
                return valuescount; // no matches found
            }
            level++;
        }

        valuescount = currentItem.Total;

        return valuescount;
    }

    public HashSet<string> GetValuesByPrefixFlattened(string prefix)
    {
        var results = GetValuesByPrefix(prefix);
        return new HashSet<string>(results.SelectMany(x => x));
    }

    public List<HashSet<string>> GetValuesByPrefix(string prefix)
    {
        var values = new List<HashSet<string>>();

        int level = 0;
        Dictionary<char, Item> currentChars = _rootChars;
        Item currentItem = null;

        foreach (char c in prefix)
        {
            if (currentChars.ContainsKey(c))
            {
                currentItem = currentChars[c];
                currentChars = currentItem.Items;
            }
            else
            {
                return values; // no matches found
            }
            level++;
        }

        ExtractValues(values, currentItem);

        return values;
    }

    public void ExtractValues(List<HashSet<string>> values, Item item)
    {
        foreach (Item subitem in item.Items.Values)
        {
            ExtractValues(values, subitem);
        }

        values.Add(item.Values);
    }

    public class Item
    {
        public int Level { get; set; }
        public char Letter { get; set; }
        public int Total { get; set; }
        public HashSet<string> Values { get; set; }
        public Dictionary<char, Item> Items { get; set; }

        public Item()
        {
            Values = new HashSet<string>();
            Items = new Dictionary<char, Item>();
        }
    }
}

Here is the unit testing & example code for how to use this class.

这是有关如何使用此类的单元测试和示例代码。

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class StringIndexTest
    {
        [TestMethod]
        public void AddAndSearchValues()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            var output = si.GetValuesByPrefixFlattened("abc");

            Assert.IsTrue(output.Contains("1") && output.Contains("2") && output.Contains("3"));
        }

        [TestMethod]
        public void RemoveAndSearch()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Remove("abcdef", "1");

            var output = si.GetValuesByPrefixFlattened("abc");

            Assert.IsTrue(!output.Contains("1") && output.Contains("2") && output.Contains("3"));
        }

        [TestMethod]
        public void Clear()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Clear();
            var output = si.GetValuesByPrefix("abc");

            Assert.IsTrue(output.Count == 0);
        }

        [TestMethod]
        public void AddAndSearchValuesCount()
        {

            var si = new StringIndex();

            si.Add("abcdef", "1");
            si.Add("abcdeff", "2");
            si.Add("abcdeffg", "3");
            si.Add("bcdef", "4");
            si.Add("bcdefg", "5");
            si.Add("cdefg", "6");
            si.Add("cdefgh", "7");

            si.Remove("cdefgh", "7");

            var output1 = si.GetValuesByPrefixCount("abc");
            var output2 = si.GetValuesByPrefixCount("b");
            var output3 = si.GetValuesByPrefixCount("bc");
            var output4 = si.GetValuesByPrefixCount("ca");

            Assert.IsTrue(output1 == 3 && output2 == 2 && output3 == 2 && output4 == 0);
        }
    }

Any suggestions on how to improve this class are welcome :)

欢迎任何关于如何改进这门课的建议:)

回答by Bart Calixto

My approach would be, take the first string. Get letter by letter while all the other string got the same letter on the same index position and stop if there is no match. Remove the last character if it's a separator.

我的方法是,取第一个字符串。逐个字母获取,而所有其他字符串在相同索引位置获得相同字母,如果没有匹配则停止。如果最后一个字符是分隔符,则删除它。

var str_array = new string[]{"h:/a/b/c",
         "h:/a/b/d",
         "h:/a/b/e",
         "h:/a/c"};
var separator = '/';

// get longest common prefix (optinally use ToLowerInvariant)
var ret = str_array.Any() 
    ? str_array.First().TakeWhile((s,i) => 
         str_array.All(e => 
            Char.ToLowerInvariant(s) == Char.ToLowerInvariant(e.Skip(i).Take(1).SingleOrDefault()))) 
    : String.Empty;

// remove last character if it's a separator (optional)
if (ret.LastOrDefault() == separator) 
    ret = ret.Take(ret.Count() -1);

string prefix = new String(ret.ToArray());

回答by Graham Morris

I needed to look for the longest common prefix in dissimilar strings. I came up with:

我需要在不同的字符串中寻找最长的公共前缀。我想出了:

private string FindCommonPrefix(List<string> list)
{
    List<string> prefixes = null;
    for (int len = 1; ; ++len)
    {
        var x = list.Where(s => s.Length >= len)
                    .GroupBy(c => c.Substring(0,len))
                    .Where(grp => grp.Count() > 1)
                    .Select(grp => grp.Key)
                    .ToList();

        if (!x.Any())
        {
            break;
        }
        //  Copy last list
        prefixes = new List<string>(x);
    }

    return prefixes == null ? string.Empty : prefixes.First();
}

If there is more than one prefix with the same length, it arbitrarily returns the first one found. Also it's case-sensitive. Both these points can be addressed by the reader!

如果有多个相同长度的前缀,则任意返回找到的第一个。它也是区分大小写的。这两点读者都可以解决!

回答by NER1808

I wrote this ICollection extension to find the longest common base Uri from a collection of web addresses.

我编写了这个 ICollection 扩展来从一组网址中找到最长的公共基本 Uri。

As it only check the collection of strings at every slash it will be a bit faster that a generic prefix routine (Not counting my inefficient algorithm!). It's verbose, but easy to follow...my favourite type of code ;-)

因为它只检查每个斜杠处的字符串集合,所以它比通用前缀例程要快一些(不算我的低效算法!)。它很冗长,但很容易理解……我最喜欢的代码类型 ;-)

Ignores 'http://' and 'https://', as well as case.

忽略“http://”和“https://”以及大小写。

    /// <summary>
    /// Resolves a common base Uri from a list of Uri strings. Ignores case. Includes the last slash
    /// </summary>
    /// <param name="collectionOfUriStrings"></param>
    /// <returns>Common root in lowercase</returns>
    public static string GetCommonUri(this ICollection<string> collectionOfUriStrings)
    {
        //Check that collection contains entries
        if (!collectionOfUriStrings.Any())
            return string.Empty;
        //Check that the first is no zero length
        var firstUri = collectionOfUriStrings.FirstOrDefault();
        if(string.IsNullOrEmpty(firstUri))
            return string.Empty;

        //set starting position to be passed '://'
        int previousSlashPos = firstUri.IndexOf("://", StringComparison.OrdinalIgnoreCase) + 2;
        int minPos = previousSlashPos + 1; //results return must have a slash after this initial position
        int nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
        //check if any slashes
        if (nextSlashPos == -1)
            return string.Empty;

        do
        {               
            string common = firstUri.Substring(0, nextSlashPos + 1);
            //check against whole collection
            foreach (var collectionOfUriString in collectionOfUriStrings)
            {
                if (!collectionOfUriString.StartsWith(common, StringComparison.OrdinalIgnoreCase))
                {
                    //return as soon as a mismatch is found
                    return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty ;
                }
            }
            previousSlashPos = nextSlashPos;                
            nextSlashPos = firstUri.IndexOf("/", previousSlashPos + 1, StringComparison.OrdinalIgnoreCase);
        } while (nextSlashPos != -1);

        return previousSlashPos > minPos ? firstUri.Substring(0, previousSlashPos + 1).ToLower() : string.Empty;
    } 

回答by Yegor

A short LINQy solution of mine.

我的一个简短的 LINQy 解决方案。

var samples = new[] { "h:/a/b/c", "h:/a/b/d", "h:/a/b/e", "h:/a/e" };

var commonPrefix = new string(
    samples.First().Substring(0, samples.Min(s => s.Length))
    .TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());