C# Tokenizer - Keeping the Separators

Question

Asked by Ipster

I am porting code from Java to C#, and part of the Java code uses StringTokenizer. My understanding is that the tokens returned by Java's StringTokenizer also include the separators themselves (in this case +, -, /, *, (, )). I tried the C# Split() function, but it seems to discard the separators. In the end, this will parse a string and evaluate it as a calculation. I have done a lot of research and have not found any references on the topic.

Does anyone know how to get the actual separators, in the order they were encountered, to be in the split array?

Code for tokenizing:

public CalcLexer(String s)
{
    char[] seps = {'\t','\n','\r','+','-','*','/','(',')'};
    tokens = s.Split(seps);
    advance();
}

Testing:

static void Main(string[] args)
{
    CalcLexer myCalc = new CalcLexer("24+3");
    Console.ReadLine();
}

The input "24+3" results in the tokens "24", "3". I am looking for "24", "+", "3".

In the interest of full disclosure, this project is part of a class assignment and is based on the following complete source code:

http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcParser.java.txt http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcLexer.java.txt

Answer 1

Accepted answer by Pavel Minaev

You can use Regex.Split with zero-width assertions. For example, the following will split on +-*/:

Regex.Split(str, @"(?=[-+*/])|(?<=[-+*/])");

Effectively this says, "split at this point if it is followed by, or preceded by, any of -+*/." The matched string itself is zero-length, so you don't lose any part of the input string.
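As a rough sketch of how this could be extended to the full separator set from the question, the helper below adds parentheses and whitespace to the character class and then drops empty or whitespace-only pieces. The class name LookaroundTokenizer and the method Tokenize are hypothetical names chosen for illustration, and discarding whitespace tokens is an assumption about what the lexer wants.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): splits an expression
// into operands and separators using zero-width lookahead/lookbehind.
public static class LookaroundTokenizer
{
    public static string[] Tokenize(string input)
    {
        // Split wherever the next or previous character is a separator.
        // The matches are zero-length, so no input characters are consumed.
        const string pattern = @"(?=[-+*/()\s])|(?<=[-+*/()\s])";

        return Regex.Split(input, pattern)
                    // Assumption: whitespace and empty pieces are not wanted as tokens.
                    .Where(t => !string.IsNullOrWhiteSpace(t))
                    .ToArray();
    }
}
```

For example, LookaroundTokenizer.Tokenize("24+3") gives "24", "+", "3". The filter matters because a separator at the very start of the input, such as a leading "(", produces an empty first element.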

Answer 2

Answer by Sam Harwell

If you want a very flexible, powerful, reliable, and expandable solution, you can use the C# port of ANTLR. There is some initial overhead (link is setup information for VS2008) that would likely make it overkill for such a tiny project. Here's a calculator example with support for variables.

Probably overkill for your class, but if you're interested in learning about "real" solutions to this type of real-world problem, have a look-see. I even have a Visual Studio package for working with the grammars, or you can use ANTLRWorks separately.

Answer 3

Answer by Shane Castle

This produces your output:

string s = "24+3";
string seps = @"(\t)|(\n)|(\+)|(-)|(\*)|(/)|(\()|(\))";
string[] tokens = System.Text.RegularExpressions.Regex.Split(s, seps);

foreach (string token in tokens)
    Console.WriteLine(token);
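This works because Regex.Split includes the text of capturing groups in its result, so the separators come back as array elements. A more compact variant, sketched below, uses a single capturing group around a character class. The names CaptureGroupTokenizer and Tokenize are hypothetical, and trimming whitespace from operands (rather than treating it as a separator) is an assumption made for this sketch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): keeps separators by
// wrapping them in a capturing group, which Regex.Split returns as elements.
public static class CaptureGroupTokenizer
{
    public static string[] Tokenize(string input)
    {
        // One capturing group; '-' comes first so it is a literal, not a range.
        const string pattern = @"([-+*/()])";

        return Regex.Split(input, pattern)
                    // Assumption: surrounding whitespace is not significant.
                    .Select(t => t.Trim())
                    // Adjacent separators such as ")*" leave empty strings between them.
                    .Where(t => t.Length > 0)
                    .ToArray();
    }
}
```

Calling CaptureGroupTokenizer.Tokenize("(24 + 3)*2") gives "(", "24", "+", "3", ")", "*", "2". Without the filter, the result would also contain empty strings at the start and between ")" and "*".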
