Original question: http://stackoverflow.com/questions/1134311/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
C# Tokenizer - keeping the separators
Asked by Ipster
I am porting code from Java to C#, and part of the Java code uses a tokenizer. It is my understanding that the tokens produced by Java's StringTokenizer also include the separators themselves (in this case +, -, /, *, (, )). I have attempted to use the C# Split() function, but it seems to eliminate the separators. In the end, this will parse a string and run it as a calculation. I have done a lot of research and have not found any references on the topic.
Does anyone know how to get the actual separators, in the order they were encountered, to be in the split array?
Code for tokenizing:
public CalcLexer(String s)
{
    char[] seps = {'\t','\n','\r','+','-','*','/','(',')'};
    tokens = s.Split(seps);
    advance();
}
Testing:
static void Main(string[] args)
{
    CalcLexer myCalc = new CalcLexer("24+3");
    Console.ReadLine();
}
The input "24+3" results in the following output: "24", "3". I am looking for an output of "24", "+", "3".
In the interest of full disclosure, this project is part of a class assignment, and uses the following complete source code:
http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcParser.java.txt
http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcLexer.java.txt
Accepted answer by Pavel Minaev
You can use Regex.Split with zero-width assertions. For example, the following will split on +-*/:
Regex.Split(str, @"(?=[-+*/])|(?<=[-+*/])");
Effectively this says, "split at this point if it is followed by, or preceded by, any of -+*/". The matched string itself will be zero-length, so you won't lose any part of the input string.
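As a rough sketch of how this could be applied to the lexer in the question, the snippet below extends the character class to include parentheses and filters the result. The class name LookaroundTokenizer and the Tokenize helper are made up for illustration and are not part of the original answer. The filtering step is there because Regex.Split returns an empty string when a match falls at the very start or end of the input, which happens whenever the expression begins or ends with an operator or parenthesis.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original answer.
public static class LookaroundTokenizer
{
    // Zero-width match before (lookahead) or after (lookbehind) any operator
    // or parenthesis. Nothing is consumed, so the separators stay in the output.
    private static readonly Regex Boundary =
        new Regex(@"(?=[-+*/()])|(?<=[-+*/()])");

    public static string[] Tokenize(string input)
    {
        return Boundary.Split(input)
            .Select(t => t.Trim())        // strip tabs, newlines, spaces around each piece
            .Where(t => t.Length > 0)     // drop empty pieces left at the edges
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("24+3")));     // 24 | + | 3
        Console.WriteLine(string.Join(" | ", Tokenize("(2+3)*4")));  // ( | 2 | + | 3 | ) | * | 4
    }
}
```

Note that this treats every '-' as its own token, so deciding whether a minus is unary or binary is left to the parser.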
Answer by Sam Harwell
If you want a very flexible, powerful, reliable, and expandable solution, you can use the C# port of ANTLR. There is some initial overhead (link is setup information for VS2008) that would likely make it overkill for such a tiny project. Here's a calculator example with support for variables.
Probably overkill for your class, but if you're interested in learning about "real" solutions to this type of real-world problem, have a look-see. I even have a Visual Studio package for working with the grammars, or you can use ANTLRWorks separately.
Answer by Shane Castle
This produces your output:
string s = "24+3";
string seps = @"(\t)|(\n)|(\+)|(-)|(\*)|(/)|(\()|(\))";
string[] tokens = System.Text.RegularExpressions.Regex.Split(s, seps);
foreach (string token in tokens)
    Console.WriteLine(token);
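This works because Regex.Split adds the text captured by any matched capturing group to the result array. A more compact variant, offered only as a sketch, puts the operators in a single captured character class and leaves the whitespace separators uncaptured so they are discarded. The class name CaptureGroupTokenizer and the Tokenize helper are assumptions for illustration, and '\r' is added here to match the separator list in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original answer.
public static class CaptureGroupTokenizer
{
    // Operators and parentheses are inside a capturing group, so Regex.Split
    // returns them as tokens. Tab, newline and carriage return are outside
    // the group, so they split the input but are not returned.
    private static readonly Regex Separator =
        new Regex(@"([-+*/()])|[\t\n\r]");

    public static string[] Tokenize(string input)
    {
        return Separator.Split(input)
            .Where(t => t.Length > 0)   // adjacent separators leave empty pieces between them
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("24+3")));      // 24 | + | 3
        Console.WriteLine(string.Join(" | ", Tokenize("(2+3)\t*4"))); // ( | 2 | + | 3 | ) | * | 4
    }
}
```

Compared with the zero-width approach, this consumes the separator characters, which makes it straightforward to drop some separators (whitespace) while keeping others (operators).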