Antimatroid, The

thoughts on computer science, electronics, mathematics

Generalized Interpreter Framework in C#

Introduction

Like most Computer Science graduates, I took a Programming Languages course in college in which we built a compiler and studied the consequences of our design decisions. I also took a course in symbolic logic, looking in particular at sentential and predicate logic. Lately I have been thinking about these two courses, and thought it would be interesting to write an interpreter that takes in a sentence valid under sentential logic and outputs the resulting truth table.

After looking at what I had built, it became apparent that I could generalize my solution into a flexible framework in which I could easily create a new interpreter for a desired (albeit simple) formal grammar and produce outputs that other applications could then use for whatever purpose. The following is an overview of this framework and an example implementation for generating truth tables for sentences valid under sentential logic.

Framework

Specification

An Interpreter takes a string and produces an object hierarchy that a client can perform functions upon. The Interpreter does so by following the rules of a grammar that the string is assumed (although not guaranteed) to be valid under. The Interpreter will only be valid for context-free grammars (CFGs). A CFG consists of Terminals and Nonterminals. A Terminal is a character or sequence of characters that terminates a definition. A Nonterminal is a rule that defines the relationships between other Nonterminals and Terminals. In the SL grammar presented later, for example, the symbols ~, ( and ) are Terminals, while sentence and connective are Nonterminals.

Design

[Figure: Interpreter components]

Interpreter design differs from traditional compiler design in that only the so-called front end is implemented. There are three principal components involved in this process: the Tokenizer, the Parser and the Semantic Analyzer.

A Tokenizer takes a string of characters and returns a collection of Tokens, where each Token is a tuple containing the accepted substring and metadata describing what the substring represents. Tokens are identified by looking for Terminals in the input string.
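
The framework code in the Implementation section below makes use of a handful of supporting types (Token<T>, TokenList<T>, Terminal<T> and TerminalList<T>) that are not listed in this post. The following is a minimal sketch of what they might look like, inferred from how they are used; the actual definitions may differ.

public class Token<T> {
    public T TokenType { get; private set; }
    public string Lexeme { get; private set; }
    public int StartsAtIndex { get; private set; }

    public Token(T tokenType, string lexeme, int startsAtIndex) {
        TokenType = tokenType;
        Lexeme = lexeme;
        StartsAtIndex = startsAtIndex;
    }
}

public class TokenList<T> : List<Token<T>> { }

abstract public class Terminal<T> {
    // Returns true if this terminal is accepted at index n of the input,
    // yielding the matched token through candidate.
    abstract public bool Exists(string input, int n, out Token<T> candidate);
}

public class TerminalList<T> : List<Terminal<T>> { }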

The Parser then builds (formally) a concrete syntax tree (informally, a parse tree) following the rules defined by the grammar. This is done by looking for Nonterminals in the collection of Tokens.
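
Similarly, here is a rough sketch of the Nonterminal<T, P> base type that the Parser relies on; the member signatures are inferred from the Parser code below and are assumptions.

abstract public class Nonterminal<T, P> {
    // The token type a match for this rule is expected to begin with.
    abstract public T FirstExpectedToken { get; }

    // Returns true if tokens[n..m] can be derived from this rule, yielding the
    // corresponding node of the concrete syntax tree through parsedSentence.
    abstract public bool Exists(TokenList<T> tokens, int n, int m, out P parsedSentence);
}

public class NonterminalList<T, P> : List<Nonterminal<T, P>> { }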

Finally, the Semantic Analyzer builds (formally) an abstract syntax tree (informally a semantic tree) by ignoring the syntax in the parse tree and constructing the appropriate tree for performing functions upon.

In the event of an exception, the Interpreter will write the details of the exception to a text writer (e.g., Console.Error) provided by the client of the Interpreter.

Exceptions are to be thrown under the following conditions (rough sketches of the exception types follow the list):

  • The input string is null or empty, or none of the Terminals are able to accept a character in the input string.
  • None of the Nonterminals are able to accept a set of tokens, or the number of tokens is deficient for the Nonterminals to identify a match.
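
The exception types themselves are not listed in this post. A rough sketch of what they might look like, based on how the Interpreter's catch blocks use them (the messages and the Expected property are assumptions):

public class UnexpectedEndOfInputException : Exception {
    public UnexpectedEndOfInputException()
        : base("Unexpected end of input.") { }
}

public class UnexpectedInputException : Exception {
    public int AtIndex { get; private set; }

    public UnexpectedInputException(int atIndex)
        : base("Unexpected input.") {
        AtIndex = atIndex;
    }
}

public class UnexpectedTokenException<T> : Exception {
    public Token<T> Unexpected { get; private set; }
    public T[] Expected { get; private set; }

    public UnexpectedTokenException(Token<T> unexpected, T[] expected)
        : base("Unexpected token.") {
        Unexpected = unexpected;
        Expected = expected;
    }
}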

Implementation

[Figure: Interpreter UML class diagram]

abstract public class Tokenizer<T> {
    protected TerminalList<T> Terminals { get; private set; }

    public Tokenizer() {
        Terminals = new TerminalList<T>();
        loadTerminals();
    }

    public TokenList<T> Tokenize(string input) {
        if (string.IsNullOrEmpty(input))
            throw new UnexpectedEndOfInputException();

        TokenList<T> tokens = new TokenList<T>();
        Token<T> candidate;

        for (int n = 0; n < input.Length; n++) {
            if (char.IsWhiteSpace(input[n]))
                continue;

            if (!terminalExists(input, n, out candidate))
                throw new UnexpectedInputException(n);

            tokens.Add(candidate);

            // Skip past the accepted lexeme; the loop increment accounts for one character.
            n += candidate.Lexeme.Length - 1;
        }

        return tokens;
    }

    // Tries each terminal in turn until one accepts the input at index n.
    private bool terminalExists(string input, int n, out Token<T> candidate) {
        candidate = null;
        foreach (Terminal<T> terminal in Terminals)
            if (terminal.Exists(input, n, out candidate))
                return true;
        return false;
    }

    abstract protected void loadTerminals();
}
abstract public class Parser<T, P> {
    protected NonterminalList<T, P> Nonterminals { get; private set; }
    protected T[] FirstExpectedTokens { get; private set; }

    public Parser() {
        Nonterminals = new NonterminalList<T, P>();
        loadNonterminals();

        FirstExpectedTokens = new T[Nonterminals.Count];
        for (int n = 0; n < Nonterminals.Count; n++)
            FirstExpectedTokens[n] = Nonterminals[n].FirstExpectedToken;
    }

    public P Parse(TokenList<T> tokens) {
        return Parse(tokens, 0, tokens.Count - 1);
    }

    public P Parse(TokenList<T> tokens, int n, int m) {
        P parsedSentence;
        foreach (Nonterminal<T, P> nonterminal in Nonterminals)
            if (nonterminal.Exists(tokens, n, m, out parsedSentence))
                return parsedSentence;

        throw new UnexpectedTokenException<T>(tokens[n], FirstExpectedTokens);
    }

    abstract protected void loadNonterminals();
}
abstract public class SemanticAnalyzer<P, S> {
    abstract public S Analyze(P sentence);
}
abstract public class Interpreter<T, P, S> {
    private TextWriter externalLogger;

    abstract protected Tokenizer<T> Tokenizer { get; }
    abstract protected Parser<T, P> Parser { get; }
    abstract protected SemanticAnalyzer<P, S> SemanticAnalyzer { get; }

    public Interpreter(TextWriter externalLogger) {
        this.externalLogger = externalLogger;
    }

    public bool Interpret(string candidateSentence, out S sentence, out TokenList<T> tokens) {
        sentence = default(S);
        tokens = null;

        try {
            tokens = Tokenizer.Tokenize(candidateSentence);
            sentence = SemanticAnalyzer.Analyze(Parser.Parse(tokens));

            return true;
        } catch (UnexpectedInputException uie) {
            logger(uie.Message);
            logger("'{0}'", candidateSentence);
            logger(" {0}^", string.Empty.PadLeft(uie.AtIndex, '.'));

        } catch (UnexpectedEndOfInputException ueie) {
            logger(ueie.Message);
            logger("'{0}'", candidateSentence);
            logger(" {0}^", string.Empty.PadLeft(candidateSentence.Length, '.'));

        } catch (UnexpectedTokenException<T> pe) {
            logger(pe.Message);
            logger("'{0}'", candidateSentence);
            logger(" {0}^", string.Empty.PadLeft(pe.Unexpected.StartsAtIndex, '.'));
        }

        return false;
    }

    private void logger(string format, params object[] values) {
        externalLogger.WriteLine(format, values);
    }
}

Example

Specification

SL Backus-Naur Form (BNF)

sentence        ::= sentence_letter
                  | ~ sentence
                  | ( sentence connective sentence )

sentence_letter ::= letter
                  | letter number

connective      ::= v | ^ | -> | <->

letter          ::= a | ... | z | A | ... | Z

number          ::= number digit
                  | digit

digit           ::= 0 | ... | 9

Sentential Logic (SL) is a simple formal system that consists of two semantic elements: primitive types called Sentence Letters that may be either true or false, and expressions built upon Sentence Letters called Sentences. A Sentence Letter is any character a-z (lower or upper case) followed by an optional number. Sentences are defined in terms of other sentences; the simplest is a plain Sentence Letter. Any negated (¬) sentence is also a sentence, and any two sentences connected by a conjunction (∧), disjunction (∨), material conditional (→) or material biconditional (↔) and enclosed in parentheses also form a sentence under SL. For example, (A ∧ ¬B2) is a sentence, while A ∧ B is not, since it lacks the enclosing parentheses. No other arrangement of symbols constitutes a sentence under SL.

Truth tables for negation and the connectives

P  Q | ¬P   P ∧ Q   P ∨ Q   P → Q   P ↔ Q
T  T |  F     T       T       T       T
T  F |  F     F       T       F       F
F  T |  T     F       T       T       F
F  F |  T     F       F       T       T

A truth table has a column for each sentence letter that appears in a sentence and a column for the sentence itself. Each row is one point in the space \{T, F\}^{n}, where n is the number of sentence letters. By convention, the rows are listed starting with the row in which all sentence letters are true and ending with the row in which all sentence letters are false.
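
For example, the truth table for the sentence (A v ~B) contains the two columns for A and B and a final column for the sentence as a whole:

A B | (A v ~B)
T T |     T
T F |     T
F T |     F
F F |     T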

Implementation

TerminalSyntax and TerminalSentenceLetter are subclasses of Terminal and implement the logic for determining whether a given piece of syntax or a sentence letter exists in an input string. TerminalSyntax takes the TokenType and the string to find in the input as configuration options. If one felt inclined, one could implement a single TerminalRegularExpression class instead of two separate classes, but I felt it was unnecessary.

public class SLTokenizer : Tokenizer<TokenType> {
    protected override void loadTerminals() {
        Terminals.AddRange(new Terminal<TokenType>[] { 
            new TerminalSyntax(TokenType.Conjunction, "^"),
            new TerminalSyntax(TokenType.Disjunction, "v"), 
            new TerminalSyntax(TokenType.LeftParen, "("),
            new TerminalSyntax(TokenType.MaterialBiconditional, "<->"),
            new TerminalSyntax(TokenType.MaterialConditional, "->"),
            new TerminalSyntax(TokenType.Negation, "~"),
            new TerminalSyntax(TokenType.RightParen, ")"),
            new TerminalSentenceLetter()
        });
    }
}
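
Neither TerminalSyntax nor TerminalSentenceLetter is listed in this post. As a rough illustration, TerminalSyntax might look something like the following under the Token<T> and Terminal<T> sketches given earlier; the actual implementation may differ.

public class TerminalSyntax : Terminal<TokenType> {
    private TokenType tokenType;
    private string syntax;

    public TerminalSyntax(TokenType tokenType, string syntax) {
        this.tokenType = tokenType;
        this.syntax = syntax;
    }

    public override bool Exists(string input, int n, out Token<TokenType> candidate) {
        candidate = null;

        // Accept only when the configured lexeme appears verbatim at index n.
        if (n + syntax.Length > input.Length || input.Substring(n, syntax.Length) != syntax)
            return false;

        candidate = new Token<TokenType>(tokenType, syntax, n);
        return true;
    }
}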

The three Nonterminal types for Connectives, Negation and Letters are subclasses of Nonterminal and implement the logic for determining if a collection of tokens matches the rules associated with each Nonterminal type.

public class SLParser : Parser<TokenType, ParsedSentence> {
    protected override void loadNonterminals() {
        Nonterminals.AddRange(new Nonterminal<TokenType, ParsedSentence>[] { 
            new NonterminalSentenceConnective(this),
            new NonterminalSentenceNegation(this),
            new NonterminalSentenceLetter()
        });
    }
}
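
The individual Nonterminal subclasses are likewise not listed. As a sketch of the simplest of the three, NonterminalSentenceLetter might look roughly like this, assuming ParsedSentenceLetter simply wraps the matched token:

public class NonterminalSentenceLetter : Nonterminal<TokenType, ParsedSentence> {
    public override TokenType FirstExpectedToken {
        get { return TokenType.Letter; }
    }

    public override bool Exists(TokenList<TokenType> tokens, int n, int m, out ParsedSentence parsedSentence) {
        parsedSentence = null;

        // A sentence letter is a single Letter token spanning the whole range.
        if (n != m || tokens[n].TokenType != TokenType.Letter)
            return false;

        parsedSentence = new ParsedSentenceLetter(tokens[n]);
        return true;
    }
}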

Nothing particularly exciting here: the semantic analyzer simply maps the parsed sentences to the actual Sentences that will be used by the client code. Sentence implementations contain the logic for evaluating themselves.

public class SLSemanticAnalyzer : SemanticAnalyzer<ParsedSentence, Sentence> {
    public override Sentence Analyze(ParsedSentence sentence) {
        ParsedSentenceLetter letter = sentence as ParsedSentenceLetter;
        if (letter != null)
            return new SentenceLetter(letter);

        ParsedSentenceUnary negation = sentence as ParsedSentenceUnary;
        if (negation != null)
            return new SentenceUnaryNegation(negation, Analyze(negation.Sentence));

        ParsedSentenceBinary connective = sentence as ParsedSentenceBinary;
        if (connective != null) {
            switch (connective.Operation.TokenType) {
                case TokenType.Conjunction:
                    return new SentenceBinaryConjunction(connective, Analyze(connective.Left), Analyze(connective.Right));
                case TokenType.Disjunction:
                    return new SentenceBinaryDisjunction(connective, Analyze(connective.Left), Analyze(connective.Right));
                case TokenType.MaterialBiconditional:
                    return new SentenceBinaryMaterialBiconditional(connective, Analyze(connective.Left), Analyze(connective.Right));
                case TokenType.MaterialConditional:
                    return new SentenceBinaryMaterialConditional(connective, Analyze(connective.Left), Analyze(connective.Right));
            }
        }

        throw new NotImplementedException();
    }
}
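
The Sentence classes themselves are not shown here either. Below is a minimal sketch of how one of them might evaluate itself, assuming a Sentence base class keyed by its defining token and an Operation property on ParsedSentenceUnary; both are assumptions inferred from how Evaluate and the analyzer are used elsewhere in this post.

abstract public class Sentence {
    public Token<TokenType> Token { get; private set; }

    protected Sentence(Token<TokenType> token) {
        Token = token;
    }

    // Evaluates the sentence against the given variable assignment and records
    // the result in tableRow keyed by this sentence's token.
    abstract public bool Evaluate(IList<Variable> variables, IDictionary<Token<TokenType>, bool> tableRow);
}

public class SentenceUnaryNegation : Sentence {
    private Sentence operand;

    public SentenceUnaryNegation(ParsedSentenceUnary negation, Sentence operand)
        : base(negation.Operation) {
        this.operand = operand;
    }

    public override bool Evaluate(IList<Variable> variables, IDictionary<Token<TokenType>, bool> tableRow) {
        bool value = !operand.Evaluate(variables, tableRow);
        tableRow[Token] = value;
        return value;
    }
}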

There is nothing extra to implement in the SLInterpreter other than the constructor and the component properties.

public class SLInterpreter : Interpreter<TokenType, ParsedSentence, Sentence> {
    public SLInterpreter(TextWriter externalLogger) : base(externalLogger) { 
    
    }
    
    override protected Tokenizer<TokenType> Tokenizer {
        get { return new SLTokenizer(); }
    }

    override protected Parser<TokenType, ParsedSentence> Parser {
        get { return new SLParser(); }
    }

    override protected SemanticAnalyzer<ParsedSentence, Sentence> SemanticAnalyzer {
        get { return new SLSemanticAnalyzer(); }
    }
}

Once all of the Interpreter code is complete, it becomes trivial to implement the truth table.

private void table(string input) {
    SLInterpreter interpreter = new SLInterpreter(Console.Out);
    TokenList<TokenType> tokens;
    Sentence sentence;

    if (!interpreter.Interpret(input, out sentence, out tokens))
        return;

    List<Variable> variables = tokens
       .Where((x) => x.TokenType == TokenType.Letter)
       .Select((x) => x.Lexeme)
       .Distinct()
       .Select((x) => new Variable(x))
       .ToList();

    foreach (Variable variable in variables)
        Console.Write("{0} ", variable.Identifier);
    Console.WriteLine("| {0}", input);

    // Enumerate every truth assignment, starting with all letters true and
    // ending with all letters false.
    for (int n = (1 << variables.Count) - 1; n >= 0; n--) {
        int N = n;
        for (int m = variables.Count - 1; m >= 0; m--) {
            variables[m].Value = (N & 1) == 1;
            N >>= 1;
        }

        // Evaluate the sentence under the current assignment; each subsentence
        // records its value keyed by its token.
        IDictionary<Token<TokenType>, bool> tableRow = new Dictionary<Token<TokenType>, bool>();
        sentence.Evaluate(variables, tableRow);

        int offset = 2;
        foreach (Variable variable in variables) {
            Console.Write("{0} ", variable.Value ? "T" : "F");
            offset += 1 + variable.Identifier.Length;
        }

        Console.Write("| ");

        // Write each subsentence's value directly beneath its token in the header row.
        foreach (Token<TokenType> token in tableRow.Keys) {
            Console.CursorLeft = token.StartsAtIndex + offset;
            Console.Write(tableRow[token] ? "T" : "F");
        }
        Console.WriteLine();
    }
}
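
A hypothetical driver, assuming the table method above lives in a console program's class; the sentences are arbitrary examples.

table("(A v ~B)");
table("((P -> Q) ^ ~Q)");

Each call prints a header of the distinct sentence letters and the input sentence, followed by one row per truth assignment with a T or F written beneath each evaluated subsentence.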

Written by lewellen

2009-03-01 at 12:00 am