Building an expression evaluator from scratch is a rite of passage for software engineers. It demystifies how compilers, interpreters, and database engines parse math formulas. While using ready-made evaluation libraries is common, coding your own forces you to understand tokenization, precedence, and tree structures.
Here is a step-by-step guide to building a robust, simple expression evaluator using the classic Abstract Syntax Tree (AST) approach. The Three Stages of Evaluation
To evaluate a raw string like “3 + 5(2 - 8)”, your program must process the text through three distinct phases:
Lexical Analysis (Tokenization): Converting a raw string of characters into a list of meaningful units called tokens (numbers, operators, parentheses).
Parsing (Syntax Analysis): Turning that flat list of tokens into a hierarchical tree structure that respects math rules like operator precedence.
Evaluation: Traversing the tree to compute the final numerical result. Step 1: Lexical Analysis (The Tokenizer)
Computers do not inherently understand that 1 and 0 placed next to each other mean the number 10. The tokenizer loops through the input string, ignores whitespace, and groups characters into typed objects.
class Token: def init(self, type, value=None): self.type = type # ‘NUMBER’, ‘PLUS’, ‘MINUS’, ‘MUL’, ‘DIV’, ‘LPAREN’, ‘RPAREN’ self.value = value # Holds the numeric value if type is ‘NUMBER’ def tokenize(text): tokens = [] i = 0 while i < len(text): char = text[i] if char.isspace(): i += 1 continue if char.isdigit() or char == ‘.’: num_str = “” while i < len(text) and (text[i].isdigit() or text[i] == ‘.’): num_str += text[i] i += 1 tokens.append(Token(‘NUMBER’, float(num_str))) continue elif char == ‘+’: tokens.append(Token(‘PLUS’)) elif char == ‘-’: tokens.append(Token(‘MINUS’)) elif char == ‘*’: tokens.append(Token(‘MUL’)) elif char == ‘/’: tokens.append(Token(‘DIV’)) elif char == ‘(’: tokens.append(Token(‘LPAREN’)) elif char == ‘)’: tokens.append(Token(‘RPAREN’)) else: raise ValueError(f”Invalid character: {char}“) i += 1 tokens.append(Token(‘EOF’)) # End of file token return tokens Use code with caution. Step 2: Parsing (Building the AST)
Once you have a list of tokens, you need to structure them. If we evaluate blindly from left to right, 3 + 5 * 2 becomes 16 instead of the correct answer, 13.
To handle operator precedence gracefully, we use an Abstract Syntax Tree (AST) and a technique called Recursive Descent Parsing. We define our grammar in order of reverse precedence: Expression: Handles addition and subtraction. Term: Handles multiplication and division.
Factor: Handles base numbers and grouped expressions inside parentheses. First, let’s define our tree nodes:
class NumberNode: def init(self, token): self.value = token.value class BinOpNode: def init(self, left, op_token, right): self.left = left self.op = op_token.type self.right = right Use code with caution. Next, we implement the Parser class to build the tree:
class Parser: def init(self, tokens): self.tokens = tokens self.pos = 0 def current_token(self): return self.tokens[self.pos] def consume(self, token_type): if self.current_token().type == token_type: self.pos += 1 else: raise SyntaxError(f”Expected {token_type}, got {self.current_token().type}“) def factor(self): token = self.current_token() if token.type == ‘NUMBER’: self.consume(‘NUMBER’) return NumberNode(token) elif token.type == ‘LPAREN’: self.consume(‘LPAREN’) node = self.expr() self.consume(‘RPAREN’) return node raise SyntaxError(“Unexpected token in factor”) def term(self): node = self.factor() while self.current_token().type in (‘MUL’, ‘DIV’): op_token = self.current_token() self.consume(op_token.type) node = BinOpNode(node, op_token, self.factor()) return node def expr(self): node = self.term() while self.current_token().type in (‘PLUS’, ‘MINUS’): op_token = self.current_token() self.consume(op_token.type) node = BinOpNode(node, op_token, self.term()) return node Use code with caution. Step 3: Evaluation
The hard work is done. The parser guarantees that multiplication nodes sit lower in the tree than addition nodes, meaning they must be calculated first. Evaluating the tree is a straightforward recursive post-order traversal.
def evaluate(node): if isinstance(node, NumberNode): return node.value if isinstance(node, BinOpNode): left_val = evaluate(node.left) right_val = evaluate(node.right) if node.op == ‘PLUS’: return left_val + right_val if node.op == ‘MINUS’: return left_val - right_val if node.op == ‘MUL’: return left_val * right_val if node.op == ‘DIV’: if right_val == 0: raise ZeroDivisionError(“Division by zero”) return left_val / right_val Use code with caution. Putting It All Together
We can stitch all three components into a single function to test our brand-new engine.
def calculate(expression_string): tokens = tokenize(expression_string) parser = Parser(tokens) ast = parser.expr() return evaluate(ast) # Testing the evaluator print(calculate(“3 + 5 * (2 - 8)”)) # Output: -27.0 print(calculate(“10 / 2 + 3”)) # Output: 8.0 Use code with caution. Next Steps and Enhancements
Congratulations! You have written a fully functional, priority-aware mathematical interpreter. While this codebase is relatively compact, it lays the exact foundations used by major programming languages.
If you want to keep expanding this project, try adding support for: Unary Operators: Handling negative numbers like -5 + 3.
Exponents: Adding the ^ operator, keeping in mind that exponentiation is right-associative (22 is 2^9, not 8^2).
Variables: Passing a dictionary of variables (e.g., {‘x’: 10}) into your evaluator to resolve expressions like x + 5. To help refine this, let me know:
Should we modify the code to support variables and a symbol table?
Leave a Reply