Since pivoting to mathematics from computer science, I’ve become increasingly fascinated by Computational Theory. From my first encounter with the Lambda Calculus to the Church-Turing thesis, I’ve been captivated by the idea that computation can be understood in such a formal, mathematical way.

Whether you’re a proponent of the Simulation Theory, or the Computational Hypothesis of Consciousness, or write them both off as insane — the idea that our universe operates like a giant computational system is, I think, undeniably interesting. How fascinating is it that we have discovered not just one universal model of computation, but several! The Lambda Calculus and Turing Machines turn out to be exactly equivalent in power: any function computable in one can be computed by the other. The famous Church-Turing thesis goes further, positing that these models capture everything we would intuitively call computable. This equivalence is a cornerstone of theoretical computer science, demonstrating that different models of computation can achieve the same results despite their radically different approaches.

Sitting in an EECS70 (Discrete Mathematics and Probability Theory) lecture at UCB, I pondered what sort of thing I could build that would help me explore this interest and gain a deeper understanding: What does it mean to compute? How do systems like Lean and Rocq turn programs into verifiable proofs?

And so I decided I would start small and work my way up to a miniature version of Lean. But let’s not get ahead of ourselves yet. How we begin this journey is important. Let’s discuss the first building block of any interactive theorem prover: symbolic computation.

What is Symbolic Computation?

You’re probably — at least I hope — familiar with numerical computation: the process of performing arithmetic on numbers. Symbolic computation, on the other hand, involves manipulating mathematical expressions in symbolic form. If you have studied Calculus, think of differentiation: when you differentiate an expression f(x), you are not simply evaluating the function at a specific point, but rather rewriting the symbolic expression by applying a set of rules.

Rewrite Rules

The relevant rules for this computation are the constant rule, the scalar (constant multiple) rule, and the power rule:

$$ \frac{d}{dx}[c] = 0, \qquad \frac{d}{dx}[c \cdot g(x)] = c \cdot \frac{d}{dx}[g(x)], \qquad \frac{d}{dx}[x^n] = n x^{n-1} $$

For the sake of our system, we call these rewrite rules, since they tell us how to rewrite an expression into a “simpler” form. Performing symbolic differentiation on an expression f(x) is then simply a matter of applying these rewrite rules until no more of them apply.
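
As a quick illustration (the particular polynomial here is my own stand-in; any expression built from these pieces behaves the same way), differentiating $f(x) = 3x^2 + 5$ term by term gives

$$ \frac{d}{dx}\left[3x^2 + 5\right] = 3 \cdot 2x + 0 = 6x $$

Every step is a purely mechanical rewrite; at no point do we evaluate anything numerically.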

On pen and paper, this seems simple enough. But how do we get a computer to do this? How would we even represent algebraic expressions in the first place? Well, what if I told you that there’s a programming language nearly 70 years old that was designed specifically for this? Welcome to the strange world of Lisp.

Lisp and S-expressions

Lisp, which stands for “LISt Processing,” is one of the oldest high-level languages still in use today (second only to Fortran). It was created in 1958 by John McCarthy, a pioneer in the field of artificial intelligence. One of the most distinctive features of Lisp is its use of S-expressions (Symbolic Expressions) to represent both code and data. This uniform representation allows for powerful metaprogramming capabilities, where programs can manipulate other programs as data.

S-expressions are essentially nested lists, where the first element of the list is typically an operator or function, and the subsequent elements are the operands or arguments. Let’s see how they work in practice with a simple example. Consider the mathematical expression: 2 * 2 + 6. In Lisp, this is represented by the following S-expression:

(+ 6 (* 2 2))

S-expressions are evaluated from the innermost parentheses outward. In this case, the multiplication (* 2 2) is evaluated first, yielding 4, and then the addition (+ 6 4) is performed, resulting in 10. But what makes Lisp especially powerful for symbolic computation is that symbol manipulation is a first-class feature of the language, allowing us to manipulate variables without ever binding them to a specific value. For example, we can represent the expression x * x + 6 as the quoted list:

'(+ 6 (* x x))

The quote tells Lisp to treat the form as data rather than evaluating it, so x never needs to be assigned a value. This allows us to perform algebraic manipulations, simplifications, and even differentiation symbolically.
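
Here is a quick REPL sketch of the difference (the printed output may vary slightly between Lisp implementations):

;; Evaluating the numeric expression reduces it, innermost form first
(+ 6 (* 2 2))            ; => 10

;; Quoting the symbolic expression keeps it as data...
'(+ 6 (* x x))           ; => (+ 6 (* X X))

;; ...which we can pick apart like any other list
(first '(+ 6 (* x x)))   ; => +
(third '(+ 6 (* x x)))   ; => (* X X)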

Hopefully this is enough to convince you that S-expressions are a great fit for our symbolic computation engine. I’ll probably add a section here later about Lambda Calculus and how S-expressions are directly related to it, but for now, let’s move on to building our engine.

Computation Tree

A key insight I had, which I believe is shared by many symbolic computation engines, is that we can represent algebraic expressions as trees. When thinking about how S-expressions are evaluated innermost-first, it becomes clear that trees are a natural fit for representing the structure of these expressions. Consider the evaluation of: (+ (* 2 x) 3). This expression can be visualized as a tree, where its evaluation proceeds by flattening the tree to its normal form:

       +                +              
      / \              / \
     *   3    -->     2x  3    -->    2x + 3
    / \
   2   x

Normal Form: An expression is said to be in normal form if it cannot be simplified or reduced any further. In the context of symbolic computation, reaching normal form means that all possible simplifications have been applied to the expression, and it is presented in its simplest or most canonical representation.
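
For instance, the purely numeric tree for (+ 6 (* 2 2)) reduces all the way down to the single atom 10, while (+ (* 2 x) 3) can be flattened no further than 2x + 3 until x is given a value.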

This tree structure also supports unary operations, such as trigonometric functions and logarithms, by embedding an implicit nil node as the second child of the unary operator. For example, the expression sin(x) + 3 can be represented as the computation tree:

       +                 +      
      / \               / \
   sin   3    -->     sin  3   -->   sin(x) + 3
    / \               /
   x  nil            x

With this insight, we can start writing code to represent our core data type: The computation tree.

Defining the Tree Structure

We want to define a tree structure that can represent both operators and operands. We’ll be building our engine in Common Lisp, which supports object-oriented programming through its Common Lisp Object System (CLOS).

Classes

We’ll define a base class node, and then create subclasses for specific types of nodes:

;; /src/node.lisp
 
;; -------------------------------------------------------------
;; Node Classes
;; -------------------------------------------------------------
 
;; Base class for all nodes
(defclass node ()
  ((left  :initarg :left  :accessor node-left  :initform nil)
   (right :initarg :right :accessor node-right :initform nil)))
 
;; Atomic node
(defclass atom-node (node)
  ((value :initarg :value :accessor node-value)))
 
;; Symbol node
(defclass sym-node (node)
  ((symbol :initarg :symbol :accessor node-symbol)))
 
;; Operator node
(defclass op-node (node)
  ((operator :initarg :operator :accessor node-operator)))
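
One small design note: the left and right slots live on the base node class and default to nil, so leaf nodes (atoms and symbols) simply carry empty children. This keeps the tree-walking code uniform, at the cost of a couple of unused slots per leaf.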

Constructors

We also need constructors to create instances of these classes:

;; /src/node.lisp
 
;; ------------------------------------------------------------- 
;; Constructors
;; -------------------------------------------------------------
 
(defun new-atom (value)
  "Create an atomic (numeric) node."
  (unless (numberp value)
    (error "Invalid atom type: ~A" value))
  (make-instance 'atom-node :value value :left nil :right nil))
 
(defun new-op (op left right)
  "Create an operator node."
  (make-instance 'op-node :operator op :left left :right right))
 
(defun new-sym (symbol)
  "Create a symbol node (variable)."
  (make-instance 'sym-node :symbol symbol :left nil :right nil))
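
With these in place, we can already build the tree for 2x + 3 from the earlier diagram by hand (just a throwaway REPL sketch using the constructors above; the exact printed representation of the object will vary):

;; (+ (* 2 x) 3), assembled manually
(new-op '+
        (new-op '* (new-atom 2) (new-sym 'x))
        (new-atom 3))
;; => #<OP-NODE ...>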

Type Predicates

It will also be useful to have type predicates to check the type of a node (atomp is the atom predicate and opp the operator predicate; each simply returns true if the node is of that type, and false otherwise):

;; /src/node.lisp
 
;; ------------------------------------------------------------- 
;; Type Predicates
;; -------------------------------------------------------------
 
(defun atomp (node)
  (typep node 'atom-node))
 
(defun opp (node)
  (typep node 'op-node))
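
A couple of quick REPL checks, using the constructors from above:

(atomp (new-atom 2))                          ; => T
(opp   (new-atom 2))                          ; => NIL
(opp   (new-op '+ (new-atom 1) (new-atom 2))) ; => T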

Parser

Before we can actually do anything with these computation trees, it would be nice to have a way of converting Lisp-native S-expressions directly into them. This will make the engine much easier to use and test, since we can write S-expressions directly in the REPL. We’ll create /src/wire.lisp, which will contain functions for “wiring” up computation trees from S-expressions.

;; /src/wire.lisp
 
;; ------- Wires S-expressions to computation trees
;; --> Outputs the root of the computation tree
;; -------------------------------------------------
(defun wire! (expr) 
  "Convert an S-expression into a computation tree."
  (cond
    ;; Base case: number
    ((numberp expr) 
     (new-atom expr))
    
    ;; Base case: symbol (variable)
    ((symbolp expr) 
     (new-sym expr))
    
    ;; Recursive case: list (operation)
    ((listp expr)
     (let ((op (first expr)))
       (cond
         ;; Binary operators
         ((member op '(+ - * / ^))
          (new-op op 
                  (wire! (second expr))   ; recursively wire left
                  (wire! (third expr))))  ; recursively wire right
         
         ;; Unary operators (sin, cos, etc.)
         ((member op '(sin cos tan ln exp sqrt))
          (new-op op 
                  (wire! (second expr))   ; wire the operand
                  nil))                  ; no right child
         
         (t (error "Unknown operator: ~A" op)))))
    
    (t (error "Cannot parse expression: ~A" expr)))) 
To be continued... This post is a work in progress. I will be adding more sections soon.