Skip to main content

Command Palette

Search for a command to run...

Building a Lisp-like Language from Scratch in Rust

Updated
Building a Lisp-like Language from Scratch in Rust

This post delves into building an interpreter for a Lisp-like language using Rust. No knowledge beyond Rust basics is required to follow this post.

Inspiration and Project Overview

Inspired by Stepan Parunashvili's article Risp (in (Rust) (Lisp)), I created lip, an interpreted language designed for logical operations with a Lisp-like syntax. This supports logical operations (not, and, or), branching (if expression), lambda functions, and variable definition.

This post guides you through the process of building an interpreter, focusing on the core functionalities of tokenizing, parsing, and evaluating expressions. While it may sound complex, the process only requires basic Rust knowledge. The language we'll build is lp, a distilled version of lip designed for illustration. The live demo of an lp interpreter is accessible via a browser, and the complete code is available on GitHub.

Here are some examples of lp code:

Literal (T = true, F = false)

T
=> true

Logical operations (not, and, or)

(^ (& T (| F F T)))
=> !(true & (false | false | true))
=> false

If expression

(if (& T T F)
  (^ F)
  (| F F F))
=> if (true & true & false) { !false } else { false | false | false }
=> false

The logical operations may look weird especially if not familiar with Lisp. Lisp uses prefix notation, where the operator comes first, followed by as many operands as you want.

The original Lisp has a number type. (+ 1 2 3 4) means 1 + 2 + 3 + 4, and interestingly, (+) is 0. The same thing applies to lp: (& T T F F) means (true & true & false & false), and (&) is true.

Implementation Steps

To understand how lp code is evaluated, let's break down the process into three steps: tokenize, parse, and evaluate. We'll use the expression (& T (| F T)) as an example.

Imagine the lp code as a sentence. Tokenization is like splitting the sentence into individual words and punctuation marks. In our example, (& T (| F T)) is tokenized into an array like [left paren, and, true, left paren, or, ...].

Once we have the tokens, we need to understand their structure and meaning. Parsing involves arranging the tokens in a way that reflects their relationships. This is similar to how we understand the grammar of a sentence.

For our example, parsing creates an abstract syntax tree (AST), which is a tree-like representation of the expression. The AST for (& T (| F T)) looks like this:

   and
  /   \
true  or
     /  \
 false  true

The last task is evaluation. Here, the AST is used to compute the final value of the expression. We traverse the AST starting from the leaves to the root:

   and
  /   \
true  or
     /  \
 false  true

    ▼

   and
  /   \
true true

    ▼

  true

Now that we have grasped the overview of the implementation steps, let's get started with tokenize.

Tokenize

Let's commence by defining valid tokens:

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Operator {
    And, // &
    Or, // |
    Not, // ^
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Token {
    Lparen, // (
    Rparen, // )
    Bool(bool), // T or F
    Operator(Operator),
}

Write a unit test to clarify the goal:

#[derive(Debug)]
pub enum LpErr {
    Tokenize(String),
    Parse(String),
    Eval(String),
}

#[test]
fn tokenize_example_works() -> Result<(), LpErr> {
    let tokens = tokenize("(& T (| F T))")?;
    assert_eq!(
        vec![
            Token::Lparen,
            Token::Operator(Operator::And),
            Token::Bool(true),
            Token::Lparen,
            Token::Operator(Operator::Or),
            Token::Bool(false),
            Token::Bool(true),
            Token::Rparen,
            Token::Rparen,
        ],
        tokens
    );
    Ok(())
}

Tokenizing lp is simple because it is almost tokenized from the start! Each token is separated by whitespace except for ( and ). To achieve tokenization, add whitespace to the parenthesis and split the expression:

pub fn tokenize(expr: &str) -> Result<Vec<Token>, LpErr> {
    expr.replace('(', "( ")
        .replace(')', " )")
        .split_ascii_whitespace()
        .map(|s| match s {
            "(" => Ok(Token::Lparen),
            ")" => Ok(Token::Rparen),
            "T" => Ok(Token::Bool(true)),
            "F" => Ok(Token::Bool(false)),
            "&" => Ok(Token::Operator(Operator::And)),
            "|" => Ok(Token::Operator(Operator::Or)),
            "^" => Ok(Token::Operator(Operator::Not)),
            _ => Err(LpErr::Tokenize(format!("invalid token `{s}`"))),
        })
        .collect::<Result<Vec<Token>, LpErr>>()
}

That's it. By the way, the implementation above shows an effective use of collect(). It transforms the iterator of Result<T, U> into Result<Vec<T>, U>, constructing Vec<U> by accumulating Ok<T>. If the iterator contains Err<U>, the result will be Err<U>.

Parse

Before diving into parsing expressions, let's establish what constitutes an expression in lp. Expressions in lp can be primitives (T and F) or logical operations:

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expr {
    Bool(bool),
    Call(Operator, Vec<Expr>),
}

Again, our expectations are clarified by the following test:

#[test]
fn parse_example_works() -> Result<(), LpErr> {
    let tokens = tokenizer::tokenize("(& T (| F T))")?;
    let expr = parse(&tokens)?;
    assert_eq!(and(&[T, or(&[F, T])]), expr);
    Ok(())
}

const T: Expr = Expr::Bool(true);
const F: Expr = Expr::Bool(false);

fn and(operands: &[Expr]) -> Expr {
    Expr::Call(Operator::And, operands.to_vec())
}

fn or(operands: &[Expr]) -> Expr {
    Expr::Call(Operator::Or, operands.to_vec())
}

The helper functions and constants (and, or, T, F) simplify the test code. For instance, and(&[T, or(&[F, T])]) looks concise compared to Expr::Call(Operator::And, vec![T, Expr::Call(Operator::Or, vec![F, T])]).

In implementing parse, the crucial insight is that lp expression has a recursive structure, where Expr::Call contains Expr. This naturally leads us to consider a recursive function for parsing:

// Returns parsed Expr and the number of consumed tokens if succeeded.
fn parse(tokens: &[Token]) -> Result<(Expr, usize), LpErr> {
    if tokens[0] != Token::Lparen {
        return match tokens[0] {
            Token::Bool(b) => Ok((Expr::Bool(b), 1)),
            _ => Err(LpErr::Parse(format!("invalid expression: `{:?}`", tokens[0]))),
        };
    }

    let operator = match tokens[1] {
        Token::Operator(o) => o,
        _ => return Err(LpErr::Parse(format!("invalid operator: `{:?}`", tokens[1]))),
    };

    // parsing operands
    let mut p = 2;
    let mut operands = vec![];
    while tokens[p] != Token::Rparen {
        let (expr, consumed) = parse(&tokens[p..])?;
        operands.push(expr);
        p += consumed;
    }
    Ok((Expr::Call(operator, operands), p + 1))
}

The initial parts of the function handle non-special cases. If the expression doesn't start with (, it must be T or F. If the first token is (, the subsequent token is expected to be an operator.

The most intriguing aspect lies in parsing operands. It parses expressions while maintaining the current position.

The first call to parse starts from the head, encountering ( and & tokens:

▼
(& T (| F T))

The second parse is called recursively and starts from tokens[2]. processing a single token T and returns:

   ▼
(& T (| F T))

The third recursive call commences from tokens[3]. This time, it marks the start of an or expression, initiating the operand-parsing process.

     ▼
(& T (| F T))

After parsing or, the position p points to the last token, which should be ):

            ▼
(& T (| F T))

Finally, we encapsulate the function and expose only the necessary result:

pub fn parse(tokens: &[Token]) -> Result<Expr, LpErr> {
    let (expr, _) = parse_internal(tokens)?;
    Ok(expr)
}

fn parse_internal(tokens: &[Token]) -> Result<Expr, LpErr> {
  // ...
}

Evaluate

After conquering the challenging part of parse, the remainder is surprisingly straightforward! Before delving into the implementation, let's get started with a test:

#[test]
fn eval_example_works() -> Result<(), LpErr> {
    // (true & (false | true)) => true
    let tokens = tokenizer::tokenize("(& T (| F T))")?;
    let expr = parser::parse(&tokens)?;
    let value = eval(&expr)?;
    assert_eq!(Value::Bool(true), value);
    Ok(())
}

Given the recursive structure of AST, it's only natural that the evaluation function eval mirrors this recursive design:

#[derive(Debug, PartialEq, Eq)]
pub enum Value {
    Bool(bool),
}

pub fn eval(expr: &Expr) -> Result<Value, LpErr> {
    match expr {
        Expr::Bool(b) => Ok(Value::Bool(*b)), // Base case of recursion
        Expr::Call(operator, operands) => {
            let operands: Vec<bool> = operands
                .iter()
                .map(|o| match eval(o) { // Eval operands
                    Ok(Value::Bool(b)) => Ok(b),
                    _ => Err(LpErr::Eval(format!("invalid operand: {o:?}"))),
                })
                .collect::<Result<Vec<bool>, LpErr>>()?;

            let value = match operator { // Compute the result
                Operator::And => operands.into_iter().all(|o| o),
                Operator::Or => operands.into_iter().any(|o| o),
                Operator::Not => {
                    let len = operands.len();
                    if len != 1 {
                        return Err(LpErr::Eval(format!(
                            "not must have 1 operand, not {len}"
                        )));
                    }
                    !operands[0]
                }
            };
            Ok(Value::Bool(value))
        }
    }
}

Now, the lp code can be evaluated. To wrap things up, let's create a REPL, an interactive environment.

REPL (Read-Evalueate-Print Loop)

Many interpreted languages boast a REPL. For instance, running python launches its REPL, Ruby has irb, and even Rust offers one like evcxr. Why not have one for lp?

use std::io::{self, Write};

fn main() -> io::Result<()> {
    loop {
        print!("lp> ");
        io::stdout().flush()?;

        let mut input = String::new();
        io::stdin().read_line(&mut input)?;
        if input.trim() == ":exit" {
            break;
        }

        let result = tokenize(&input)
            .and_then(|tokens| parse(&tokens))
            .and_then(|expr| eval(&expr));

        match result {
            Ok(value) => {
                println!("=> {value:?}");
                io::stdout().flush()?;
            }
            Err(e) => eprintln!("error: {e:?}"),
        }
    }
    Ok(())
}

Let's engage with it:

lp> (& T (| F T))
=> Bool(true)
lp> (|)
=> Bool(false)
lp> :exit

Implementing std::fmt::Display for Value is a good way to provide a more user-friendly output. Feel free to do so.

If Expression

While lp can deal with complex expressions, it still lacks many features found in ordinary languages, such as if, for, or function definition.

Fortunately, incorporating some additional features is not overly complicated. Take if as an example. The expected result is like the following:

#[test]
fn eval_if_works() -> Result<(), LpErr> {
    // if false { true } else { false | false } => false
    let tokens = tokenizer::tokenize("(if F T (| F F))")?;
    let expr = parser::parse(&tokens)?;
    let value = eval(&expr)?;
    assert_eq!(Value::Bool(false), value);
    Ok(())
}

Let's walk through the process from tokenize to eval. First, a new keyword must be added to Token:

pub enum Token {
    Lparen,
    Rparen,
    Bool(bool),
    Operator(Operator),
    If, // <- new
}

pub fn tokenize(expr: &str) -> Result<Vec<Token>, LpErr> {
    expr.replace('(', "( ")
        .replace(')', " )")
        .split_ascii_whitespace()
        .map(|s| match s {
            "(" => Ok(Token::Lparen),
            // ...
            "if" => Ok(Token::If), // <-- new
    // ...
}

parse also needs modification:

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expr {
    Bool(bool),
    Call(Operator, Vec<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>), // condition, then, else
}

fn parse_internal(tokens: &[Token]) -> Result<(Expr, usize), LpErr> {
    if tokens[0] != Token::Lparen {
      // ...
    }
    if tokens[1] == Token::If {
        return parse_if(tokens);
    }
    // ...
}

fn parse_if(tokens: &[Token]) -> Result<(Expr, usize), LpErr> {
    let mut p = 2;
    let (cond, consumed) = parse_internal(&tokens[p..])?;
    p += consumed;
    let (then, consumed) = parse_internal(&tokens[p..])?;
    p += consumed;
    let (r#else, consumed) = parse_internal(&tokens[p..])?;
    p += consumed;
    Ok((
        Expr::If(Box::new(cond), Box::new(then), Box::new(r#else)),
        p + 1,
    ))
}

The implementation of parse_if is straightforward as it leverages existing code. It calls parse_internal three times to parse the condition, then-clause, and else-clause.

The last part, eval, is also uncomplicated:

pub fn eval(expr: &Expr) -> Result<Value, LpErr> {
    match expr {
        Expr::Bool(b) => { /* ... */ },
        Expr::Call(operator, operands) => { /* ... */ }
        Expr::If(cond, then, r#else) => {
            let cond = match eval(cond)? {
                Value::Bool(cond) => cond,
            };
            eval(if cond { then } else { r#else })
        }
    }

It's worth mentioning that it uses raw identifier with r#else to use else as a variable.

If is implemented relatively easily. However, adding functions or variable definitions requires more work. Feel free to explore these aspects on your own, and check out and use the lip repo as a reference.

Conclusion

In this exploration of interpreter implementation, we demystified the process by leveraging Rust's powerful features and embracing Lisp's straightforward syntax.

The journey through building an interpreter extends beyond the typical "hello world" or to-do app projects. It serves as a valuable exercise, showcasing the elegance and flexibility that Rust offers.

More from this blog

キャリア 6 年目の振り返り

2020 年 4 月に働き始めてから 6 年近く経過した。今年を振り返ってまとめておく。 正社員化 2025 年前半あたりは、生活に安定感がなくジャグリングをしているような感覚だった。学校の卒業、ビザの切り替え、インターンから正社員への移行という3つのイベントを同時に進めていた。結果的に 5 月から正社員として働き始めたが、計画通りには行かなかった。 会社の人事プロセスが大幅に遅延した。本来は 4 月から働き始める予定だったのに、一ヶ月以上遅れて 5 月になってしまった。これに関しては、本当にこ...

Dec 29, 2025
キャリア 6 年目の振り返り

キャリア 5 年目の振り返り

2020 年 4 月に働き始めてから 5 年が経った。本当は毎年 12 月に一年の振り返りをしようと思っていたのだが、今回はいろいろあって 4 月になってしまったので、2024 年 1 月から 2025 年の今までを対象として振り返りをして、今年の目標を書こうと思う。 海外就職 カナダに渡航して約 1 年半経った。最初の一年間は private college という日本の専門学校のようなところに通っていて、それからジョブオファーを得て働き始めた。 海外で働いてキャリアを積むというのがカナダに来...

Apr 14, 2025
キャリア 5 年目の振り返り

ReRe: Recursive Redefinition

38 posts