Tiny Rust Compile Tricks

Small Rust compile-time tricks that I’ve found useful in my projects

Wayne Lau

  ·  2 min read

Tiny Rust tricks I’ve learnt #

Description #

These techniques may or may not yield measurable performance gains, but I’ve found them quite useful.

include_str! #

I used this in llmperf when I realised that the sonnet file never changes. If you are calling read_to_string on the same file every time, consider baking the string directly into the binary. Note there is a size limit.


// instead of 
let input_str = fs::read_to_string("sonnet.txt")?;

// use
let input_str = include_str!("sonnet.txt");

Note: there are definitely different use cases, depending on whether the text is available at compile time. My string never changes, so it can be baked into the binary.

build.rs #

The usual use case for build.rs is linking C libraries, but I found it interesting for generating arrays or structs at compile time.

This is the build.rs from my project mock-openai

// build.rs
use std::env;
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

fn main() {
    // 1. Tell Cargo to rerun this script if the sonnets or tokenizer change
    println!("cargo:rerun-if-changed=build/sonnets.txt");
    println!("cargo:rerun-if-changed=build/tokenizer.json");

    let raw_string = include_str!("build/sonnets.txt");

    let tokenizer = tokenizers::Tokenizer::from_file("build/tokenizer.json").unwrap();
    let tokens = tokenizer.encode(raw_string, false).unwrap();
    // {:?} below already escapes the strings, so no extra
    // serialization pass (e.g. via serde_json) is needed here
    let decoded_tokens: Vec<String> = tokens.get_tokens().to_vec();

    let out_dir = env::var("OUT_DIR").unwrap();
    let dest_path = Path::new(&out_dir).join("generated.rs");
    let mut f = BufWriter::new(File::create(&dest_path).unwrap());

    writeln!(f, "pub static TOKENIZED_OUTPUT: &[&str] = &[").unwrap();
    for token in &decoded_tokens {
        // We use {:?} to handle escaping quotes/newlines in the strings
        writeln!(f, "    {:?},", token).unwrap();
    }
    writeln!(f, "];").unwrap();
    writeln!(f, "pub static MAX_OUTPUT: &str = {:?};", raw_string).unwrap();
    writeln!(
        f,
        "pub static MAX_TOKENS: usize = {};",
        decoded_tokens.len()
    )
    .unwrap();
}

The above was an over-optimization for testing, but I noticed that the tokenizers crate was only needed to create TOKENIZED_OUTPUT once, and the result is effectively constant. So I wondered whether I could build a static array of &str directly at compile time.