Tiny Rust Compile Tricks
Small Rust compile-time tricks that I’ve found useful in my projects
· 2 min read
Tiny Rust tricks I’ve learnt
Description
These tricks may or may not yield performance gains, but I have found them quite useful.
include_str!
I used this in llmperf when I realised that the sonnet file never changes. If you call read_to_string on the same unchanging file every time, consider baking the string directly into the binary. Note that there is a size limit.
```rust
// instead of
let input_str = fs::read_to_string("sonnet.txt")?;

// use
let input_str = include_str!("sonnet.txt");
```

Note: The right choice depends on whether you want the text available at compile time. My string never changes, so it can be baked into the binary.
build.rs
The usual use case for build.rs is linking C libraries, but I found it interesting for generating arrays or structs at compile time.
This is the build.rs from my project mock-openai:
```rust
// build.rs
use std::env;
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

fn main() {
    // 1. Tell Cargo to rerun this script if the sonnets or tokenizer change
    println!("cargo:rerun-if-changed=build/sonnets.txt");
    println!("cargo:rerun-if-changed=build/tokenizer.json");

    let raw_string = include_str!("build/sonnets.txt");
    let tokenizer = tokenizers::Tokenizer::from_file("build/tokenizer.json").unwrap();
    let tokens = tokenizer.encode(raw_string, false).unwrap();
    // Keep the raw token strings; {:?} below handles the escaping.
    let decoded_tokens: Vec<String> = tokens.get_tokens().to_vec();

    let out_dir = env::var("OUT_DIR").unwrap();
    let dest_path = Path::new(&out_dir).join("generated.rs");
    let mut f = BufWriter::new(File::create(&dest_path).unwrap());

    writeln!(f, "pub static TOKENIZED_OUTPUT: &[&str] = &[").unwrap();
    for token in &decoded_tokens {
        // {:?} (Debug) emits a valid Rust string literal,
        // escaping quotes/newlines in the token
        writeln!(f, "    {:?},", token).unwrap();
    }
    writeln!(f, "];").unwrap();
    writeln!(f, "pub static MAX_OUTPUT: &str = {:?};", raw_string).unwrap();
    writeln!(
        f,
        "pub static MAX_TOKENS: usize = {};",
        decoded_tokens.len()
    )
    .unwrap();
}
```

The above was an over-optimization for testing, but I noticed that tokenizers was only needed once, to create TOKENIZED_OUTPUT, which is effectively a constant. So I wondered whether I could build a static array of &str directly.