Rust File I/O with Compression

Rust utility functions for transparent zstd compression/decompression when working with files

Wayne Lau


While processing large datasets, I found these two functions very useful and now reach for them regularly. They handle file I/O with automatic zstd compression based on the file extension, so I don’t have to repeat the same boilerplate everywhere.

This was also where I started learning how to use the Box type.
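For readers new to Box, here is a minimal sketch of why a boxed trait object is useful in this situation: the two branches below produce different concrete reader types, and `Box<dyn BufRead>` lets one function return either. The helper `make_reader` and its `limit` parameter are hypothetical and use only the standard library; they are not part of the functions in this post.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Hypothetical helper: returns a buffered reader over `data`,
/// optionally truncated to the first `limit` bytes.
fn make_reader(data: Vec<u8>, limit: Option<u64>) -> Box<dyn BufRead> {
    match limit {
        // Concrete type: BufReader<Take<Cursor<Vec<u8>>>>
        Some(n) => Box::new(BufReader::new(Cursor::new(data).take(n))),
        // Concrete type: Cursor<Vec<u8>>, which already implements BufRead
        None => Box::new(Cursor::new(data)),
    }
}
```

Without the Box, the two match arms would not type-check, because each arm has a different concrete type. The caller only sees "something that implements BufRead".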

Code

use anyhow::{Error, bail};
use indicatif::ProgressBar;
use std::fs::File;
use std::io::{BufRead, BufReader, Read};
use std::io::{BufWriter, Write};
use std::path::Path;
use zstd::stream::read::Decoder;

/// Sets up a buffered reader for the given file path.
/// Automatically handles zstd decompression for .zst files.
/// Other file types can be added as needed.
pub fn setup_reader<P: AsRef<Path>>(
    input_path: P,
    pb: Option<ProgressBar>,
) -> Result<Box<dyn BufRead>, Error> {
    let path = input_path.as_ref();
    let file = File::open(path)?;

    // Wrap the raw file so the bar tracks bytes read from disk
    // (compressed bytes for .zst input).
    let progress_reader: Box<dyn Read> = if let Some(pb) = pb {
        Box::new(pb.wrap_read(file))
    } else {
        Box::new(file)
    };

    let reader: Box<dyn BufRead> = match path.extension().and_then(|s| s.to_str()) {
        Some("zst") => {
            let decoder = Decoder::new(progress_reader)?;
            Box::new(BufReader::new(decoder))
        }
        Some("jsonl") => Box::new(BufReader::new(progress_reader)),
        _ => bail!("Unsupported file type: expected .zst or .jsonl"),
    };

    Ok(reader)
}

/// Sets up a buffered writer for the given file path.
/// Automatically handles zstd compression for .zst files.
/// Other file types can be added as needed.
pub fn setup_writer<P: AsRef<Path>>(filename: P, level: i32) -> Result<Box<dyn Write>, Error> {
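    let path = filename.as_ref();
    let extension = path.extension().and_then(|s| s.to_str());

    // Validate the extension before File::create, which would otherwise
    // create (or truncate) the file even when the type is unsupported.
    if !matches!(extension, Some("zst") | Some("jsonl")) {
        bail!("Unsupported file type: expected .zst or .jsonl");
    }
    let outfile = File::create(path)?;

    let writer: Box<dyn Write> = match extension {
        Some("zst") => {
            // auto_finish() completes the zstd frame when the encoder is dropped.
            let encoder = zstd::stream::write::Encoder::new(outfile, level)?.auto_finish();
            Box::new(encoder)
        }
        // Only .jsonl remains after the check above.
        _ => Box::new(BufWriter::new(outfile)),
    };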

    Ok(writer)
}
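Both functions dispatch on `path.extension()`, which only looks at the final extension. The sketch below isolates that logic so its behavior is easy to check; the `Format` enum and `detect_format` function are illustrative names, not part of the code above, and only the standard library is used.

```rust
use std::path::Path;

/// Illustrative enum naming the two formats the functions accept.
#[derive(Debug, PartialEq)]
enum Format {
    Zstd,
    PlainJsonl,
}

/// Mirrors the `path.extension().and_then(|s| s.to_str())` match:
/// only the last extension counts, and the comparison is case-sensitive.
fn detect_format<P: AsRef<Path>>(path: P) -> Option<Format> {
    match path.as_ref().extension().and_then(|s| s.to_str()) {
        Some("zst") => Some(Format::Zstd),
        Some("jsonl") => Some(Format::PlainJsonl),
        _ => None,
    }
}
```

So `data.jsonl.zst` is treated as zstd, `data.jsonl` as plain text, and anything else (including `data.ZST` or a file with no extension) is rejected.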

Usage

// Reading (supports .zst and .jsonl)
let reader = setup_reader("data.jsonl.zst", None)?;

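// With progress bar (length is the on-disk file size, since the bar wraps the raw file)
let file_size = std::fs::metadata("data.jsonl.zst")?.len();
let pb = ProgressBar::new(file_size);
let reader = setup_reader("data.jsonl.zst", Some(pb))?;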

// Writing (level 3 is zstd's default compression level)
let writer = setup_writer("output.jsonl.zst", 3)?;

// Reading lines
let reader = setup_reader(&args.input, None)?;

for line in reader.lines() {
    let line = line?; // each item is an io::Result<String>
    // process `line` here
}

// Writing
let mut writer = setup_writer(&args.output, 3)?;
writeln!(writer, "{}", "something")?;
writer.flush()?; // surface write errors here instead of relying on drop
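Putting the read and write halves together, a line-by-line pipeline might look roughly like the sketch below. It takes the same trait-object types the two functions return, so it can be exercised with in-memory buffers; `copy_non_empty_lines` is a hypothetical helper, not something from the code above.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copies non-empty lines from `reader` to `writer`
/// and returns how many lines were written.
fn copy_non_empty_lines(reader: Box<dyn BufRead>, writer: &mut dyn Write) -> io::Result<usize> {
    let mut written = 0;
    for line in reader.lines() {
        let line = line?; // propagate I/O or invalid-UTF-8 errors
        if line.trim().is_empty() {
            continue;
        }
        writeln!(writer, "{}", line)?;
        written += 1;
    }
    writer.flush()?; // report buffered write errors here rather than on drop
    Ok(written)
}
```

In real use the reader would come from `setup_reader` and the writer from `setup_writer`; the `io::Error` returned here converts into `anyhow::Error` with `?`.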

Notes

  • Returns Box<dyn BufRead> / Box<dyn Write> for format-agnostic handling
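  • Automatically detects the .zst extension and applies zstd compression/decompression (only the final extension is checked, so data.jsonl.zst counts as zstd)
  • Optional progress tracking via indicatif::ProgressBar, measured in bytes read from disk
  • Errors propagate as anyhow::Error, including an explicit error for unsupported extensions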
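To make the progress-tracking point concrete, here is a rough sketch of how a read-wrapping adapter works in general. `CountingReader` is an illustrative stand-in written with the standard library only; it is not indicatif's actual type, and it counts bytes rather than drawing a bar.

```rust
use std::io::{self, Read};

/// Illustrative adapter: forwards reads to `inner` and counts the bytes.
/// A progress-bar wrapper would advance the bar where the counter is updated.
struct CountingReader<R> {
    inner: R,
    bytes_read: u64,
}

impl<R: Read> CountingReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, bytes_read: 0 }
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n as u64; // progress advances by the bytes just read
        Ok(n)
    }
}
```

Because `setup_reader` wraps the file before handing it to the zstd decoder, the count for a .zst input is in compressed bytes, which is why the on-disk file size is a sensible length for the bar.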

Dependencies

cargo add indicatif zstd anyhow