Resume
MLOps Engineer specializing in distributed LLM training and inference optimization. Currently leading GPU cluster infrastructure for SEA-LION at AI Singapore.
Skills
Technical: Python, Rust, SQL | PyTorch, HuggingFace, DeepSpeed, Megatron-LM | Docker, Slurm, AWS
AI/ML: LLM Training & Fine-tuning, Multi-GPU/Multi-node Training, Computer Vision, NLP, MLOps
DevOps: Monitoring & Logging, IaC, Terraform, Ansible
Education
JUN 2020 - MAY 2025 Singapore University of Social Sciences BSc (Hons) Mathematics with a Minor in Data Science
Experience
DEC 2023 - PRESENT AI Singapore - AI Engineer, Infrastructure
- Configured distributed training environments for multi-node LLM training with Megatron-LM and llm-foundry
- Deployed multi-node Slurm cluster on bare-metal DGX H100/H200 servers using Ansible, with Pyxis/Enroot support
- Developed monitoring and alerting system using Mimir, Prometheus and Grafana for cluster occupancy and GPU health
- Migrated from EC2 to AWS ParallelCluster for cost savings and easier management of GPU instances
- Implemented lifecycle policies for S3 buckets, reducing storage costs by up to 50%
- Converted Python data processing to Rust, achieving 80% performance improvement
- Implemented onboarding training pipeline for new employees and conducted hiring interviews for the role
- Led knowledge sharing sessions on HPC and MLOps best practices for team leaders and members
- Presented infrastructure architecture decisions to cross-functional teams for alignment
- Created documentation and tutorials for cluster usage and MLOps workflows
- Experimentation: Slinky cluster, KAI Scheduler, NVIDIA MIG
Key skills: Slurm, Ansible, PyTorch, Infrastructure, MLOps, AWS, FinOps, Rust
FEB 2023 - NOV 2023 AI Singapore - AI Apprentice (CAIE Associate AI Engineer)
- Designed document understanding model with synthetic data generation, eliminating traditional box labeling
- Built LangChain-integrated document retrieval chatbot with OpenAI function calling
- Established CI/CD pipeline with GitHub Actions for automated testing and linting
- Coached teams on Deep Learning, Computer Vision, and NLP
- Recipient of the “Outstanding Apprenticeship Award”
Key skills: Python, PyTorch, NLP, Computer Vision, MLOps, RAG, LangChain
DEC 2017 - FEB 2023 Republic of Singapore Air Force - Air Force Engineer
- Specialist in vibration analysis and balancing for helicopter systems
- Performed root cause analysis and troubleshooting for aircraft system rectifications
- Managed multi-team task scheduling and coordination for bilateral mission preparations
Certifications
- CAIE Associate AI Engineer (May 2023)


