Skills
Hey, Pavan here. I'm a Machine Learning Engineer and applied researcher based in Los Angeles — building production systems across the ML stack, from multimodal Transformers and CUDA kernels to deployed FastAPI services, and enjoying life along the way.
From multimodal Transformers surpassing AI-assisted clinician accuracy on a Harvard Medical School EEG benchmark, to CUDA int4 dequantization kernels for LLM inference — I build models that run fast enough to matter.
End-to-End Model Development
Designing and training Transformer, CNN, and hybrid CNN-RNN architectures for computer vision and sequence tasks — including landmark-aware Transformers for sign language recognition and CNN-spectrogram models for audio classification.
Edge AI & Performance Optimization
Specializing in model quantization and deployment with TensorFlow Lite and ONNX, achieving real-time inference (up to 112 FPS) on edge devices including NVIDIA Jetson Orin Nano and Apple M1.
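As a rough illustration of what INT8 post-training quantization does to a weight tensor, here is a minimal NumPy sketch of symmetric per-tensor quantization (TFLite's actual converter additionally handles per-axis scales, zero points, and calibration data; everything here is simplified for illustration).

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: map floats onto [-127, 127] with one scale."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)  # round-trip error is at most scale / 2 per weight
```

Shrinking weights 4x and doing integer arithmetic is what makes edge-device frame rates like the ones above attainable.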
MLOps & Scalable Deployment
Building robust, production-grade ML systems using Docker, FastAPI, and GitHub Actions for CI/CD. Experienced in load-testing services to 1,000+ RPS and maintaining high test coverage (85% with PyTest).
Data Engineering & ETL Pipelines
Building and optimizing large-scale data pipelines (3M+ rows) with Pandas and NumPy. Reduced ETL runtimes by 24% through vectorized operations, and built domain-wide web crawlers processing 72k+ pages with structured per-site reporting.
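The vectorization behind that kind of speedup amounts to replacing per-row Python calls with whole-column NumPy operations. A toy sketch, with made-up column names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, 50_000),
    "qty": rng.integers(1, 10, 50_000),
})

# Row-wise apply invokes a Python lambda once per row -- slow at millions of rows.
slow = df.apply(lambda r: r["price"] * r["qty"] * 1.08, axis=1)

# Vectorized form computes the same values as a handful of NumPy array ops.
fast = df["price"] * df["qty"] * 1.08
```

Both produce identical results; only the vectorized form stays in compiled NumPy code for the whole column.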
Applied AI Research & Innovation
Authored two peer-reviewed papers (Springer ADCIS 2024, IEEE ICDSNS 2024). Current research spans tri-modal Transformer design with Dirichlet-Multinomial label distribution learning for clinical EEG, and W4 LLM quantization with custom CUDA kernels — from theory to reproducible experiment.
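On the label-distribution side, the quantity being modeled is a Dirichlet-multinomial likelihood over per-class label counts. A stdlib-only sketch of its log-pmf (the concentrations `alpha` would in practice come from a network head; the values here are purely illustrative):

```python
from math import exp, lgamma

def dirmult_log_pmf(counts, alpha):
    """Log Dirichlet-multinomial pmf of label counts given concentrations alpha."""
    n = sum(counts)
    total = sum(alpha)
    out = lgamma(n + 1) + lgamma(total) - lgamma(n + total)
    for x, a in zip(counts, alpha):
        out += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return out

# Sanity check: with a single annotation (n = 1), the pmf collapses to alpha_k / sum(alpha).
alpha = [2.0, 1.0, 1.0]
p_first = exp(dirmult_log_pmf([1, 0, 0], alpha))  # 2 / 4 = 0.5
```

Unlike a plain softmax, this likelihood lets the model express both the label distribution and its confidence in that distribution, which matters when clinical annotators disagree.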
LLM Quantization & Systems
Researching mixed-precision quantization (W4 int4 + E5Mx mini-float per-group scales) for LLM inference, writing custom CUDA dequantization kernels (sm_80), and evaluating quality with lm-eval across ARC, HellaSwag, and LAMBADA benchmarks.
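A CPU-side NumPy reference for that dequantization step (not the CUDA kernel itself) shows the packed-nibble layout such kernels typically consume; the group size of 32, zero point of 8, and low-nibble-first packing order are assumptions for illustration:

```python
import numpy as np

GROUP = 32  # assumed group size for per-group scales

def quantize_w4(w):
    """Quantize to unsigned int4 (0..15, zero point 8) with one float scale per group."""
    g = w.reshape(-1, GROUP)
    scale = np.max(np.abs(g), axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(g / scale) + 8, 0, 15).astype(np.uint8)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)  # two 4-bit values per byte
    return packed, scale

def dequantize_w4(packed, scale):
    """Unpack nibbles and rescale: the per-element work a dequant kernel performs."""
    q = np.empty((packed.shape[0], GROUP), dtype=np.uint8)
    q[:, 0::2] = packed & 0x0F   # low nibble
    q[:, 1::2] = packed >> 4     # high nibble
    return ((q.astype(np.float32) - 8.0) * scale).reshape(-1)

w = np.random.default_rng(2).standard_normal(256).astype(np.float32)
packed, scale = quantize_w4(w)   # packed: (8, 16) uint8, scale: (8, 1)
w_hat = dequantize_w4(packed, scale)
```

A correctness reference like this is useful for validating the GPU kernel's output element by element before benchmarking it.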
Work Experience
Graduate Research Assistant
Built an agentic ML workflow generating context-aware alt text from raw images (figures, charts, photos) using OCR/ASR + VLM prompts with automated WCAG 2.1/2.2 checks and human-in-the-loop QA. Processed 1.9k documents at 2.4× throughput, achieving 94% alt-text acceptance, 18% fewer edits, caption WER of 7.4%, and 47% faster review turnaround (9.1h → 4.8h).
Developed a domain-wide auditor for .calstatela.edu (crawled 128 subdomains, ~72k pages, ~9.6k docs) that produces per-site reports listing affected sites and the specific documents and remediations needed; integrated with ITS workflows to raise accessibility scores by 11 percentage points across 8 units, fix 52% of broken links, and add 33 percentage points of alt-text coverage.
Research Intern
Built SignEase, a landmark-aware Transformer achieving 98.3% accuracy across 250 ASL classes, published as first author at Springer ADCIS 2024 (500+ attendees). Reached 81% top-1 (+10 percentage points) on the Kaggle Isolated Sign-Language Recognition benchmark (100k videos). Deployed via INT8 quantization + TFLite: 78 FPS on Apple M1 and 112 FPS on NVIDIA Jetson Orin Nano.
Containerized a FastAPI inference service and load-tested it to 1,000 req/s at 110 ms P95 on AWS EC2. Built 40+ PyTest cases (85% coverage), introduced GitHub Actions CI, and optimized a 3M-row ETL pipeline from 25 min to 19 min with vectorized Pandas.
Publications
Spatio-temporal Representation Learning for Isolated Sign Language Recognition Using Transformer
Vol. 1333 · doi:10.1007/978-981-96-4536-7_24 · 500+ conference attendees
Bioacoustic Bird Monitoring: A Deep Learning Solution for Effective Biodiversity Conservation
doi:10.1109/ICDSNS62112.2024.10691115
Featured projects
A little about me.

I am a Machine Learning Engineer and applied researcher based in Los Angeles, CA. I specialize in multimodal deep learning, clinical AI, and production model deployment — from designing Transformers that surpass AI-assisted clinician accuracy on a Harvard Medical School EEG benchmark to shipping FastAPI services sustaining 1,000 req/s. I'm currently pursuing an M.S. in Computer Science at Cal State LA (4.0 GPA), with two peer-reviewed publications (Springer ADCIS 2024 and IEEE ICDSNS 2024).
When I’m not training models, I enjoy listening to music, swimming, and playing soccer. I’m drawn to problems where a well-designed model makes a measurable real-world difference — and that curiosity is what keeps me going.