I'm an AI Engineer with 6+ years of experience across the full ML stack — from NLP pipelines and distributed inference systems to multi-agent LLM architectures and Vision-Language-Action models for robot learning.
My work sits at the intersection of research and deployment: I've built cooperative agent systems that handle live healthcare operations at 92% routing accuracy and <2s latency, designed continuous evaluation frameworks with human feedback on Vertex AI, and trained a 77M parameter VLA model for multi-task robot manipulation using behavior cloning and cross-attention fusion. Earlier in my career, I built production NLP pipelines (OCR + CNN) that cut document processing time by 82% and designed asynchronous distributed inference systems with RabbitMQ and Celery to manage model concurrency at scale.
Across these roles, the through-line has been the same: take a well-defined problem, pick the right architecture, instrument it properly, and iterate until the system is reliable enough to depend on.
I recently built a 77M parameter VLA model that combines EfficientNet-B0 (vision), DistilBERT (language), and cross-attention fusion to predict robot manipulation actions from camera images and natural language instructions. Training used behavior cloning on scripted expert policies for pick/push/place tasks; multi-task learning cut the loss by 22%, with training run on Apple Silicon GPU via MPS acceleration. The full implementation is on GitHub.
I'm actively looking for teams working on real-world policy learning, sim-to-real transfer, or deploying RL agents in production environments — particularly in robotics, autonomous systems, or LLM-driven decision-making.
I'm most excited by roles where the research-to-deployment gap is the core problem: building reliable systems around learned behavior, designing evaluation frameworks for policies, and making models that generalize beyond their training distribution.
If your team is working on embodied AI, multi-agent coordination, or the infrastructure side of RL at scale — I'd love to connect.
Built a 77M parameter Vision-Language-Action model combining EfficientNet-B0 vision encoder, DistilBERT language encoder, and cross-attention fusion to predict robot manipulation actions from camera images and natural language instructions. Achieved 22% loss reduction via multi-task behavior cloning across pick/push/place tasks.
Read More
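The core of the fusion step above is cross-attention between the two modalities. A minimal NumPy sketch (illustrative token counts and dimensions, not the model's actual shapes), with vision features querying language features:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality attends over another."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # (n_q, n_k) similarity
    weights = softmax(scores, axis=-1)        # attention over key tokens
    return weights @ values                   # (n_q, d_v) fused features

rng = np.random.default_rng(0)
vision_tokens = rng.normal(size=(49, 64))  # e.g. a 7x7 CNN feature map, projected
text_tokens = rng.normal(size=(12, 64))    # e.g. projected DistilBERT token embeddings
fused = cross_attention(vision_tokens, text_tokens, text_tokens)
print(fused.shape)  # (49, 64)
```

Each vision token ends up as a language-conditioned mixture of text features, which the action head can then decode.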
Implemented a Deep Q-Learning agent on OpenAI Gym's continuous mountain car environment, converging in 1643 episodes with a 12-dimensional hidden layer.
Read More
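The update at the heart of Deep Q-Learning is the one-step TD target. A minimal sketch (NumPy, toy numbers; the discount factor is illustrative, not the project's actual setting):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD targets: r + gamma * max_a' Q(s', a'), zeroed at terminal states."""
    max_next_q = next_q_values.max(axis=1)          # greedy value of next state
    return rewards + gamma * max_next_q * (1.0 - dones)

rewards = np.array([0.0, -1.0, 10.0])
next_q = np.array([[1.0, 2.0],   # Q(s', a') for two actions
                   [0.5, 0.2],
                   [0.0, 0.0]])
dones = np.array([0.0, 0.0, 1.0])  # third transition ends the episode
print(dqn_targets(rewards, next_q, dones))  # [ 1.98  -0.505 10.  ]
```

The Q-network is then regressed toward these targets; for a continuous action space like mountain car's, the action range is typically discretized first so the max over actions is tractable.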
GAN for image-to-image translation from line art to colored butterfly images using the Pix2Pix architecture (U-Net generator + PatchGAN discriminator). Achieved discriminator and generator losses of 0.91 and 1.23, respectively.
Read More
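Pix2Pix trains the generator on an adversarial term plus an L1 reconstruction term. A minimal NumPy sketch of that combined objective (shapes and the lambda weight are illustrative; real training would backpropagate through a framework):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy over probability maps."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def pix2pix_generator_loss(disc_on_fake, fake_img, real_img, lam=100.0):
    """Adversarial term (fool the PatchGAN: every patch labeled real) + lambda * L1."""
    adv = bce(disc_on_fake, np.ones_like(disc_on_fake))
    l1 = np.abs(fake_img - real_img).mean()
    return adv + lam * l1

rng = np.random.default_rng(1)
disc_patches = rng.uniform(0.1, 0.9, size=(30, 30))  # PatchGAN per-patch "real" probs
fake = rng.uniform(size=(64, 64, 3))
real = rng.uniform(size=(64, 64, 3))
loss = pix2pix_generator_loss(disc_patches, fake, real)
print(round(float(loss), 3))
```

The PatchGAN scores local patches rather than the whole image, which pushes the generator toward sharp local texture while the L1 term keeps it faithful to the target line art.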
Image segmentation model using Mask R-CNN for object detection and instance segmentation masks on nuclei microscopy images. Achieved a mean average precision (mAP) of 0.73.
Read More
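The mAP metric above is built on mask-level intersection-over-union: a predicted nucleus counts as a true positive only if its mask overlaps a ground-truth mask above an IoU threshold. A minimal sketch of that building block (toy 8x8 masks):

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection-over-union between two boolean segmentation masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union else 0.0

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True  # 4x4 predicted nucleus
gt = np.zeros((8, 8), bool);   gt[3:7, 3:7] = True    # ground truth, offset by 1 px
print(mask_iou(pred, gt))  # 9 / 23 ≈ 0.391
```

Sweeping a detection-confidence threshold over matches like this yields precision-recall curves, whose averaged area gives the reported mAP.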
Implemented a modified SAM-SLR framework for real-time continuous sign language recognition, integrating multi-cue visual streams for improved temporal modeling. Master's thesis project at SJSU.
Scraped IMDb movie plot data, applied preprocessing (stop-word removal, Porter stemming), and fine-tuned a PEGASUS model with its self-supervised gap-sentence objective for abstractive summarization.
Read More
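PEGASUS's self-supervised objective masks whole "important" sentences and trains the model to regenerate them. A heavily simplified stdlib sketch of that idea, using raw word overlap with the rest of the document as a rough stand-in for PEGASUS's ROUGE-based importance scoring (the regex, ratio, and mask token are illustrative):

```python
import re

def gap_sentence_mask(text, mask_ratio=0.3, mask_token="<mask_1>"):
    """Mask the sentences that share the most vocabulary with the rest of the
    document; return (masked input, target) for a seq2seq objective."""
    sents = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]

    def score(i):  # fraction of this sentence's words found elsewhere in the doc
        words = set(sents[i].lower().split())
        rest = {w for j, s in enumerate(sents) if j != i for w in s.lower().split()}
        return len(words & rest) / max(len(words), 1)

    n_mask = max(1, int(len(sents) * mask_ratio))
    masked = set(sorted(range(len(sents)), key=score, reverse=True)[:n_mask])
    inputs = " ".join(mask_token if i in masked else s for i, s in enumerate(sents))
    targets = " ".join(sents[i] for i in sorted(masked))
    return inputs, targets

doc = ("Robots learn skills from demonstrations. "
       "Demonstrations show robots how to act. "
       "The weather was nice.")
inputs, targets = gap_sentence_mask(doc)
print(inputs)
print(targets)
```

Because the model must reconstruct whole informative sentences from context, the pretraining task closely mirrors abstractive summarization, which is why fine-tuning on plot summaries works well.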
Comparative analysis of sentiment classification on Amazon reviews, implementing and benchmarking FNN, CNN, LSTM, and DistilBERT models with word embeddings and data preprocessing.
Read More
Trained an LSTM on Lo-Fi-style music data to generate stylistically similar music sequences from learned temporal patterns.
Read More
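Generation from a model like this is typically autoregressive: at each step, sample the next note from the network's output distribution. A minimal NumPy sketch of temperature-controlled sampling (the 4-note vocabulary and logits are illustrative):

```python
import numpy as np

def sample_next(logits, temperature=1.0, rng=None):
    """Sample the next token from model logits; lower temperature makes output
    more conservative, higher makes it more varied."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, float) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(42)
logits = [2.0, 0.5, 0.1, -1.0]  # hypothetical LSTM output over a 4-note vocabulary
notes = [sample_next(logits, temperature=0.8, rng=rng) for _ in range(8)]
print(notes)
```

Feeding each sampled note back in as the next input yields a full sequence; the temperature knob trades off repetition against dissonance.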