Sumir Seth

Identifying and solving problems

Work

Previously, I worked with data at Pixa AI, where I spent most of my time building pipelines to handle and preprocess datasets for machine learning. The work involved a lot of scraping, ingestion, and encoding optimization for multi-GPU environments. I also dove into Retrieval-based Voice Conversion (RVC), training custom models and writing scalable pipelines to convert voice data between speakers. To keep track of it all, I built a monitoring dashboard. It was hands-on work that taught me a lot about the infrastructure behind AI training.

Me

I've been tinkering with computers and electronics since childhood. Growing up, I built everything from Discord bots and websites to npm packages and AI application layers. More recently, I've focused on engineering high-throughput scraping and processing pipelines. Beyond engineering, I have a strong grasp of model architectures and training dynamics, with practical experience fine-tuning LLMs and orchestrating them for diverse applications. Now, I'm on the path to building something from scratch. I learn fast, adapt quickly, and pride myself on strong communication and emotional intelligence.

Further

Exposure to large-scale models shifted my focus toward efficiency. I'm now exploring Small Language Models (SLMs) and model merging/breeding techniques to solve niche problems with limited compute. This research direction is what led me to start Marin Labs.

Projects

Marin Labs

Open collective dedicated to accessible AI research.
