Sumir Seth
Identifying and solving problems
Work
Previously, I worked with data at Pixa AI. I spent most of my time building pipelines to handle and preprocess datasets for machine learning. It involved a lot of scraping, ingestion, and optimizing encoding for multi-GPU environments. I also dove into Retrieval-based Voice Conversion (RVC), training custom models and writing scalable pipelines to convert voice data between different speakers. To keep track of it all, I built a monitoring dashboard. It was hands-on work that taught me a lot about the infrastructure behind AI training.
Me
I've been tinkering with computers and electronics since childhood. Growing up, I built everything from Discord bots and websites to npm packages and AI application layers. More recently, I've focused on engineering high-throughput scraping and processing pipelines. Beyond engineering, I have a strong grasp of model architectures and training dynamics, with practical experience fine-tuning LLMs and orchestrating them for diverse applications. Now, I'm on the path to building something from scratch. I learn fast, adapt quickly, and pride myself on strong communication and emotional intelligence.
Further
Exposure to large-scale models shifted my focus toward efficiency. I'm now exploring Small Language Models (SLMs) and model merging/breeding techniques to solve niche problems with limited compute. This research direction is what led me to start Marin.