We are a small, fully remote team of musicians, engineers, marketers, and creators who care deeply about building an outstanding product. We value ambition, ownership, and the ability to move fast while maintaining high quality.
Our mission is to make high-quality, unique vocals accessible to musicians everywhere. We work at the forefront of audio technology, exploring new trends in audio generation, vocal synthesis, and state-of-the-art machine learning solutions to empower musicians to create their best work.
As an Audio Machine Learning Engineer, you will focus on solving challenging problems related to AI-based vocals and audio engineering. You will collaborate closely with the AI team to design, develop, and improve large-scale machine learning models for audio applications.
You will play a key role in researching new approaches, building production-ready models, and continuously improving existing systems using modern machine learning techniques.
Researching, developing, and improving machine learning models for singing voice synthesis (SVS) and voice conversion
Experimenting with diffusion-based generative models for vocals and audio
Working with neural vocoders (e.g., HiFi-GAN–style architectures, large-scale GAN-based or diffusion-based vocoders)
Designing and improving audio feature extraction pipelines for vocal modeling
Working with large, high-quality vocal and music datasets
Improving model quality, robustness, and inference performance
Integrating new models and improvements into production systems
Writing clean, efficient, and scalable Python code using PyTorch
Master’s degree (or higher) in Machine Learning, AI, Computer Science, or a related field
Strong experience as a Machine Learning Developer, with Python as your primary language
Hands-on experience with PyTorch for training and deploying deep learning models
Solid understanding of singing voice synthesis, voice conversion, and modern audio modeling techniques
Familiarity with diffusion-based generative models and neural vocoders
Excellent English communication skills, both written and spoken
Comfortable taking ownership, collaborating with a remote team, and working effectively in a fast-paced startup environment
Bonus: Background or strong interest in music production, vocals, or audio engineering
We hire within the GMT-3 to GMT+4 time zones. Outside this range, collaboration becomes challenging.