[Remote] Research Engineer (Machine Learning)
Note: This is a remote position open to candidates in the USA. Aldea is a multi-modal foundational AI company focused on advancing the scaling laws of intelligence. The Research Engineer (Machine Learning) will build and optimize the infrastructure for multi-modal AI research, enabling the team to experiment with next-generation architectures in language and speech domains.
Responsibilities
• Build and maintain distributed training infrastructure supporting researchers across language and speech domains at billion-plus-parameter scale.
• Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements.
• Design experiment infrastructure including automated evaluation pipelines, experiment tracking, and monitoring systems that enable rapid iteration.
• Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications.
• Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems.
• Collaborate with research scientists, data engineers, and leadership to define technical priorities and the infrastructure roadmap.
Skills
• Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
• 3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar).
• Experience training large-scale deep learning models at 1B+ parameters.
• Deep understanding of training optimization techniques including mixed precision, gradient checkpointing, and memory management.
• Proven ability to build production-grade ML infrastructure with high reliability.
• Track record of delivering significant performance optimizations in ML training or inference systems.
• Experience with custom kernel development (CUDA, Triton) or GPU optimization.
• Hands-on experience with large-scale pretraining (100B+ tokens, ideally trillion+ scale).
• Experience optimizing inference for production: quantization, vLLM, TensorRT, or custom serving engines.
• Familiarity with speech/audio ML systems and real-time inference constraints.
• Experience building automated evaluation frameworks and experiment tracking systems.
• Knowledge of profiling tools and multi-node training across 8-32+ GPUs.
• Exposure to job orchestration systems (SLURM, Kubernetes, Ray).
• Master's or PhD in Computer Science, Machine Learning, or related field.
Benefits
• Competitive base salary
• Performance-based bonus aligned with research and model milestones
• Equity participation
• Comprehensive health, dental, and vision coverage
• Flexible paid time off
Company Overview
• Aldea is a foundational AI company building next-generation voice and language models that power how people and software communicate. It is headquartered in Miami, FL, US, with a workforce of 11-50 employees. Its website is https://aldea.ai.