Senior MLOps Platform Architect

Remote, USA Full-time Posted 2025-11-24

This is a fully remote job. The offer is available from: EMEA

Remote in EU | B2B Contract

Role Overview

We are hiring a senior MLOps engineer who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven’t designed production-grade MLOps infrastructure, haven’t built CI/CD for ML, or haven’t deployed ML workloads on Kubernetes at scale, this role is not a fit.

You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that support our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.

Key Responsibilities:

MLOps Platform Architecture (from scratch)
• Design and build AWS-based AI/ML infrastructure using Terraform (required).
• Define standards for security, automation, cost efficiency, and governance.
• Architect infrastructure for ML workloads, GPU/accelerators, scaling, and high availability.

Kubernetes & Model Deployment
• Architect, build, and operate production Kubernetes clusters.
• Containerize and productize ML models (Docker, Helm).
• Deploy latency-sensitive and high-throughput models (ASR/TTS/NLU/Agentic AI).
• Ensure GPU and accelerator nodes are properly integrated and optimized.

CI/CD for Machine Learning
• Build automated training, validation, and deployment pipelines (GitLab/Jenkins).
• Implement canary, blue-green, and automated rollback strategies.
• Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability
• Implement full observability (Prometheus + Grafana).
• Own uptime, performance, and reliability for ML production services.
• Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership
• Work closely with ML engineers, researchers, and data scientists.
• Translate experimental models into production-ready deployments.
• Define best practices for MLOps across the company.

Requirements:

We’re looking for a senior engineer with a strong DevOps/SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure, automation, and hands-on MLOps experience.
• 5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
• Strong experience designing, building, and maintaining Kubernetes clusters in production.
• Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
• Solid programming skills in Python or Go for building automation, tooling, and ML workflows.
• Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
• Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM/Agentic AI).
• Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
• Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
• Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
• Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
• Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

Ways to Know You’ll Succeed
• You enjoy building platforms from the ground up and owning technical decisions.
• You’re comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
• You like solving performance, automation, and reliability challenges in distributed systems.
• You bring a structured, pragmatic, and scalable approach to infrastructure design.
• You are energetic and proactive, with a natural drive to take initiative and move things forward.
• You enjoy working closely with people - researchers, ML engineers, cloud architects, product teams.
• You are comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
• You have a collaborative mindset: you like to build together, not work in isolation.
• You have a strong ownership mentality - you enjoy taking responsibility for systems end-to-end.
• You are curious, hands-on, and motivated by solving complex technical challenges.
• You are a clear communicator who can translate technical work into practical recommendations.
• You thrive in a fast-paced environment where you can experiment, improve, and shape how things are done.

What's on Offer:
• Competitive fixed compensation based on experience and expertise.
• Work on cutting-edge AI systems used globally.
• Dynamic, multi-disciplinary teams engaged in digital transformation.
• Remote-first work model
• Long-term B2B contract
• 20+ days paid time off
• Apple gear
• Training & development budget

Diversity and Inclusion Commitment

We are dedicated to creating and sustaining an inclusive, respectful workplace for all, regardless of gender, ethnicity, or background. We actively encourage applicants from all identities and experience levels to apply and bring their authentic selves to our fast-paced, supportive team.

This offer from "Salve.Inno Consulting" has been enriched by Jobgether.com and got an 80% flex score.

