[Remote] AI Infrastructure Site Reliability Engineer (remote USA)

Remote, USA Full-time Posted 2025-11-24
Note: This is a remote position open to candidates in the USA.

Cisco is at the forefront of integrating artificial intelligence into its platforms, transforming collaboration and security. As an AI Infrastructure Site Reliability Engineer, you will leverage SRE practices to maintain service level objectives for AI platforms, automate operational capabilities, and ensure the efficiency of high-performance compute infrastructure.

Responsibilities
• Leverage SRE practices to reduce toil and maintain Service Level Objectives (SLOs) for internal AI platforms
• Lead, build, and run fully automated pipelines through CI/CD systems for operational excellence and continuous improvement
• Ensure the availability, scalability, latency, and efficiency of NVIDIA DGX and Cisco UCS infrastructure using fault-tolerant engineering approaches
• Drive capacity planning, performance analysis, instrumentation, and other non-functional requirements
• Automate operational capabilities using Python, Ansible, Terraform, Go, and related technologies
• Deliver automation through CI/CD pipelines and chatbot integrations
• Implement metrics-driven processes to maintain high service quality

Skills
• Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent years of IT experience
• 5+ years of experience deploying and administering NVIDIA DGX or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM)
• 5+ years of experience coordinating and supporting Linux-based operating systems
• 5+ years of proficiency in programming languages such as Python, Go, and C/C++; experience with Git and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
• 5+ years of experience deploying enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos
• Advanced knowledge of Kubernetes, Docker, Terraform, Ansible, Jenkins, GitOps, Git, and Linux
• 5+ years of experience with the software development lifecycle: design, development, testing, packaging, and deployment (preferably using Python or Go)
• Master's degree or equivalent experience in a relevant field
• Certifications in Linux, networking, cloud, or related technologies
• Previous experience as a compute or site/systems reliability engineer
• Experience with hybrid cloud, virtualization, and container technologies
• Familiarity with Agile and DevOps operating models, including project tracking tools (e.g., Jira, Rally)
• Excellent collaboration, leadership, and communication skills

Benefits
• Medical, dental, and vision insurance
• 401(k) plan with a Cisco matching contribution
• Paid parental leave
• Short- and long-term disability coverage
• Basic life insurance
• 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
• 1 paid day off for the employee's birthday
• Paid year-end holiday shutdown
• 4 paid days off for personal wellness, as determined by Cisco
• 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
• Flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
• 80 hours of sick time off provided on the hire date and each January 1st thereafter, with up to 80 hours of unused sick time carried forward from one calendar year to the next
• Additional paid time away may be requested to deal with critical or emergency issues for family members
• Optional 10 paid days per full calendar year to volunteer
• Annual bonuses subject to Cisco's policies
• Performance-based incentive pay on top of base salary

Company Overview
• Cisco develops, manufactures, and sells networking hardware, telecommunications equipment, and other technology services and products. It was founded in 1984 and is headquartered in San Jose, California, USA, with a workforce of more than 10,000 employees. Its website is http://www.cisco.com.

Company H1B Sponsorship
• Cisco has a track record of offering H1B sponsorships: 1045 in 2025, 1231 in 2024, 1273 in 2023, 2127 in 2022, 1991 in 2021, and 1173 in 2020. Please note that this does not guarantee sponsorship for this specific role.
