[Remote] Research Scientist Intern (AI Infrastructure) - 2026 Start (PhD)
Note: This is a remote position open to candidates in the USA. ByteDance is a leading technology company focused on inspiring creativity and enriching life through innovative products. The company is seeking a Research Scientist Intern to work on AI infrastructure, collaborating with researchers and engineers to enhance AI-driven products and services.
Responsibilities
- Lead end-to-end design of scalable, reliable AI infrastructure (AI accelerators, compute clusters, storage, networking) for training and serving large ML workloads
- Define and implement service-oriented, containerized architectures (Kubernetes, VM frameworks, unikernels) optimized for ML performance and security
- Profile and optimize every layer of the ML stack—ML Compiler, GPU/TPU scheduling, NCCL/RDMA networking, data preprocessing, and training/inference frameworks
- Develop low-overhead telemetry and benchmarking frameworks to identify and eliminate bottlenecks in distributed training and serving
- Build and operate large-scale deployment and orchestration systems that auto-scale across multiple data centers (on-premises and cloud)
- Champion fault-tolerance, high availability, and cost-efficiency through smart resource management and workload placement
- Architect and implement robust ETL and data ingestion pipelines (Spark/Beam/Dask/Flume) tailored for petabyte-scale ML datasets
- Integrate experiment management and workflow orchestration tools (Airflow, Kubeflow, Metaflow) to streamline research-to-production
- Partner with ML researchers to translate prototype requirements into production-grade systems
- Mentor and coach engineers on best practices in performance tuning, systems design, and reliability engineering
Skills
- Expected graduation in 2026 with a PhD in Computer Science, Engineering, or a related technical field
- Understanding of infrastructure or systems engineering, with a focus on ML/AI infrastructure
- Strong programming skills in Python, C++, Go, or Rust for systems development and automation
- Excellent communicator able to bridge research and production teams
- Strong problem-solving aptitude and a drive to push the state of the art in ML infrastructure
Benefits
- Interns have day-one access to health insurance
- Life insurance
- Wellbeing benefits and more
- Interns also receive 10 paid holidays per year
- Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
- Interns who are not working fully remotely may also be eligible for a housing allowance