
Site Reliability Engineer II - Data Platforms (Remote)

Remote, USA | Full-time | Posted 2025-11-24
Note: This is a remote position open to candidates in the USA.

UNFI is a company focused on ensuring the stability and performance of its data platforms. The Data Platform Reliability Engineer is responsible for monitoring, troubleshooting, and automating workflows for various data services, while collaborating with internal teams and external partners.

Responsibilities
• Monitor health and performance of Databricks clusters, jobs, and workflows
• Maintain observability dashboards, alerts, and logs for AWS services and ingestion pipelines
• Respond to incidents, perform root cause analysis, and implement corrective actions
• Monitor and optimize platform costs across cloud and data services
• Implement cost-control measures and provide regular reporting
• Implement and maintain cost controls: cluster policies, auto-termination, right-sizing, job scheduling, storage lifecycle policies
• Monitor spend and utilization for Databricks, AWS, ingestion, and BI services
• Promote performance best practices
• Build and maintain dashboards, alerts, and logs for Databricks, AWS services, ingestion pipelines, and BI refreshes
• Continuously tune alert thresholds to reduce noise and improve signal-to-action ratio
• Ensure end-to-end lineage/traceability for faster fault isolation across stages
• Coordinate with external support teams for day-to-day operations and issue resolution
• Coordinate with vendors for troubleshooting, service improvements, and escalations
• Track and report on SLA adherence and vendor performance
• Maintain operational runbooks, knowledge base, and handoff procedures between internal teams and external partners
• Drive automation and efficiency in operational workflows
• Optimize resource utilization and reduce manual intervention
• Support Power BI, Tableau, and Alteryx operations (gateway health, dataset refresh schedules, workspace/app permissions, data-source connectivity)
• Monitor and improve dataset refresh reliability, query performance, and user access hygiene
• Perform other duties as assigned

Skills
• Bachelor's degree in computer science, data analytics, systems analysis, or a related field
• 3+ years in data platform operations or reliability engineering
• Hands-on experience with Databricks and AWS services in production environments
• Demonstrated success in maintaining high-impact data platforms, with a strong track record of managing complex environments
• Familiarity with ingestion tools (Fivetran, AWS DMS, DataStage, Informatica) and BI platforms (Power BI, Tableau, Alteryx)
• Experience with SAP, master data management, and cross-functional processes across supply chain, finance, and operations
• Strong troubleshooting and incident management skills
• Knowledge of governance, security, and RBAC principles
• Ability to work independently and collaborate with external partners
• Familiarity with Agile practices and DevOps principles
• Good judgment, as direct supervision may not always be immediately available

Benefits
• Paid Time Off
• Sick Time
• Paid holidays and parental leave
• 401K Program (or retirement savings plan if in Canada)
• Medical, dental, vision, life, and accidental death/dismemberment insurance
• Short-term and long-term disability insurance program
• Flexible Spending Account and/or Health Savings Account (U.S. only)

Company Overview
• UNFI is North America’s Premier Food Wholesaler. It was founded in 1978 and is headquartered in Providence, Rhode Island, USA, with a workforce of 10,001+ employees. Its website is http://unfi.com.

Company H1B Sponsorship
• UNFI has a track record of offering H1B sponsorships, with 6 in 2025, 2 in 2024, 4 in 2023, and 4 in 2022. Please note that this does not guarantee sponsorship for this specific role.
