Part-time Data Engineer (Databricks and Azure)
Our client is unable to hire H-1B candidates at this time.
The client is a small, growing consulting company focused on AI and data solutions. They are seeking a part-time Data Engineer (20 hours/week) with extensive experience in Databricks and Azure.
As a Data Engineer:
• You will be a critical player in building the foundational data infrastructure for a leading firm's data and AI strategy.
• Working primarily with Databricks on the Azure platform, you will design, develop, and maintain robust data pipelines, ingesting diverse data sources and transforming them into actionable insights.
• You will collaborate closely with the product team and other stakeholders to construct a data lakehouse that will power integrations, advanced analytics, and future AI-driven workflows, all while handling sensitive client data with the utmost care and responsibility.
Responsibilities:
• Partner closely with the Product Manager, Product Designer, and client stakeholders to understand data requirements and translate them into effective data solutions within Azure and Databricks.
• Design, build, and maintain scalable and reliable data pipelines in Databricks to ingest data from a variety of source types (e.g., business workflow systems, accounting systems, CRM, databases, APIs, flat files).
• Implement and manage a medallion architecture (Bronze, Silver, Gold layers) within Databricks, transforming raw data into curated, business-ready datasets tailored for specific use cases defined by the product team (a brief illustrative sketch follows this list).
• Develop gold layer tables and views optimized for analytics, ensuring they meet the requirements for dashboards and reports, particularly for consumption via Power BI.
• Configure and optimize Databricks to connect seamlessly with BI tools like Tableau and Power BI, enabling self-service analytics for the client.
• Work with potentially sensitive client data, implementing and adhering to strict data security, privacy, and governance protocols.
• Leverage your skills in Databricks, including familiarity with or a strong willingness to quickly learn features like MLflow, Delta Lake, and Unity Catalog.
• Apply DevOps best practices to data pipeline development, including automation, monitoring, and CI/CD where applicable.
• Collaborate on the design and optimization of data models, ensuring they align with business needs, performance requirements, and future scalability.
• Implement robust automated testing procedures to validate data pipelines, ensure data quality, and maintain the accuracy of transformed data.
• Create and maintain comprehensive documentation for data pipelines, data models, architectural decisions, and operational procedures.
• Establish monitoring and alerting solutions to proactively identify and resolve issues in data pipelines, ensuring data availability and reliability.
• Communicate effectively with both technical and non-technical stakeholders, clearly explaining data engineering concepts, design choices, and progress.
• Contribute to a collaborative environment within a cross-functional consulting team.
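For context, the medallion pattern referenced above can be illustrated with a minimal PySpark/Delta Lake sketch of a Bronze → Silver → Gold flow on Databricks. Every table name, column, and storage path below is a hypothetical placeholder for illustration only, not a detail of the client's environment.

# Minimal Bronze -> Silver -> Gold sketch with PySpark and Delta Lake.
# All table names, columns, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # preconfigured on Databricks clusters

# Bronze: land raw source data as-is so it can always be replayed.
raw = spark.read.json("abfss://landing@examplestorage.dfs.core.windows.net/crm/")
raw.write.format("delta").mode("append").saveAsTable("bronze.crm_accounts")

# Silver: conform types and deduplicate on the business key.
silver = (
    spark.table("bronze.crm_accounts")
    .withColumn("created_at", F.to_timestamp("created_at"))
    .dropDuplicates(["account_id"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.crm_accounts")

# Gold: aggregate into a business-ready table for Power BI to query.
gold = silver.groupBy("region").agg(F.countDistinct("account_id").alias("account_count"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.accounts_by_region")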
Requirements:
• Proven experience as a Data Engineer, with a strong focus on designing and implementing solutions on the Databricks platform.
• Hands-on expertise in building and maintaining scalable Python data pipelines within Azure and Databricks.
• Demonstrable experience implementing a medallion data architecture (Bronze, Silver, Gold layers) to support analytics and AI use cases.
• Proficiency in ingesting data from diverse source types (e.g., APIs, relational databases, NoSQL databases, flat files, streaming sources).
• Experience connecting BI tools such as Power BI and Tableau, and optimizing datasets for maintainability and performance.
• Strong SQL skills and proficiency in data modeling techniques.
• Experience with Azure cloud services, particularly Azure Data Lake Storage (ADLS Gen2), Azure Key Vault, and Azure Data Factory or other Azure data services.
• Familiarity with MLflow for managing the machine learning lifecycle is a strong plus; curiosity and the ability to quickly learn new Databricks features are essential.
• Understanding of DevOps principles and experience with tools for CI/CD, version control (e.g., Git), and infrastructure automation are advantageous.
• Experience working with sensitive data and a strong understanding of data security, data governance, and privacy-preserving techniques.
• Excellent problem-solving skills and the ability to troubleshoot complex data issues.
• Strong communication skills, with the ability to articulate technical details and decisions to product managers, client stakeholders, and other engineers.
• Ability to work effectively within a cross-functional consulting team in a dynamic, client-facing environment.
• A mindset that leans into data science concepts (e.g., understanding the data needs of ML workloads) or advanced DevOps practices is a plus.
Apply to this job