Junior Data Scientist
CA
Permanent

Data Engineer
Role Overview
We are looking for an experienced Data Engineer to join the Data & AI team, which delivers core data platform capabilities. This is a hands-on engineering role focused on designing and building robust Python-based data pipelines, ETL processes, and data models across a modern data platform. The ideal candidate is a strong Python developer with solid SQL skills and a good understanding of data lake architecture. Exposure to data quality tooling and techniques, whether Informatica, Microsoft Purview, or custom-built frameworks, is a plus.
Key Responsibilities
1. Design, build, and maintain end-to-end data pipelines using Python, ensuring reliable data ingestion, transformation, and delivery across the platform.
2. Develop and optimize ETL/ELT processes to move and transform data across bronze, silver, and gold layers following the Medallion architecture pattern.
3. Write clean, modular, production-grade Python code for data processing, orchestration, and automation tasks.
4. Support the design and implementation of data models, schemas, and storage strategies that serve downstream analytics and reporting requirements.
5. Build and maintain SQL-based transformations, stored procedures, and views for data validation and reconciliation.
6. Develop and manage data ingestion frameworks, handling a variety of source formats (flat files, APIs, databases, streaming).
7. Implement data quality checks and validations within pipelines, applying rules across completeness, validity, consistency, uniqueness, accuracy, and timeliness dimensions.
8. Monitor pipeline health, build alerting mechanisms, and troubleshoot data issues in production environments.
9. Contribute to CI/CD pipelines for data workloads, including automated testing, deployment, and version control practices.
10. Produce clear technical documentation for pipelines, data models, and operational runbooks.
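For illustration only, the kind of in-pipeline data quality check described in responsibility 7 might look like the minimal Python sketch below. It covers four of the listed dimensions (completeness, validity, uniqueness, timeliness). The field names (order_id, customer_id, amount, loaded_at), the one-day freshness threshold, and the sample rows are hypothetical and are not taken from the actual platform.

```python
from datetime import datetime, timedelta, timezone


def run_quality_checks(rows, now, max_age=timedelta(days=1)):
    """Return a dict mapping each check name to the number of failing rows.

    All field names and the max_age threshold are illustrative assumptions.
    """
    failures = {"completeness": 0, "validity": 0, "uniqueness": 0, "timeliness": 0}
    seen_order_ids = set()

    for row in rows:
        amount = row.get("amount")

        # Completeness: required fields must be present and non-empty.
        if row.get("customer_id") in (None, "") or amount is None:
            failures["completeness"] += 1

        # Validity: amount, when present, must be a non-negative number.
        if amount is not None and (
            not isinstance(amount, (int, float)) or amount < 0
        ):
            failures["validity"] += 1

        # Uniqueness: order_id must not repeat within the batch.
        order_id = row.get("order_id")
        if order_id in seen_order_ids:
            failures["uniqueness"] += 1
        seen_order_ids.add(order_id)

        # Timeliness: the row must have been loaded within max_age of now.
        loaded_at = row.get("loaded_at")
        if loaded_at is None or now - loaded_at > max_age:
            failures["timeliness"] += 1

    return failures


# Hypothetical batch: one clean row, one incomplete row, and one row that is
# a duplicate, has a negative amount, and is stale.
NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
SAMPLE_ROWS = [
    {"order_id": 1, "customer_id": "C1", "amount": 10.0,
     "loaded_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "customer_id": "", "amount": 5.0,
     "loaded_at": NOW - timedelta(hours=2)},
    {"order_id": 2, "customer_id": "C3", "amount": -4.0,
     "loaded_at": NOW - timedelta(days=3)},
]

report = run_quality_checks(SAMPLE_ROWS, now=NOW)
print(report)
```

In a production pipeline, checks like these would more likely be expressed in a dedicated framework or as SQL reconciliation queries, with failure counts feeding the alerting described in responsibility 8.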
Required Skills & Experience
• Strong Python development skills with hands-on experience building production data pipelines (pandas, PySpark, or equivalent).
• Solid SQL skills for complex queries, data transformations, and performance tuning.
• Experience designing and implementing ETL/ELT processes at scale.
• Good understanding of the Medallion architecture (Bronze / Silver / Gold) and modern data lake/Lakehouse design patterns.
• Experience with data orchestration tools.
• Working knowledge of cloud data platforms (Azure, AWS, or GCP).
• Familiarity with relational and non-relational databases.
• Strong problem-solving skills with the ability to debug complex data pipeline issues.
Desirable Skills
• Exposure to data quality tooling such as Informatica IDQ, Microsoft Purview, Great Expectations, or custom-built DQ frameworks.
• Experience with Informatica EDC or similar tools for data cataloguing and lineage.
• Familiarity with data governance platforms such as Informatica Axon or Microsoft Purview for policy management and stewardship workflows.
• Knowledge of Power BI or other BI tools for data visualization and reporting.