Data Scientist experience in machine learning

We are looking for a highly qualified Data Engineer to join our innovative team. The ideal candidate will develop Python-based solutions to deploy and manage cloud data services, specifically Azure Databricks and Azure Data Factory.

Key Responsibilities:

Code Development and Adaptation
Refactor and adapt existing Python code to handle new data schemas and transformations.
Extend current code by incorporating new data fields and new data sources.
Develop new modules to process, transform, and load data using PySpark or Python-based frameworks.
Implement data validation and error-handling mechanisms.

Databricks Platform Support (Nice to Have)
Support the initial configuration of Databricks, including library installation and resource optimization.
Integrate Databricks with cloud storage and data sources (Azure and/or Databricks).
Automate jobs in Databricks or Azure Data Factory.

CI/CD, Testing, and Quality Assurance
Set up CI/CD pipelines for code and configuration deployment.
Write and execute unit, integration, and performance tests for data pipelines.
Debug and troubleshoot issues in distributed environments.

Collaboration and Documentation
Work closely with data engineers, analysts, and cloud architects to ensure seamless integration and operation.
Document code, workflows, and platform configurations for transparency and future reference.

Competencies:

Azure, CI/CD, Python, Databricks, PySpark, Azure Data Factory