Data Scientist with experience in machine learning
Description
We are looking for a highly qualified Data Engineer to join our innovative team. The ideal candidate will be responsible for developing Python-based solutions to deploy and manage cloud data services (Azure Databricks and Azure Data Factory).
Key Responsibilities:
- Code Development and Adaptation
  - Refactor and adapt existing Python code to handle new data schemas and transformations.
  - Extend current code by incorporating new data fields and new data sources.
  - Develop new modules to process, transform, and load data using PySpark or Python-based frameworks.
  - Implement data validation and error-handling mechanisms.
- Databricks Platform Support (Nice to Have)
  - Support the initial configuration of Databricks, including library installation and resource optimization.
  - Integrate Databricks with cloud storage and data sources (Azure and/or Databricks).
  - Automate jobs in Databricks or Azure Data Factory.
- CI/CD, Testing, and Quality Assurance
  - Set up CI/CD pipelines for code and configuration deployment.
  - Write and execute unit, integration, and performance tests for data pipelines.
  - Debug and troubleshoot issues in distributed environments.
- Collaboration and Documentation
  - Work closely with data engineers, analysts, and cloud architects to ensure seamless integration and operation.
  - Document code, workflows, and platform configurations for transparency and future reference.
Competencies
- Azure
- CI/CD
- Python
- Databricks
- PySpark
- Azure Data Factory