Data Scientist with experience in machine learning

Description

We are looking for a highly qualified Data Engineer to join our innovative team. The ideal candidate will be responsible for developing Python-based solutions to deploy and manage cloud data services (Azure Databricks and Azure Data Factory).

Key Responsibilities:

  • Code Development and Adaptation
      • Refactor and adapt existing Python code to handle new data schemas and transformations.
      • Extend current code by incorporating new data fields and new data sources.
      • Develop new modules to process, transform, and load data using PySpark or Python-based frameworks.
      • Implement data validation and error-handling mechanisms.
  • Databricks Platform Support (Nice to Have)
      • Support the initial configuration of Databricks, including library installation and resource optimization.
      • Integrate Databricks with cloud storage and data sources (Azure and/or Databricks).
      • Automate jobs in Databricks or Azure Data Factory.
  • CI/CD, Testing, and Quality Assurance
      • Set up CI/CD pipelines for code and configuration deployment.
      • Write and execute unit, integration, and performance tests for data pipelines.
      • Debug and troubleshoot issues in distributed environments.
  • Collaboration and Documentation
      • Work closely with data engineers, analysts, and cloud architects to ensure seamless integration and operation.
      • Document code, workflows, and platform configurations for transparency and future reference.

Competencies

  • Azure
  • CI/CD
  • Python
  • Databricks
  • PySpark
  • Azure Data Factory