Implement new data ingestion jobs, using Terraform or other infrastructure-as-code tools to automate workflows and improve overall data processing efficiency.
Optimize existing data pipelines for performance, scalability, and reliability using best practices in distributed computing.
Contribute to the continuous integration and delivery of high-quality software by collaborating with team members in an Agile environment.
Create and maintain technical documentation related to data engineering processes, tools, and infrastructure.
Collaborate with DevOps engineers to ensure that CI/CD pipelines run effectively and that deployment processes are well defined.
Provide support for critical production systems as needed.
Qualifications:
We are looking for a candidate with a bachelor’s degree in a technology field and 4+ years of experience in a Data Engineer role.
4+ years of experience working with Apache Spark (PySpark) or other distributed computing frameworks; Python preferred.
Proficiency in Python and Scala for data engineering tasks.