- Implement new data ingestion jobs, using Terraform or other infrastructure-as-code tools to automate workflows and improve data processing efficiency.
- Optimize existing data pipelines for performance, scalability, and reliability using best practices in distributed computing.
- Contribute to the continuous integration and delivery of high-quality software, collaborating with team members in an Agile environment.
- Create and maintain technical documentation for data engineering processes, tools, and infrastructure.
- Collaborate with DevOps engineers to ensure that CI/CD pipelines function effectively and that deployment processes are well-defined and efficient.
- Provide support for critical production systems as needed.
Qualifications:
- Bachelor’s degree in a technology field and 4+ years of experience in a Data Engineer role.
- 4+ years of experience working with PySpark or other distributed computing frameworks (Python preferred).
- Proficiency in Python and Scala for data engineering tasks.