What you’ll do
- Implement, manage, and monitor databases in our Apache Iceberg-powered data lake environment to ensure high levels of data availability and performance.
- Work with data engineering teams to design and implement scalable database schemas and optimize data storage and retrieval processes.
- Perform regular database maintenance tasks such as backups, indexing, and performance tuning to ensure data integrity and efficiency.
- Develop and implement data security measures, including access controls and encryption, to protect sensitive information.
- Collaborate with data analysts and business teams to understand data requirements and ensure the database meets business needs.
- Troubleshoot and resolve database-related issues in a timely manner.
- Stay current with emerging technologies and advancements in lakehouse architectures, specifically Apache Iceberg, and recommend and implement improvements to our data infrastructure.
- Document database architectures, procedures, and processes for internal use and compliance purposes.
What we look for
- Knowledge of SQL:
  - Proven experience as a Database Administrator, with a strong preference for experience in managing data lakes and using Apache Iceberg.
  - Deep understanding of database principles, architecture, and data modeling techniques.
- Python: 1-3 years of general Python experience
- Spark:
  - Extensive experience with writing Spark SQL and working with DataFrames
  - Experience with debugging Spark applications via metrics, the history server, etc.
  - Understanding of shuffling and re-partitioning concepts
  - Understanding of off-heap and on-heap memory usage in Spark
  - Understanding of joins in a distributed context, e.g. sort-merge vs. broadcast joins (nice to have)