Responsibilities:
End-to-end development of the company's data infrastructure
Build high-performance, near-real-time ETL processes using Airflow, AWS, Kubernetes, Databricks, Spark, and Kafka
Drive the collection of new data and refine existing data sources
Develop, implement, and maintain change control and testing processes
Design and build monitoring tools for the data platform
Implement scalable, fault-tolerant data pipelines and data architecture that support high throughput and low latency, with security considered at all times
Requirements:
Relevant big data architecture certification on one or more public clouds (AWS Solutions Architect / Big Data, GCP Data Engineer) – an advantage
3+ years of industry experience in a similar role – a must
2+ years of Python experience, with the ability to read and understand other languages – a must
Experience working with and deploying distributed data technologies (Kafka, Airflow, Spark, Presto, AWS Glue, dbt, etc.) – a must
Advanced-level Spark and Kafka skills – an advantage
Deep understanding of, and experience with, data lakes and data warehouses in the cloud (S3, Redshift, Cloud Storage, Athena)
Proven experience working with SQL and NoSQL databases
Ability to build and deliver working software through iterative, agile processes
Experience working in a collaborative CI/CD software development environment, including Git, peer code review, and writing maintainable, scalable, well-documented code