Primary responsibilities include, but are not limited to:
Design, implement, and maintain scalable data processing platforms used for real-time
analytics and exploratory data analysis.
Manage data across its lifecycle, from ingestion through ETL to storage and batch
processing.
Automate, test, and harden all data workflows.
Architect logical and physical data models to ensure the needs of the business are met.
Collaborate with our analytics teams while applying best practices.
Architect and develop systems and algorithms for distributed real-time analytics and
data processing.
Implement strategies for acquiring and transforming our data to develop new insights.
Champion data engineering best practices and institutionalize efficient processes to
foster growth and innovation within the team.
What We’re Looking For
Minimum of 5 years of experience working with high-volume data infrastructure.
Deep experience with both Databricks and the AWS cloud computing platform.
Proficiency in Python programming.
Proficiency with SQL and the ability to optimize queries.
Experience with large-scale data processing using Spark and/or Presto/Trino.
Experience with a data transformation framework like dbt.
Experience with job orchestration tooling like Airflow.
Experience with data ingestion frameworks like Fivetran.
Experience in data modeling and database design.
Experience with a modern development lifecycle, including Agile methodology, CI/CD,
and automated deployments using Terraform, GitHub Actions, etc.
Knowledge of and proficiency with current open-source data frameworks, modern
data platform tech stacks, and tools.
Always learning and staying up to speed with the fast-moving data world.
Strong communication skills and the ability to work independently.
BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.