What you will do
Design a clear, lean data model that defines data sources and their transformations, using DAGs and data orchestration tools such as Dagster or Airflow.
Conduct data validation and testing on each DAG step.
Insights Layer Ownership: Build data models and algorithms to generate first-party data using statistical and machine learning techniques, including LLMs and natural language processing.
Generate derived insights and determine accurate values from error-prone sources (e.g., headcount information).
Data Product Development: Develop and enhance data products to improve the discoverability of meaningful knowledge and information in our database.
Continuously improve similarity, relevance, normalization, and tagging algorithms that power our search engine.
Pipeline Maintenance: Oversee the maintenance and health of data pipelines to keep data transformations accurate and efficient, eliminating redundant tasks and operations.
Team Collaboration: Collaborate with the team to devise product goals, outline milestones, and execute plans with minimal guidance.
Data Warehouse Design: Contribute to the design of a robust data warehouse architecture by following best practices and industry standards.
Transfer data from S3, load data with different schedules, and manage various data pipelines on top of a unique warehouse architecture.
Collaborate with our platform team to make design decisions on the optimal middle-layer database flow, improving DAG execution times and costs.
Must haves
4+ years of experience as a Data Engineer.
Programming Languages: Python, SQL.
Orchestration Tools: Airflow, Dagster.
Data Warehouses: Snowflake, Databricks.
ETL Tools: dbt models.
Containerization: Docker.
DevOps: AWS.
Databases: ClickHouse, Postgres, DuckDB.
Upper-intermediate English level.