What you will do

- Design, develop, and maintain ETL pipelines to extract, transform, and load data across various data sources (cloud storage, databases, APIs)
- Use Apache Airflow to orchestrate workflows, schedule tasks, and manage pipeline dependencies (a minimal sketch follows this list)
- Build and manage data pipelines on Azure and GCP
- Design and support the Data Lake
- Write Python scripts for data cleansing, transformation, and enrichment using libraries such as Pandas and PySpark
- Analyze logs and metrics from Airflow and cloud services to resolve pipeline failures and inefficiencies
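For illustration only, here is a minimal sketch of the kind of Airflow orchestration and Pandas transformation described above. It assumes Airflow 2.x; the DAG id, file paths, column names, and transform logic are hypothetical placeholders, not part of the role's actual codebase.

```python
# Minimal illustrative DAG: one daily task that cleanses and reformats a file with Pandas.
# All names and paths are placeholders.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def transform_orders():
    df = pd.read_csv("/tmp/raw_orders.csv")           # extract (placeholder source)
    df = df.dropna(subset=["order_id"])               # cleanse: drop rows missing a key
    df["amount"] = df["amount"].astype(float)         # transform: normalize a column
    df.to_parquet("/tmp/clean_orders.parquet")        # load to a columnar format


with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",    # Airflow owns the scheduling; "schedule" is the 2.4+ argument name
    catchup=False,
) as dag:
    PythonOperator(
        task_id="transform_orders",
        python_callable=transform_orders,
    )
```

In a real pipeline the extract and load steps would target the cloud storage, databases, or APIs listed above rather than local files, and failures would be diagnosed from the Airflow task logs and metrics mentioned in the last item.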
Must haves

- Strong experience (4+ years) writing efficient and scalable Python code, especially for data manipulation and ETL tasks (using libraries such as Pandas, PySpark, Dask, etc.)
- Knowledge of Apache Airflow for orchestrating ETL workflows, managing task dependencies, scheduling, and error handling
- Experience building, optimizing, and maintaining ETL pipelines for large datasets, with a focus on data extraction, transformation, and loading
- Familiarity with cloud-native storage solutions
- Understanding of, and hands-on experience with, different file formats
- Expertise in writing efficient SQL queries for data extraction, transformation, and analysis
- Familiarity with complex SQL operations (joins, aggregations, window functions, etc.); a PySpark sketch of an equivalent windowed transformation appears at the end of this posting
- Familiarity with IAM (Identity and Access Management), data encryption, and securing cloud resources and data storage on both Azure and GCP
- Upper-intermediate English level

Nice to haves

- Ability to use Java libraries to request data from APIs
- Knowledge of data governance practices and of implementing data lineage and metadata management in cloud environments
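As a small illustration of the windowed transformations referenced under Must haves, the following PySpark sketch keeps each customer's most recent order; the dataset and column names are hypothetical, and the SQL equivalent would use ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

```python
# Illustrative PySpark window function: deduplicate to the latest row per customer.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()

orders = spark.createDataFrame(
    [("c1", "2024-01-01", 10.0), ("c1", "2024-02-01", 25.0), ("c2", "2024-01-15", 7.5)],
    ["customer_id", "order_date", "amount"],
)

w = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
latest = (
    orders.withColumn("rn", F.row_number().over(w))  # rank orders per customer, newest first
          .filter(F.col("rn") == 1)                  # keep only the most recent order
          .drop("rn")
)
latest.show()
```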