Job Description

AgileEngine is one of the Inc. 5000 fastest-growing companies in the US and a top-3 ranked dev shop according to Clutch.
We create award-winning custom software solutions that help companies across 15+ industries change the lives of millions. If you like a challenging environment where you're working with the best and are encouraged to learn and experiment every day, there's no better place - guaranteed! :)

What you will do

Design, develop, and maintain ETL pipelines to extract, transform, and load data across various data sources (cloud storage, databases, APIs);
Use Apache Airflow for orchestrating workflows, scheduling tasks, and managing pipeline dependencies (see the sketch after this list);
Build and manage data pipelines on the Azure and GCP clouds;
Design and support a Data Lake;
Write Python scripts for data cleansing, transformation, and enrichment using libraries like Pandas and PySpark;
Analyze logs and metrics from Airflow and cloud services to resolve pipeline failures or inefficiencies.
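To give a sense of the Airflow orchestration mentioned above, here is a minimal sketch of a daily ETL DAG. It is illustrative only: the DAG id, task names, and callables are hypothetical placeholders, not taken from a real project.

```python
# Illustrative sketch of an Airflow DAG with scheduled, dependent tasks.
# All identifiers here are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw records from a source (e.g., an API or cloud storage).
    print("extracting raw data")


def transform():
    # Cleanse and enrich the extracted records (e.g., with Pandas).
    print("transforming data")


def load():
    # Write the transformed records to the target store (e.g., a data lake).
    print("loading data")


with DAG(
    dag_id="example_etl",            # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare pipeline dependencies: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```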
Must haves

Strong experience (4+ years) writing efficient and scalable Python code, especially for data manipulation and ETL tasks (using libraries like Pandas, PySpark, Dask, etc.);
Knowledge of Apache Airflow for orchestrating ETL workflows, managing task dependencies, scheduling, and error handling;
Experience in building, optimizing, and maintaining ETL pipelines for large datasets, focusing on data extraction, transformation, and loading;
Familiarity with cloud-native storage solutions;
Understanding of and working experience with different file formats;
Expertise in writing efficient SQL queries for data extraction, transformation, and analysis;
Familiarity with complex SQL operations such as joins, aggregations, and window functions (see the sketch after this list);
Familiarity with IAM (Identity and Access Management), data encryption, and securing cloud resources and data storage on both Azure and GCP;
Upper-intermediate English level.
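As a sense of the window-function work mentioned above, here is a minimal PySpark sketch. It is illustrative only: the DataFrame, column names, and ranking task are hypothetical examples.

```python
# Illustrative sketch: ranking rows within groups using a window function.
# The schema and data are hypothetical examples.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("window-example").getOrCreate()

orders = spark.createDataFrame(
    [("alice", "2024-01-01", 120.0),
     ("alice", "2024-01-05", 80.0),
     ("bob", "2024-01-02", 200.0)],
    ["customer", "order_date", "amount"],
)

# Rank each customer's orders by amount, largest first.
w = Window.partitionBy("customer").orderBy(F.col("amount").desc())

ranked = orders.withColumn("rank", F.row_number().over(w))
ranked.show()
```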
Nice to haves

Experience using Java libraries to request data from APIs;
Knowledge of data governance practices and the implementation of data lineage and metadata management in cloud environments.
The benefits of joining us

Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.

Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.

A selection of exciting projects
Join projects featuring modern solution development and top-tier clients, including Fortune 500 enterprises and leading product brands.

Flextime
Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whatever makes you the happiest and most productive.