As a Senior Data Engineer (Python/Spark), you will play a crucial role in building and maintaining robust data ingestion pipelines, applications, and data processing systems.
You will work on cutting-edge technologies and be responsible for designing, implementing, and optimizing solutions that support our Data Product Hub.
Key Responsibilities:
Develop and Maintain Data Pipelines: Design, develop, and maintain data ingestion pipelines using PySpark on AWS EMR, ensuring efficient data processing and transformation.
Python Application Development: Build and enhance Python applications deployed on AWS services such as Lambda, Step Functions, EC2, and S3, using Terraform for infrastructure as code.
Spark Application Tuning: Optimize Spark applications for performance, ensuring they run efficiently and handle large-scale data processing tasks.
AWS Infrastructure Management: Configure and manage AWS resources, including IAM, S3, EMR, and EC2, to support data processing and application deployment.
Data Processing: Work with data stored in Snowflake and other sources, pull data from existing S3 buckets and SFTP locations, and process it in batch mode from CSV or text files.
Collaboration: Collaborate with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
Documentation and Best Practices: Document processes, maintain best practices, and ensure the security and integrity of data.
Qualifications:
Education: Bachelor's degree in Computer Science, Information Technology, or a related field. A Master's degree is a plus.
Experience: 5+ years of experience in data engineering, with a strong focus on Python and PySpark.
Technical Skills:
Proficiency in Python development and scripting.
Hands-on experience with PySpark and Spark application tuning.
Strong knowledge of AWS services, including EMR, Lambda, Step Functions, EC2, S3, and IAM.
Experience with Terraform for infrastructure as code.
Familiarity with data warehousing solutions, particularly Snowflake.
Understanding of data ingestion and ETL processes.
Soft Skills: Excellent problem-solving skills, strong communication abilities, and a collaborative mindset.