What you will do
Respond quickly to incidents, troubleshoot networking and DNS issues, and help mitigate risks during holidays, vacations, or sick days when other team members may be unavailable;
Help test the platform and automation that will replace many of the manual tasks the team currently handles;
Own availability, performance, and growth of Indeed's Cloud Infrastructure;
Consult with stakeholders to specify requirements and solutions that address business challenges and opportunities;
Develop and maintain business continuity and disaster recovery processes;
Serve as a subject matter expert for Indeed's cloud infrastructure implementation, performing design reviews and consulting with internal teams to ensure implementation best practices;
Build monitoring tools and scripts to ensure your vertical performs well and meets the SLOs agreed with users;
Oversee maintenance and configuration of our Cloud WAF solutions;
Serve in the on-call rotation for the cloud infrastructure specialty;
Create forecasting models for capacity planning to support the proactive growth of Indeed's infrastructure;
Ensure all cloud security measures are incorporated into infrastructure implementation;
Ensure infrastructure resilience and proper inventory management and tagging;
Design and implement backup and recovery;
Build compliance, governance, and oversight processes;
Working hours will follow the Tokyo time zone.
Must haves
3+ years of experience with DevOps methodologies and CI/CD pipelines to ensure smooth deployment of networking and automation changes;
Experience with Terraform for automating AWS infrastructure provisioning, as well as YAML for configuration management;
Proficiency in Python for developing scripts and automation tools related to network and DNS management;
Ability to automate manual network configurations, streamline requests, and create scalable solutions;
Upper-intermediate English level.
Nice to haves
Knowledge of version control tools like Git for managing infrastructure code;
Proficiency in GitOps workflows using both Argo CD and Flux2 for automating application deployments and rollbacks;
Familiarity with monitoring tools (CloudWatch, Datadog, etc.) to detect and resolve incidents before they impact production services;
Knowledge of additional AWS services such as EC2, Lambda, S3, and CloudFormation, which might intersect with networking or DNS tasks;
In-depth knowledge of AWS networking services such as VPCs, Transit Gateways, Cloud WAN, VPC peering, Direct Connect, and security groups;
Hands-on experience with Amazon EKS for managing Kubernetes clusters in AWS;
Proficiency in containerization technologies like Docker.