You will work with a team of skilled Site Reliability Engineers and help them improve application reliability. You will play a critical role in the reliability of a massive-scale application that processes billions of events every day. You will collaborate with multiple stakeholders and help the team write useful automation that reduces toil and makes the support process efficient.

Responsibilities:
Collaborate with the SRE and Dev teams to understand requirements;
Understand the application architecture and learn about the services;
Implement in-house product development and automations to improve overall reliability efforts;
Propose the solution architecture and technical specifications for solutions;
Analyze production metrics and trends, generate reliability reports, and share them with stakeholders;
Perform peer reviews, peer programming, and code reviews to understand features;
Pair with on-call support during critical incidents, troubleshooting and providing resolutions as needed;
Participate in a high-volume production environment to learn about issues and propose solutions to address them;
Deploy applications to production using CD pipelines, and manage and scale applications using Kubernetes, as needed.

Minimum Requirements:
Solid experience with Spinnaker is a must!
Excellent verbal and written communication in English and collaboration skills;
Strong problem-solving skills and the ability to learn new technologies quickly;
Prior relevant experience working with enterprise applications and managing automations.

Preferred Qualifications:
Hands-on coding experience with Python/Golang is a plus;
Experience supporting Kubernetes and container orchestration;
Knowledge of cloud platforms, preferably GCP;
Experience with monitoring and logging tools, with a strong emphasis on Grafana and Prometheus for creating dashboards and alerts.