Senior Site Reliability Engineer (Remote)
Rocket.Chat | Brazil

This position is open to applicants in Latin America.

As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of Rocket.Chat. Your expertise in designing, implementing, and maintaining robust infrastructure will be instrumental in delivering exceptional user experiences.

Mandatory Hard Skills
- Strong proficiency in Linux/Unix systems administration
- Proficiency in scripting languages such as Python, Go, or Bash
- In-depth knowledge of cloud platforms such as AWS, Azure, or GCP
- Experience with containerization tools such as Docker and container orchestration platforms such as Kubernetes
- Proficiency in monitoring tools such as Prometheus and Grafana for collecting, analyzing, and visualizing system metrics, logs, and events
- Experience with CI/CD pipelines and tools such as ArgoCD
- Solid understanding of networking fundamentals, including TCP/IP, DNS, DHCP, VLANs, routing, and firewalls
- Familiarity with database technologies such as MySQL, PostgreSQL, MongoDB, or Redis

Desirable Hard Skills
- Familiarity with agile management tools such as Jira
- Knowledge of JavaScript

Soft Skills
- Ability to collaborate with development teams to ensure that applications are designed with reliability and scalability in mind
- Excellent problem-solving and troubleshooting skills
- Effective communication and collaboration with both technical and non-technical stakeholders
- Strong analytical skills to identify the root causes of complex issues and develop effective solutions
- Leadership skills to guide and inspire team members, especially during incidents and other critical situations
- Commitment to continuous learning and to staying current with emerging technologies and trends in the field

What You'll Do
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform
- Automate deployment processes to achieve consistent and repeatable infrastructure provisioning
- Configure and maintain CI/CD automation pipelines
- Leverage diverse data sources for troubleshooting, optimization, and ensuring system reliability
- Continuously monitor capacity and plan for increases to accommodate traffic growth, ensuring that the infrastructure remains fault-tolerant under varying load conditions
- Take leadership and accountability in writing blameless postmortems
- Lead teams through disaster recovery procedures
- Apply a strong command of network security principles
- Coordinate the efforts of responding teams efficiently during incidents
- Work proficiently in at least one scripting or programming language (e.g., Go, Bash)
- Proactively propose and maintain documentation for SRE processes
- Bring in-depth knowledge of, and hands-on experience with, one or more major cloud providers
- Bring in-depth knowledge of container technology, including Docker and Kubernetes

Benefits
Our goal is to make your routine as a Rocketeer enjoyable, exciting, and comfortable in a 100% remote environment, so you'll receive a set of benefits to improve your remote work experience. They include a flexible schedule, unlimited paid time off, language and tech courses, stock options, a multicultural environment with colleagues in over 26 countries, a vibrant company culture, and more!

About Rocket.Chat
Rocket.Chat is the world's largest open-source communications platform. Built for organizations that need more control over their communications, it enables collaboration between colleagues, partners, customers, communities, and platforms without compromising data ownership, customizations, or integrations.

Tens of millions of users in over 150 countries, and organizations such as Deutsche Bahn, the U.S. Navy, and Credit Suisse, trust Rocket.Chat every day to keep their communications completely private and secure. At Rocket.Chat, we believe in reconnecting the world, one conversation at a time! See yourself in that? Apply now!

When applying, please mention that you found this job on Pangian.com Remote Network.