Senior Site Reliability Engineer (Remote)
Senior Site Reliability Engineer | Rocket.Chat | Brazil
This position is for applicants in Latin America.
As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of Rocket.Chat.
Your expertise in designing, implementing, and maintaining robust infrastructure will be instrumental in delivering exceptional user experiences.
Mandatory Hard Skills
Strong proficiency in Linux/Unix systems administration
Proficiency in scripting languages such as Python, Go, or Bash
In-depth knowledge of cloud platforms such as AWS, Azure, or GCP
Experience with containerization tools such as Docker and container orchestration platforms such as Kubernetes
Proficiency in monitoring tools such as Prometheus and Grafana for collecting, analyzing, and visualizing system metrics, logs, and events
Experience with CI/CD pipelines and tools such as ArgoCD
Solid understanding of networking fundamentals, including TCP/IP, DNS, DHCP, VLANs, routing, and firewalls
Familiarity with database technologies such as MySQL, PostgreSQL, MongoDB, or Redis
Desirable Hard Skills
Familiarity with agile management tools such as Jira
Knowledge of JavaScript technologies
Soft Skills
Collaboration with development teams to ensure that applications are designed with reliability and scalability in mind
Excellent problem-solving and troubleshooting skills
Effective communication and collaboration skills with both technical and non-technical stakeholders
Strong analytical skills to identify root causes of complex issues and develop effective solutions
Leadership skills to guide and inspire team members, especially during incidents or critical situations
Commitment to continuous learning, staying current with emerging technologies and trends in the field
What You'll Do
Develop and maintain Infrastructure as Code (IaC) using tools like Terraform
Automate deployment processes to achieve consistent and repeatable infrastructure provisioning
Configure and maintain CI/CD automation pipelines
Leverage diverse data sources to troubleshoot issues, optimize systems, and ensure reliability
Continuously monitor capacity and plan increases to accommodate traffic growth, ensuring that the infrastructure remains fault-tolerant under varying load conditions
Take the lead on, and accountability for, writing blameless postmortems
Lead teams in disaster recovery procedures
Apply network security principles across the infrastructure
Coordinate the efforts of responding teams efficiently during incidents
Work proficiently in at least one scripting or programming language (e.g., Go, Bash)
Proactively suggest and maintain documentation related to SRE processes
Apply in-depth knowledge of and hands-on experience with one or more major cloud providers
Apply in-depth knowledge of container technology, including Docker and Kubernetes
Benefits
Our goal is to make your routine as a Rocketeer feel enjoyable, exciting, and comfortable in a 100% remote environment.
To that end, you'll receive a set of benefits to improve your remote work experience!
They include a flexible schedule, unlimited Paid Time Off, language and tech courses, stock options, a multicultural environment with colleagues in over 26 countries, a vibrant company culture, and more!
About Rocket.Chat
Rocket.Chat is the world's largest open-source communications platform.
Built for organizations needing more control over their communications, it enables collaboration between colleagues, partners, customers, communities, and platforms without compromising data ownership, customizations, or integrations.
Tens of millions of users in over 150 countries and organizations such as Deutsche Bahn, the U.S. Navy, and Credit Suisse trust Rocket.Chat every day to keep their communications completely private and secure.
At Rocket.Chat, we believe in reconnecting the world, one conversation at a time!
See yourself in that?
Then apply now!