Site Reliability Engineer

Mexico City (CDMX), Mexico

Trades and Others/Translation

Not specified
Remote

Posted 24 days ago

Job Description

We are looking for a highly skilled Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and efficiency of our platforms and services. The ideal candidate will have extensive hands-on experience in Kubernetes, cloud platforms, infrastructure automation, and observability, while also bringing an analytical mindset and passion for solving complex business problems through technology.

Candidates must be based in Mexico.

Responsibilities:

  • Design, build, and maintain reliable, scalable, and secure infrastructure across public cloud environments (GCP, AWS, Azure).
  • Manage and optimize Kubernetes clusters and service mesh technologies (Istio/Envoy), ensuring high availability and efficient workload distribution.
  • Implement and maintain Infrastructure as Code (Terraform, Pulumi, etc.) to enable automation and consistency across environments.
  • Partner with development and operations teams to define, design, and implement CI/CD pipelines using tools such as Argo, Jenkins, Spinnaker, and Buildkite.
  • Establish and improve observability practices by configuring monitoring, alerting, and logging solutions (Prometheus, Grafana, Datadog, CloudWatch, Elastic, Splunk, etc.).
  • Troubleshoot networking, system, and application issues, ensuring optimal system performance and minimal downtime.
  • Develop strategies for workload scaling (HPA, VPA, capacity planning) that keep systems stable while optimizing costs.
  • Drive incident response processes, including root cause analysis and post-mortems, with a focus on continuous improvement.
  • Ensure compliance with security best practices and contribute to policies for admission control and workload governance (OPA, Kyverno).
  • Collaborate in an agile environment, working closely with cross-functional teams to design solutions that meet business requirements.
  • Advocate for SRE best practices, including automation, resilience, scalability, and operational excellence across teams.
  • Continuously research and evaluate emerging technologies, incorporating innovative approaches to improve infrastructure reliability.

Requirements

  • 7+ years of experience in Platform Engineering/SRE roles using an object-oriented language (Python, Golang, etc.).
  • Bachelor’s degree in Computer Science or Computer Engineering, or an equivalent combination of education and experience.
  • Extensive experience working with Kubernetes in a public cloud (GKE, EKS, AKS, etc.).
  • Experience working with Istio or other service mesh technologies.
  • Experience working with IaC (Terraform, Pulumi, etc.).
  • Experience working in a public cloud environment (GCP, AWS, Azure, etc.).
  • Experience working with CI/CD tools such as Argo, Buildkite, Travis CI, Jenkins, and Spinnaker.
  • Experience working with platform observability tools (Prometheus, Thanos, Grafana, Fluent Bit, Cloud Monitoring, Google Cloud Logging, Datadog, PagerDuty, CloudWatch, Kibana, Elasticsearch, Splunk, VictorOps, etc.).
  • Experience with networking.
  • Experience and desire to work in an agile environment.
  • Analytical mindset and passion for solving business problems with technology.

Nice to have:

  • Experience working with development testing tools and patterns such as Garden, Flagger, canary deployments, blue/green deployments, and A/B testing.
  • Experience setting up and working with Kubernetes admission control (Kyverno, OPA, etc.).
  • Experience working with workload scaling (HPA, VPA, capacity planning/reservations, etc.).

About Us

Founded in 2005, tbo. is a global organization that provides translation, talent, training, teams and testing services to a full range of clients in over 40 countries worldwide, from startups to enterprise-level companies.


tbo. aims to facilitate global communication by bridging the gap between peoples and cultures, providing simple solutions to complex problems, and outstanding service in 100+ languages.


tbo. fosters a culture of continuous improvement, creativity, sustainability and community, with a longstanding commitment to providing high-touch human service.


tbo. is ranked as one of the fifteen fastest organically growing localization companies in the world and operates 24/7, 363 days a year, on a “follow the sun” model via offices in Cordoba, Ho Chi Minh City, Kyiv and Lima.
It is certified under five separate international quality standards.

Join our growing staff and boost your career in a global organization!

At tbo., we believe that fostering an inclusive culture and a diverse environment makes us stronger. We are an equal opportunity employer, dedicated to creating a space where everyone can thrive and grow. We are committed to ensuring our hiring processes are fair, transparent, and in compliance with all legal and policy requirements, promoting a workplace free from discrimination.