Senior Site Reliability Engineer

Posted 12 hours 15 minutes ago by Randstad Technologies

£34 - £55 Hourly
Contract
London, United Kingdom
Job Description

Job Title: Senior Site Reliability Engineer
Location: Remote (UK)
Type: Full-Time (1-Year Contract)
Working Hours: 11 AM - 7 PM

Are you passionate about building and managing reliable, large-scale cloud systems? We're looking for a Senior Site Reliability Engineer to join a high-performing Observability team. In this role, you'll play a critical part in ensuring our cloud services remain performant and scalable, supporting billions of daily requests.

Key Responsibilities

  • Scale and optimize Prometheus architecture to manage millions of active metrics.
  • Operate and maintain large Elasticsearch clusters (2,000 TB+).
  • Build and manage high-throughput Kafka pipelines processing hundreds of thousands of events per second.
  • Develop self-service APIs and robust alerting systems, and deploy infrastructure with Terraform.
  • Support observability initiatives to monitor and improve critical cloud services.

What We're Looking For

  • 5+ years of experience managing distributed systems on Linux (Debian/Ubuntu preferred).
  • 2+ years of development experience with Ruby, Python, Go, or similar languages.
  • Expertise in technologies such as Elasticsearch, Kafka, Prometheus, Terraform, and Ansible.
  • A strong passion for solving complex challenges in large-scale distributed systems.
  • A proactive, curious mindset with a focus on quality and customer experience.

This is an urgent vacancy, and the hiring manager is shortlisting for interview immediately. Please apply with a copy of your CV.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.