Lead/Senior SRE - Permanent - Fully Onsite - London - £90,000pa

Posted 3 hours 23 minutes ago by Robson Bale Ltd

£90,000 Annual
Permanent
Not Specified
Other
London, United Kingdom
Job Description

Responsible for delivering an end-to-end self-healing automation solution to reduce manual effort and toil.

Primary Skills - Ansible, Terraform, Python, DevOps, SRE, Docker, AWS (Atlas), ECS-based internal tooling

Secondary Skills - Shell scripting, Linux, monitoring tools (Datadog, Splunk, Dynatrace, Grafana, ThousandEyes, Gremlin, etc.)

  • Experience with automation principles and tools (Ansible, etc.). Should have worked on toil identification and quality-of-life automation.
  • Advanced working experience with two or more of the following: Unix/Linux, Windows Server, Oracle, MSSQL, MongoDB.
  • Experience with Python, Java, curl scripting, or any other type of scripting.
  • Experience with Jira, Confluence, Bitbucket, GitHub, Jenkins, Jules, Terraform.
  • Experience with two or more of the following observability tools: AppDynamics, Geneos, Dynatrace, ECS-based internal tooling, Datadog, CloudWatch, BigPanda, Elasticsearch (ELK), Google Cloud Logging, Grafana, Prometheus, Splunk, ThousandEyes, etc.
  • Experience creating dashboards for infrastructure, APM, and end-to-end workflows.
  • Monitoring, logging, alerting, and error budgets (99.9%, 99.99%, 99.999%) for software, operations, and business.
  • Define SLOs, SLIs, and SLAs with business, operations, and engineering teams.
  • Experience with logging, monitoring, and event detection on cloud or distributed platforms.
  • Experience creating and modifying technical documentation such as environment flows, functional requirements, and non-functional requirements.
  • Effective production management: incident and change management, production control, ITSM, ServiceNow; problem-solving and analytical skills, with the ability to turn findings into strategic imperatives.
  • Experience in technical operations and application support, with a focus on stability, reliability, and resiliency.
  • Hands-on experience with SRE implementation of monitoring systems: dashboard development for application reliability using Splunk, Dynatrace, Grafana, AppDynamics, Datadog, BigPanda.
  • Experience working with Configuration as Code, Infrastructure as Code, and AWS (Atlas).
  • Provides technical direction on monitoring and logging to less experienced staff, or develops highly complex original solutions. Acts as an expert technical resource for modelling, simulation, and analysis efforts.
  • Overall, we are looking for an Automation Engineer who can reduce toil and improve the system's reliability and scalability.

Nature of the Job:

  • Collaborate with the production support team to identify existing manual activities and automate them.
  • Identify areas of toil that can be automated to avoid manual intervention.
  • Build a monitoring system and observability platform that provides richer stack traces, alerts, and dashboards.
  • Define SLAs, SLOs, and SLIs, and implement them for better monitoring.
  • Scalability, reliability, and observability are the primary goals, with the aim of reducing MTTD and MTTR.