Senior Director - Operations and Reliability Engineering
Posted 8 hours 40 minutes ago by Recruitics
Locations: Canary Wharf, Boston
Who We Are
Boston Consulting Group partners with leaders in business and society to tackle their most important challenges and capture their greatest opportunities. BCG was the pioneer in business strategy when it was founded in 1963. Today, we help clients with total transformation: inspiring complex change, enabling organizations to grow, building competitive advantage, and driving bottom-line impact.
To succeed, organizations must blend digital and human capabilities. Our diverse, global teams bring deep industry and functional expertise and a range of perspectives to spark change. BCG delivers solutions through leading-edge management consulting along with technology and design, corporate and digital ventures, and business purpose. We work in a uniquely collaborative model across the firm and throughout all levels of the client organization, generating results that allow our clients to thrive.
What You'll Do
The Senior Director - Operations and Reliability Engineering is responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 operational excellence, and high availability across all of BCG, including BCG Core, BCG X, and Consulting Team (CT) worldwide. The leader will drive strategic planning, execution, and optimization of global IT infrastructure, cloud operations, and service management while ensuring a secure, scalable, and efficient technology environment. This role is accountable for embedding and assuring IT Service Management (ITSM) processes across all teams, ensuring compliance with standardized frameworks and operational excellence.
Key Responsibilities:
Strategic Leadership & Transformation:
- Define and execute a modern Reliability Engineering strategy, integrating SRE, DevOps, and automation-first operational models.
- Drive end-to-end automation to eliminate toil, improve efficiency, and enhance operational resilience.
- Lead the transition from traditional IT operations to a proactive, AI-driven, self-healing infrastructure.
- Establish a global observability, telemetry, and predictive analytics framework for real-time insights.
- Align operational strategies with business goals, ensuring IT supports digital transformation initiatives across BCG Core, BCG X, and CT.
Infrastructure & Cloud Operations:
- Oversee global IT infrastructure, cloud platforms, and hybrid hosting environments across all BCG business units.
- Manage network reliability, compute platforms, and cloud-native services across AWS, Azure, and GCP.
- Scale Infrastructure as Code (IaC), automated provisioning, and cloud workload optimization.
- Drive edge computing, containerized workloads, and high-performance computing strategies.
- Implement AI-driven monitoring, self-healing automation, and full-stack observability.
IT Service Management & Operational Excellence:
- Mandate and assure the adoption of IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery.
- Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets.
- Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation.
- Ensure high availability, performance, and security compliance for all enterprise services.
- Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across all of BCG.
- Optimize incident, change, and capacity management, ensuring alignment with ITIL best practices and automated workflows.
- Lead Service Asset and Configuration Management (SACM), ensuring accurate and real-time management of software and IT assets within the CMDB.
- Drive continuous enhancements to the CMDB, improving visibility, compliance, and lifecycle management of IT assets.
Security, Compliance & Risk Management:
- Embed security and compliance into operational workflows with automated security controls.
- Ensure adherence to ISO 27001, NIST, SOC 2, GDPR, and cloud security best practices.
- Collaborate with cybersecurity teams to integrate zero-trust security models.
- Drive resiliency planning, disaster recovery, and business continuity initiatives.
Financial & Vendor Management:
- Optimize IT operational budgets with a cost-effective, cloud-native strategy.
- Negotiate vendor contracts, ensuring alignment with business needs and service reliability.
- Drive cost efficiency in cloud spending, SaaS platforms, and infrastructure investments.
Leadership & Talent Development:
- Build and mentor a high-performing Reliability Engineering team, fostering a culture of automation and innovation.
- Lead a team of SREs, DevOps engineers, and platform reliability experts across global squads.
- Promote a collaborative, data-driven, and proactive mindset, ensuring agility and operational resilience.
- Establish workforce development programs for AI-driven operations, automation, and modern reliability practices.
What You'll Bring
Required Qualifications:
- 15+ years of experience in IT operations, SRE, DevOps, or platform engineering.
- 5+ years in a senior leadership role, managing large-scale IT environments.
- Deep technical expertise in cloud computing (AWS, Azure, GCP), on-prem infrastructure, and hybrid environments.
- Proven track record in end-to-end automation, Infrastructure as Code (IaC), and large-scale observability.
- Experience in AI-driven IT operations, predictive analytics, and automated remediation.
- Strong understanding of zero-trust security, regulatory compliance, and risk management.
- Excellent leadership, communication, and stakeholder management skills.
Preferred Qualifications:
- Certifications: ITIL, AWS/Azure/GCP Solutions Architect, SRE Foundation, CISSP, or equivalent.
- Experience with Kubernetes, Terraform, Ansible, and AI-powered operations tools.
- Strong problem-solving abilities, with a data-driven approach to operational excellence.
The Senior Director - Operations and Reliability Engineering is a pivotal leadership role responsible for shaping the future of IT operations by integrating SRE, DevOps, and automation-first methodologies. If you are a highly technical, innovation-driven leader passionate about scaling operations through automation and AI-driven resilience, we invite you to apply.
Who You'll Work With
Work Environment & Additional Information:
- Hybrid or on-site work model.
- May require occasional travel for business meetings, data center visits, or vendor engagements.
- Ability to work in a fast-paced, high-availability IT environment, with a focus on automation and reliability.
Boston Consulting Group is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, age, religion, sex, sexual orientation, gender identity / expression, national origin, disability, protected veteran status, or any other characteristic protected under national, provincial, or local law, where applicable, and those with criminal histories will be considered in a manner consistent with applicable state and local laws.
BCG is an E-Verify Employer.