Senior Manager (Production Operations and SRE) Job at Mandiant

Senior Manager (Production Operations and SRE)

Job Summary

Company Description

Since 2004, Mandiant has been a trusted partner to security-conscious organizations. Effective security is based on the right combination of expertise, intelligence, and adaptive technology, and the Mandiant Advantage SaaS platform scales decades of frontline experience and industry-leading threat intelligence to deliver a range of dynamic cyber defense solutions. Mandiant's approach helps organizations develop more effective and efficient cyber security programs and instills confidence in their readiness to defend against and respond to cyber threats.

Role Description

Reporting to the IT Operations Director in our Enterprise Technology Services team, the Production Operations and Site Reliability Engineering leader will manage a globally distributed, high-performing team focused on elevating application and service performance and availability in support of our organization's fast-evolving enterprise technology needs.

The role reduces risk to service availability for employees and customers by partnering with Engineering and Operations teams to proactively pivot to AI-driven telemetry tooling, and by leading a team of professionals highly focused on improving transaction resilience.

The ideal candidate will have a broad background spanning both applications and infrastructure. They will have direct experience in multiple coding languages and have performed code reviews in their previous work. They will have been a site reliability engineer and shifted into management. They will have a strong sense of urgency with respect to outages and delegate appropriately to ensure 24/7 coverage for their function. This breadth of experience will be leveraged to mature Mandiant's SRE teams and processes.

DUTIES & RESPONSIBILITIES

Identify opportunities for improving telemetry, observability, service availability and transaction resilience

Identify and build self-healing capabilities leveraging APIs, scripting, and coding

Streamline and optimize tooling and process from risk detection to remediation

Drive accountability for risk reduction and corrective actions following Post-mortem and RCA reviews

Ownership and accountability for Major Incident and Problem Management processes and execution

Partner with sustaining engineering teams (both internal and external) to improve service performance and stability

Drive active partnership with mergers and acquisitions integration teams to ensure that goals are achieved for service availability objectives

Drive architectural reviews for critical services, raising the bar for service availability and transaction resilience

Track Product Engineering roadmaps to identify forward-looking telemetry and automation opportunities

Build and sustain enterprise-level service offerings such as Logging as a Service, Telemetry as a Service, and Service Availability reporting and dashboard publication

Establish and govern standards for New Service Introduction

Vendor Relationship Management for Telemetry Tooling solution providers

CMDB population, governance, and maintenance

Staffing, motivation, and evolution of a globally diverse, high-performing team

Experience Required:

Fresher

Vacancy:

2 - 4 Hires
