Senior Manager (Production Operations and SRE) Job at Mandiant
Senior Manager (Production Operations and SRE)
- Bengaluru, Bangalore Urban, Karnataka
- Not Disclosed
- Full-time
- Permanent
Company Description
Since 2004, Mandiant has been a trusted partner to security-conscious organizations. Effective security is based on the right combination of expertise, intelligence, and adaptive technology, and the Mandiant Advantage SaaS platform scales decades of frontline experience and industry-leading threat intelligence to deliver a range of dynamic cyber defense solutions. Mandiant's approach helps organizations develop more effective and efficient cyber security programs and instills confidence in their readiness to defend against and respond to cyber threats.
Role Description
Reporting to the IT Operations Director in our Enterprise Technology Services team, the Production Operations and Site Reliability Engineering leader will direct a globally distributed, high-performing team focused on elevating application and service performance and availability in support of our organization's fast-evolving enterprise technology needs.
The role reduces risk to service availability for employees and customers by partnering with Engineering and Operations teams to proactively pivot to AI-driven telemetry tooling, and by leading a team of professionals highly focused on improving transaction resilience.
The ideal candidate will have a broad background spanning both applications and infrastructure. They will have direct experience in multiple coding languages and have performed code reviews in their previous work. They will have been a service reliability engineer and shifted into management. They will have a strong sense of urgency with respect to outages and delegate appropriately to ensure 24/7 coverage for their function. This breadth of experience will be leveraged to mature Mandiant's SRE teams and processes.
DUTIES & RESPONSIBILITIES
Identify opportunities for improving telemetry, observability, service availability, and transaction resilience
Identify and build self-healing capabilities leveraging APIs, scripting, and coding
Streamline and optimize tooling and process from risk detection to remediation
Drive accountability for risk reduction and corrective actions following post-mortem and RCA reviews
Own and be accountable for Major Incident and Problem Management processes and execution
Partner with sustaining engineering teams (both internal and external) to improve service performance and stability
Drive active partnership with mergers and acquisitions integration teams to ensure that goals are achieved for service availability objectives
Drive architectural reviews for critical services, raising the bar for service availability and transaction resilience
Track Product Engineering roadmaps to identify forward-looking telemetry and automation opportunities
Build and sustain enterprise-level service offerings such as Logging as a Service, Telemetry as a Service, Service Availability reporting, and dashboard publication
Establish and govern standards for New Service Introduction
Manage vendor relationships with Telemetry Tooling solution providers
Populate, govern, and maintain the CMDB
Staff, motivate, and evolve a globally diverse, high-performing team
Fresher
2 - 4 Hires