Manager, Site Reliability Engineering (hybrid) Job at Kaplan, Inc.
Manager, Site Reliability Engineering (hybrid)
- Bengaluru, Bangalore Urban, Karnataka
- Not Disclosed
- Full-time
- Permanent
Job Title
Manager, Site Reliability Engineering (Hybrid)

Job Description
For more than 80 years, Kaplan has been a trailblazer in education and professional advancement. We are a global company at the intersection of education and technology, focused on collaboration, innovation, and creativity to deliver a best-in-class educational experience and make Kaplan a great place to work.

Our offices in India opened in Bengaluru in 2018. Since then, our team has fueled growth and innovation across the organization, impacting students worldwide. We are eager to grow and expand with skilled professionals like you who use their talent to build solutions, enable effective learning, and improve students' lives. The future of education is here, and we are eager to work alongside those who want to make a positive impact and inspire change in the world around them.

The SRE manager is primarily responsible for balancing the reactive work of incident management with the proactive work of reducing future issues, while educating and informing the engineering organization about patterns and behaviors that lead to more reliable and available systems. Additionally, the qualified candidate will deliver insights from massive-scale data in real time, bringing fresh ideas, demonstrating unique and informed viewpoints, and developing real-world solutions and positive user experiences at every interaction.

Primary/Key Responsibilities
- Provide technical and people leadership to the Site Reliability Engineering teams by facilitating one-on-one, team, and performance review meetings.
- Increase reliability by establishing guidance and methods of improvement.
- Communicate, discuss, and champion reliability efforts.
- Improve the incident management lifecycle to identify, mitigate, and learn from reliability risks.
- Develop deeper insights and analysis into the quality of experience for our customers.
- Drive incidents to resolution by coordinating with multiple engineering teams.
- Lead cross-organizational efforts with different teams to diagnose operational surprises and carry forward improvements.
- Identify sources of instability in large-scale distributed systems and drive operational excellence.
- Understand the near-, mid-, and long-term needs of the business and how the work of the team contributes to them.

Hybrid Schedule: 3 days remote / 2 days in office
30-day notification period preferred

Minimum Qualifications
- Bachelor's in Computer Information Systems or related field
- 5+ years of related experience
- Proven experience as a Site Reliability Engineer or similar role.
- Cloud, SaaS, and virtualization concepts and performance concerns.
- Operating system design, processes, and threading models.
- Defining and monitoring system quality measures, including SLO and SLA.
- Tooling to improve reliability of systems, automate remediation of issues, or improve scalability.
- PowerShell, SQL queries, Elasticsearch, Redis, and/or Memcached.
- Containers and container orchestration tools (Kubernetes experience preferred).

Preferred Qualifications
- Experience leading high-performing engineering teams.
- Understanding of Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
Fresher
2 - 4 Hires