Site Reliability Engineer Job at Blueshift Labs
4+ weeks ago
- Pune, Pune Division, Maharashtra
- Not Disclosed
- Full-time
- Permanent
Job Summary
Responsibilities
- Perform on-call duties covering application support, incident management, and troubleshooting
- Work in rotating shifts so that the SRE function is available 24x7
- Improve reliability and drive down the burden of toil with tooling and automation
- Analyze complex systems from a reliability, resilience, and performance perspective
- Identify sources of instability in large-scale distributed systems and drive operational excellence
- Hands-on implementation and management of complex virtualized environments
- Implement scale-up / scale-down strategies based on various utilization metrics
- Author incident reports by coordinating with multiple engineering teams
- Identify and fill gaps in the monitoring & alerting system
- Report system status to the organization periodically
Requirements
- 5+ years of relevant industry experience
- Prior hands-on experience managing AWS and cloud infrastructure that scales to hundreds of nodes
- Experience managing a container orchestration system
- Deep understanding of large-scale data systems and data pipelines, including managing NoSQL, SQL, and HDFS/Hadoop clusters
- Experience with modern SRE practices & tools
- Hands-on experience with active incident management
- Willingness & ability to work night shifts
Experience Required: Minimum 5 Years
Vacancy: 2 - 4 Hires