Azure Senior Site Reliability Engineer Job at Emtec
Azure Senior Site Reliability Engineer
- Pune, Pune Division, Maharashtra
- Not Disclosed
- Full-time
The Senior Site Reliability Engineer is responsible for providing continuous feedback on site health, reliability, availability, and user experience for a specific domain. This is a matrixed role in which the SRE works closely with the product team on a day-to-day basis while reporting to the practice lead.
This role is expected to understand the product in depth, collect and analyze meaningful measurements and provide feedback to the business, Software Engineering and Product teams. The SRE will work very closely with the key stakeholders to help drive changes to increase customer satisfaction, product availability, reliability, and the completion of strategic technical initiatives.
In addition to monitoring and integration with the observability platform, a heavy focus will be placed on automation opportunities and on automating operational processes to maintain 99.9% availability of the product. These efforts are in addition to Production SaaS Operational and Support responsibilities: quickly responding to and resolving production incidents, preventing service disruption, and continuously improving the MTTR.
- 3+ years of experience in a Site Reliability Engineering or Software Engineering role.
- Bachelor's degree in Computer Science, Information Technology or equivalent experience plus certifications
- 2-3 years of working experience with Windows Server OS administration
- 3-4 years of working experience with SQL Server and Entity Framework ORM
- 2-3 years of working experience writing and tuning SQL queries
- 2+ years of working experience with IIS configuration and scalability
- 2+ years of working experience with VMware vSphere
- 2-3 years of working experience with ASP.NET MVC
- Familiarity with RESTful APIs
- Working knowledge of and experience with C#, JavaScript, and HTML
- Experience with Azure/AWS would be a plus
- 2 years of working experience with Dynatrace, Azure Monitor, Application Insights, and Log Analytics
- Strong understanding of web hosting infrastructure and high availability architecture
- Experience measuring and monitoring .NET applications, SQL Servers/Database, and Serverless cloud resources or equivalent Java-based experience
- PowerShell or Linux scripting for creating automated routines for ensuring site availability
- Development/coding experience and skills for writing custom automation solutions
- Knowledge and skills surrounding Public Cloud architectures (Azure experience highly desired)
- Windows Performance Monitoring and Network trace analysis
Key Responsibilities:
- Performs application-specific production support, incident management, problem management, RCAs, and service restoration as needed to quickly respond to and resolve production issues.
- Free up developer resources to focus on developing new features in the product by handling most of the relevant aspects of operating the products effectively and proactively managing the customer experience.
- Plan and achieve high availability and performance of the product service.
- Ensure proactive monitoring of all core services and processes to prevent unplanned service disruption.
- Implement self-healing and scalability of technical services to avoid unplanned disruptions.
- Establish observability of the business system health by integrating with the observability platform using automation.
- Maintains the operations runbook for business-hours and off-hours system support.
- Partners with engineering to ensure successful change management from development to delivery.
- Implements and trains team members on the tool consolidation strategy to optimize spend versus value for our end-to-end monitoring platform.
- Contributes to definition of strategy, standardization of technologies, and establishment of patterns for rapid and continuous development and application of automated solutions to address reliability issues and automate manual tasks.
- Leads, implements, and trains team members on the capability to measure core product availability across Azure and private cloud using HTTP endpoint testing and synthetic user testing.
- Present usability, reliability, incident, and user experience of the core product services to senior and/or executive leadership on a weekly basis.
- Define and report SLOs/SLAs for 99.9% availability to executive leadership and business partners.
- Influences product delivery teams to implement usability and reliability enhancements leading to improved user experience index scores and improved availability.
- Provide detailed analysis and troubleshooting for system outages, providing feedback to product/software engineering.
Good to Have
- Knowledge of microservices architecture
- Experience with Linux/Windows administration
- Azure architecture, developer, or DevOps certifications a plus
4 to 6 Years
2 - 4 Hires