Manager, Site Reliability Engineering Job at Adobe

Manager, Site Reliability Engineering

Job Summary

Our Company

Changing the world through digital experiences is what Adobe's all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.

We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!


The Challenge:

  • Are you comfortable with dev, comfortable with ops, and looking for a job that doesn't have DevOps in the title?

  • Do you have an intimate understanding of the operational challenges of running services at scale, and are you also committed to overcoming those challenges with software instead of manpower?

Adobe needs a hands-on Site Reliability Engineer (SRE) leader who knows how to balance going fast and going big with operating safely. Our mission is to progress, protect, and provide for the software and systems behind all of Marketo: An Adobe Company, with an ever-watchful eye on system availability, latency, performance, and capacity. SRE is a mindset and a set of engineering approaches that focus on building highly reliable systems and eliminating toil through automation.

We hire people from both systems and software backgrounds. Strong candidates will have experience with both. The engineer role within SRE is at the heart of fulfilling SRE's mission: build a highly reliable, scalable, and measurable customer experience for the continued growth of Marketo's infrastructure. We use both multi-cloud (Azure/AWS/GCP) and on-premises environments. We are looking for someone who is ambitious, has a passion for quality, and wants to help critical services succeed without compromising security.

Our SRE and Engineering teams are distributed, split between Denver, Colorado; San Mateo, California; Bangalore, India; and Bucharest, Romania. We rely heavily on tools like Slack, JIRA, and video conferencing to collaborate. Flexibility to join meetings with colleagues around the world is expected. The successful candidate must be able to prioritize tasks and work independently.

Marketo, an Adobe Company, is looking for an exceptional hands-on engineering leader to be the Manager, Reliability Engineering for the Site Reliability Engineering team. This is a critical, highly visible position that will report to the Director and will be responsible for ensuring that Adobe's critical externally facing services meet the required reliability, availability, and performance expectations while constantly focusing on driving improvements.

This person will be responsible for providing engineering solutions that prevent production issues, in partnership with the teams that own the product and also provide engineering solutions.

What you'll do

  • Build and manage strong global engineering and operations teams which run high volume, critical services

  • Manage web scale systems to demanding availability targets (99.99%+)

  • Drive strong engineering, QA, and technical operations

  • Manage critical interactions with internal and external stakeholders

  • Analyze and identify opportunities for continuous improvement and partner with engineering and operations teams to constantly improve services

  • Engage with product and engineering to proactively drive and improve the whole lifecycle of operational readiness, from inception and design through deployment, operation, and refinement.

  • Write software layers, scripts, deployment frameworks, tracers, monitors, self-healing/auto remediation tools and automate the processes.

  • Build and maintain software modules for use and re-use in cloud and on-premises systems automation.

  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.

  • Work closely with the software engineering team to ensure accurate monitoring and metrics are built into applications before they go to production.

  • Maintain up-to-date documentation on deployments, processes, and standard operating procedures/runbooks, with a goal of minimizing runbooks through automation.

  • Even after the self-healing and automation you have built, if complex issues arise, get involved with troubleshooting and root-cause analysis of issues across the stack: hardware, software, database, network, and so on.

  • Participate in shared on-call schedule [follow-the-sun model] managed across SRE & Engineering.

  • Be an evangelist and promote lean-ops culture by applying self-service, self-healing, and automation.

  • Work with the product management team to define SLAs and SLOs and implement SLIs for core capabilities.

  • Improve observability of software by implementing the right monitoring, tracing, and logging.

What you need to succeed

  • 8+ years of experience in a product engineering organization is critical. At least 3 years of that experience should have been spent in a managerial role.

  • Ability to partner and influence product engineering teams is a must

  • Ability to lead large globally distributed teams in a matrixed environment

  • Ability to attract world class talent, coach and develop them

  • Strong analytical skills with a data driven approach to solving problems

  • Experience designing for and dealing with a large production environment.

  • A Bachelor's or Master's degree in computer science, engineering, or a related field.

  • Experience developing, running, and/or consuming cloud technologies such as AWS, Azure, Google Cloud Platform and related tooling: Terraform, configuration management, etc.

  • Recent large-scale experience developing, running, and/or consuming on-premises platforms and related tooling: VMware, Ansible, Chef or Puppet, configuration management, etc.

  • Programming (Python and Bash are our preferred scripting/shell languages) and automation skills.

  • Troubleshooting and system engineering exposure in Linux production environments. Experience with Linux, Internet Protocols, and Large-Scale Operations.

  • Experience with CI/CD tooling: Jenkins, Spinnaker, GitLab runners, Azure DevOps, etc.

  • Experience with designing, deploying, and maintaining monitoring solutions such as Splunk, Prometheus, Check MK, etc.

  • Familiarity with the AWS/Azure Well-Architected Frameworks and practical experience in applying resiliency and reliability patterns such as Circuit Breaker, Bulkhead, etc.

  • Great communication, interpersonal, and teamwork skills.

  • Ability to work independently and own problem statements end-to-end.

Bonus skills

  • Experience with relational databases such as MySQL and Postgres, and document stores such as MongoDB.

  • Experience deploying applications in containers using Docker and Kubernetes.

  • Strong intuition about system design, robustness, and scalability.

  • Decent experience with Windows.

At Adobe, you will be immersed in an exceptional work environment that is recognized throughout the world. You will also be surrounded by colleagues who are committed to helping each other grow through our unique approach where ongoing feedback flows freely.

If you're looking to make an impact, Adobe's the place for you. Discover what our employees are saying about their career experiences and explore what we offer.

Adobe is an equal opportunity employer. We welcome and encourage diversity in the workplace regardless of race, gender, religion, age, sexual orientation, gender identity, disability or veteran status.

Experience Required :

Fresher

Vacancy :

2 - 4 Hires
