Sr. DevOps Engineer Job at Springboard

Sr. DevOps Engineer

Job Summary

The Company
Springboard is redefining professional education for the 21st century through immersive, mentor-supported courses in cutting-edge fields like data science and design. Our self-paced, online offerings give anyone, anywhere access to world-class learning resources, with an emphasis on project-based learning, industry-relevant curriculum, and tangible outcomes. Through this hybrid approach, we've helped thousands of learners revamp their careers and, by extension, their lives.
The Opportunity
As a Senior Site Reliability Engineer (SRE) at Springboard, you will be a key contributor to our cloud infrastructure and tech-operations initiatives. You will utilise your diverse background in operations, cloud, systems engineering, and monitoring to ensure the uptime, reliability, efficiency and health of our web services on staging and production. You'll learn quickly, be hands-on, own key processes, and make continuous improvements to the quality of our services and operations as we scale.

Responsibilities

    • Being the primary person responsible for reliability, health, and performance of our cloud infrastructure and services. Ensuring uptime and service health.
    • Implementing cloud infrastructure strategies, network configurations and Kubernetes cluster configurations for security, scale, performance, reliability, and efficiency.
    • Gaining a deep understanding of Springboard's application ecosystem and services. Setting up monitoring and telemetry (logs, metrics, events) on production systems and deployment pipelines to make product rollouts smoother and execution more efficient.
    • Thinking, innovating and engineering solutions to anticipate, detect and solve complex problems. Conducting tests and validating observations. Developing scripts and custom tools where conventional tools fall short. Evaluating solutions and proofs of concept to improve the system.
    • Owning infrastructure operations: managing and addressing requests from engineering teams. Defining, implementing and streamlining processes for service and audit.
    • Learning, improving continuously, and setting a high bar for quality, while advocating and adopting industry best practices in engineering. Identifying bottlenecks and making recommendations to the engineering team to improve security and reliability.
    • Using your excellent communication skills, empathy and training skills to groom an engineering team towards building a strong SRE function at Springboard.

You:

    • Must have 5+ years of experience in Site Reliability Engineering, with cloud infrastructure management and administrative responsibilities in a production environment.
    • Must be an expert on Kubernetes, with operational knowledge and experience in managing production clusters, including self-healing, auto-scaling, load balancing, probes, and volumes.
    • Must have experience with VPCs, IAM, load balancers, DNS, API gateways, firewalls, relational DBs, blob stores and managed services from a popular cloud provider like AWS, Azure or GCP (preferred).
    • Must have excellent Linux system-administration knowledge of tools, system health and performance, services and daemons, and containers (Docker).
    • Must know shell scripting (bash) and have a strong familiarity with Linux command-line tools.
    • Are experienced with infrastructure monitoring using tools like Datadog, Grafana, Splunk etc. You are comfortable setting them up from scratch and debugging issues with them.
    • Have operational experience with web-based applications, REST APIs, GraphQL, authentication, certificate management, CDNs etc.
    • Have operational experience with Git, semantic versioning, CI/CD, package management tools on Linux distros, and language tools like npm, pip, and Helm.
    • You like to automate repetitive tasks. You follow KISS and DRY principles. You either know or are interested in learning a language like JavaScript or Python.
    • Operate with minimal supervision. You are meticulous in activities. You prioritize tasks. You reason objectively. You are decisive. You are an excellent communicator.
    • You are a self-learner, who seeks to improve constantly. You share your knowledge generously, and you mentor engineers to meet and exceed your standards. You strive to learn and implement best practices, and define policies and guidelines for the team.
    • Are passionate about SRE. You are curious about tech and keen on honing your skills. You aim to learn, grow and excel as a site reliability engineer.
    • Are a preferred candidate if you have a Kubernetes Administrator (CKA) certification or a cloud certification from AWS or Google (preferable).

Experience Required:

Fresher

Vacancy:

2 - 4 Hires
