Weekday (YC W21)

Staff Engineer - DevOps

Requirements

  • 9–15 years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
  • Deep expertise in Kubernetes, container orchestration, and production-grade Docker deployments.
  • Strong understanding of Infrastructure-as-Code (Terraform, CloudFormation, etc.).
  • Expertise in CI/CD automation and release management.

Roles & Responsibilities

  • Lead end-to-end DevOps strategy, including CI/CD pipelines, automation, infrastructure-as-code, and release engineering, while establishing reliability standards and operational governance.
  • Architect and manage large-scale Kubernetes environments for production workloads, optimize workloads across clusters for performance, reliability, and cost efficiency, and drive multi-cluster/multi-region deployments.
  • Own infrastructure cost visibility and savings initiatives, including rightsizing, reserved capacity planning, auto-scaling optimization, and workload scheduling; partner with finance for budgeting, forecasting, and reporting; create dashboards to track infrastructure ROI and spend trends.
  • Design and implement comprehensive observability using Grafana and related tools; build real-time dashboards, establish alerting, and drive incident response improvements; automate provisioning, deployments, scaling, and disaster recovery processes.

Job description

This role is for one of Weekday's clients.

Min Experience: 9 years

Location: Remote (India)

Job Type: Full-time

As a Staff Engineer, you will architect and evolve our DevOps ecosystem, champion cloud cost governance, and implement best-in-class container orchestration practices. You will work cross-functionally with engineering, security, and finance teams to ensure operational excellence while proactively managing infrastructure spend.

Key Responsibilities

DevOps Leadership & Architecture

  • Lead end-to-end DevOps strategy, including CI/CD pipelines, automation, infrastructure-as-code, and release engineering.
  • Design scalable, resilient cloud-native architectures aligned with business growth.
  • Establish DevOps best practices, reliability standards, and operational governance.

Kubernetes & Containerization

  • Architect and manage large-scale Kubernetes environments for production workloads.
  • Optimize workloads across clusters for performance, reliability, and cost efficiency.
  • Build and maintain containerized applications using Docker and Kubernetes, ensuring portability and scalability.
  • Drive multi-cluster, multi-region deployments where necessary.

Cost Savings & Cost Planning

  • Own infrastructure cost visibility and optimization initiatives.
  • Implement cloud cost-saving strategies including rightsizing, reserved capacity planning, auto-scaling optimization, and workload scheduling.
  • Partner with finance teams for budgeting, forecasting, and cost planning.
  • Create dashboards and reporting mechanisms to track infrastructure ROI and spend trends.
  • Continuously identify inefficiencies and implement measurable cost-reduction initiatives without compromising performance.

Monitoring & Observability

  • Design and implement comprehensive monitoring systems using Grafana and related observability tools.
  • Build real-time dashboards for system health, performance metrics, and cost insights.
  • Establish alerting frameworks to minimize downtime and improve incident response.
  • Drive improvements in system reliability through data-driven monitoring and post-incident analysis.

Automation & Reliability

  • Automate provisioning, deployments, scaling, and recovery processes.
  • Improve system resilience, availability, and disaster recovery strategies.
  • Lead root cause analysis for major incidents and implement preventive measures.

Required Qualifications

  • 9–15 years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
  • Deep expertise in Kubernetes, container orchestration, and production-grade Docker deployments.
  • Strong hands-on experience with Grafana, monitoring systems, and observability frameworks.
  • Proven track record in cost-saving initiatives and infrastructure cost planning in cloud environments.
  • Experience designing highly available, scalable systems in AWS, Azure, or GCP.
  • Strong understanding of Infrastructure-as-Code (Terraform, CloudFormation, etc.).
  • Expertise in CI/CD automation and release management.
  • Solid knowledge of networking, security best practices, and cloud architecture patterns.

Preferred Attributes

  • Experience managing large-scale production environments with strict SLAs.
  • Strong analytical skills with the ability to translate technical metrics into financial impact.
  • Leadership mindset with experience mentoring engineers and influencing cross-functional teams.
  • Excellent communication and stakeholder management skills.
