This role is for one of Weekday's clients
Salary range: Rs 2000000 - Rs 3500000 (i.e., INR 20 - 35 LPA)
Min Experience: 3 years
Location: Remote (India)
JobType: full-time
We are looking for a Backend Ops Engineer to take ownership of infrastructure and operations, ensuring fast, reliable, and cost-efficient deployments. This role is critical in reducing operational overhead, improving system scalability, and enabling seamless product delivery.
You will focus on building robust infrastructure, optimizing cloud costs, and integrating AI-driven automation into DevOps workflows. This position is ideal for someone who thrives in high-growth environments and enjoys solving complex infrastructure challenges.
Why This Role Matters
- Centralizes infrastructure ownership to improve delivery speed and reliability
- Enables proactive scaling for growing user demand and traffic spikes
- Optimizes cloud costs and operational efficiency
- Lays the foundation for compliance frameworks such as SOC 2 and GDPR
- Introduces AI-driven automation into infrastructure management
Key Responsibilities
Initial Focus (First Quarter)
- Implement AI-driven operations, including log analysis, automated infrastructure updates, and predictive scaling alerts
- Benchmark cloud and edge services to improve performance and scalability
- Build self-healing infrastructure pipelines that demonstrate advanced AI capabilities
Ongoing Responsibilities
- Design, automate, and manage infrastructure using Terraform and AWS services (ECS/Fargate, RDS, S3, IAM)
- Build and maintain CI/CD pipelines using GitHub Actions for efficient deployments
- Implement and manage observability tools such as Prometheus, Grafana, OpenTelemetry, and Sentry
- Handle containerization using Docker and troubleshoot performance issues under load
- Collaborate with backend teams to ensure low-latency, scalable, and cost-effective services
Long-Term Growth
- Progress into a Staff Platform Engineer or Lead SRE role with end-to-end platform ownership
- Contribute to building scalable deployment frameworks for enterprise use cases
- Help define best practices for AI-driven DevOps and infrastructure management
Requirements
Must-Have Skills & Experience
- 3+ years of experience in DevOps or Site Reliability Engineering (SRE) roles, preferably in high-growth environments
- Strong expertise with AWS services (ECS/Fargate, RDS, S3, CloudWatch, IAM)
- Hands-on experience with Terraform and Infrastructure as Code (IaC)
- Proficiency in CI/CD pipelines (GitHub Actions) and Docker
- Experience with observability tools including Prometheus, Grafana, OpenTelemetry, and Sentry
- Proven ability to troubleshoot and optimize infrastructure under high load conditions
AI-Driven Mindset
- Interest or experience in integrating AI into DevOps workflows
- Exposure to LLM APIs (e.g., OpenAI, Anthropic, Hugging Face) is a plus
Nice-to-Have
- Familiarity with SOC 2 and GDPR compliance requirements
- Experience working with multi-cloud environments (GCP, Azure)
- Scripting skills in Python for automation
- Knowledge of infrastructure security best practices
- Experience with DigitalOcean services or cloud migration strategies
Soft Skills
- Strong ownership mindset with the ability to independently drive outcomes
- Clear communication skills to explain technical trade-offs to cross-functional teams
- Proactive approach with a strong focus on automation and continuous improvement
Core Skills
AWS | Terraform | Docker | CI/CD | Observability | DevOps | SRE