Site Reliability Engineer (India)

extra holidays - extra parental leave
Work set-up: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from:

Offer summary

Qualifications:

  • Experience with cloud platforms such as GCP, AWS, and Azure.
  • Proficiency in Infrastructure-as-Code tools like Terraform and Helm.
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration abilities.

Key responsibilities:

  • Design and maintain scalable, reliable systems in a multi-cloud environment.
  • Develop automation tools and scripts for deployment and incident response.
  • Configure monitoring and alerting systems to proactively detect issues.
  • Participate in incident management and contribute to system security and documentation.

Blockdaemon Scaleup https://blockdaemon.com/
201 - 500 Employees

Job description

Position Overview

As a Site Reliability Engineer (SRE), you will play a critical role supporting our Blockdaemon team by ensuring the reliability, scalability, and performance of our systems and services. You will collaborate closely with cross-functional teams to design, implement, and maintain robust and resilient infrastructure solutions in a multi-cloud environment.

The ideal candidate is passionate about automation, possesses strong analytical skills, and thrives in a fast-paced, dynamic environment.

Blockdaemon is a blockchain infrastructure company operating in a multi-cloud configuration with a global footprint. This role requires a candidate capable of supporting the systems and infrastructure stack across the major clouds: Google Cloud Platform (GCP), Amazon Web Services (AWS), and Azure.

Your Impact

  • System Architecture and Design: Collaborate with software engineering teams to design scalable, highly available, and resilient systems. Drive architectural improvements to enhance system reliability and performance.

  • Implement Infrastructure as Code to manage services and deployments in a multi-cloud, multi-project configuration.

  • Automation and Tooling: Develop automation tools and scripts to streamline deployment, monitoring, and incident response processes. Implement and maintain infrastructure as code frameworks.

  • Monitoring and Alerting: Configure and maintain monitoring systems to detect and mitigate potential issues proactively. Define alerting thresholds and response procedures to ensure timely incident resolution.

  • Incident Management: Respond to and resolve critical incidents, perform root cause analysis, and implement preventive measures to minimize the likelihood of recurrence. Participate in an on-call rotation to provide 24/7 support as needed.

  • Capacity Planning and Performance Optimization: Analyze system performance metrics, identify bottlenecks, and propose optimizations to improve resource utilization and efficiency.

  • Security and Compliance: Work closely with security teams to implement best practices for data protection, access control, and compliance with regulatory requirements. Conduct periodic security audits and vulnerability assessments.

  • Documentation and Knowledge Sharing: Document system configurations, procedures, and troubleshooting steps. Share knowledge and best practices with team members to foster a culture of continuous learning and improvement.

Role Requirements

Must Have:

  • Proven experience in an independent contributor role working with cloud platforms (GCP, AWS, Azure), Infrastructure-as-Code tooling (Terraform, Helm), and CI/CD orchestration platforms (GitLab CI, Argo CD, GitHub Actions, or similar GitOps workflows).

  • Excellent problem-solving skills and the ability to independently troubleshoot complex issues.

  • Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.

  • Strong architectural and security mindset.

Should Have:

  • Strong understanding of Linux/Unix systems administration and networking concepts.

  • Hands-on experience configuring and running monitoring tools such as Prometheus and Grafana.

  • 5+ years of experience maintaining infrastructure as code on Google Cloud Platform, Amazon Web Services, and Azure.

  • Experience working in SOC 2 Type 1 and Type 2 certified companies.

Nice to Have:

  • Proficiency in scripting and programming languages such as Bash, Golang, Python, and TypeScript.

  • 2+ years of hands-on experience operating highly available Kubernetes clusters.

  • Experience with incident management and resolution.

  • Experience with AI development tools and related security considerations.

  • Passion for the blockchain industry and decentralised systems.

  • Experience with blockchain infrastructure, in either a personal or professional capacity.

About Us:

We Power the Blockchain economy.

Blockdaemon powers the blockchain economy with its suite of industry-leading infrastructure solutions. We are a globally established, ISO 27001 certified partner with extensive protocol coverage, offering technical depth, industry-leading SLAs, 70+ global points of presence through 10+ cloud and bare metal providers, and 24/7 support for an unmatched institutional-grade experience. We provide integrated business solutions to exchanges, custodians, crypto platforms, financial institutions, and developers using our end-to-end suite of blockchain tools, including dedicated nodes, APIs, staking, liquid staking, MPC tech, and more. Blockdaemon provides its customers with the confidence to quickly and easily scale without compromising security or compliance.

We are a globally distributed team.

Blockdaemon is an Equal Opportunity Employer.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Problem Solving
  • Collaboration
  • Communication
