Infrastructure Site Reliability Engineer (SRE)

fully flexible
Work set-up: Full Remote
Contract: 
Experience: Senior (5-10 years)
Work from: 

Offer summary

Qualifications:

  • Bachelor's, Master's, or Doctorate in Computer Science or related field.
  • Deep knowledge and hands-on experience with Kafka cluster setup and management.
  • Expertise in managing OpenSearch clusters, including indexing and query optimization.
  • Proficiency with AWS services like RDS and EKS, and experience in multi-cloud environments.

Key responsibilities:

  • Manage and optimize Kafka clusters for data integration and performance.
  • Configure and monitor OpenSearch clusters to ensure high availability and query efficiency.
  • Manage AWS RDS and EKS services, including deployment, scaling, and security.
  • Design and oversee infrastructure across multiple cloud providers, ensuring security and cost efficiency.

Velotio Technologies Scaleup http://www.velotio.com
201 - 500 Employees

Job description

Velotio Technologies is a product engineering company working with innovative startups and enterprises. We are certified and recognized as one of the best companies to work for in India. We have provided full-stack product development for 110+ startups across the globe building products in the cloud-native, data engineering, B2B SaaS, IoT, and Machine Learning space. Our team of 400+ elite software engineers solves hard technical problems while transforming customer ideas into successful products.

About the Role:
We are seeking an experienced Infrastructure Site Reliability Engineer (SRE) to join our team. This role is critical for ensuring the reliability, scalability, and performance of our infrastructure, particularly in managing and optimizing high-throughput data systems. You will work closely with engineering teams to design, implement, and maintain robust infrastructure solutions that meet our growing needs.

As an Infrastructure SRE, you will be at the forefront of managing and optimizing our Kafka and OpenSearch clusters, AWS services, and multi-cloud environments. Your expertise will be key in ensuring the smooth operation of our infrastructure, enabling us to deliver high-performance and reliable services. This is an exciting opportunity to contribute to a dynamic team that is shaping the future of data observability and orchestration pipelines.

Requirements

Responsibilities

  • Kafka Management: Set up, manage, and scale Kafka clusters, including implementing and optimizing Kafka Streams and Connect for seamless data integration. Fine-tune Kafka brokers and optimize producer-consumer configurations to ensure peak performance.
  • OpenSearch Expertise: Configure and manage OpenSearch clusters, optimizing indexing strategies and query performance. Ensure high availability and fault tolerance through effective data replication and sharding. Set up monitoring and alerting systems to track cluster health.
  • AWS Services Proficiency: Manage AWS RDS instances, including provisioning, configuration, and scaling. Optimize database performance and ensure robust backup and recovery strategies. Deploy, manage, and scale Kubernetes clusters on AWS EKS, configuring networking and security policies, and integrating EKS with CI/CD pipelines for automated deployment.
  • Multi-Cloud Environment Management: Design and manage infrastructure across multiple cloud providers, ensuring seamless cloud networking and security. Implement disaster recovery strategies and optimize costs in a multi-cloud setup.
  • Linux Administration: Optimize Linux server performance, manage system resources, and automate processes using shell scripting. Apply best practices for security hardening and troubleshoot Linux-related issues effectively.
  • CI/CD Automation: Design and manage CI/CD pipelines using tools like Jenkins, GitLab CI, or CircleCI, and ArgoCD. Automate deployment processes, integrate with version control systems, and implement advanced deployment strategies like blue-green deployments, canary releases, and rolling updates. Ensure security and compliance within CI/CD processes.

Qualification

  • Bachelor’s, Master’s, or Doctorate in Computer Science or a related field.
  • Deep knowledge of Kafka, with hands-on experience in cluster setup, management, and performance tuning.
  • Expertise in OpenSearch cluster management, indexing, query optimization, and monitoring.
  • Proficiency with AWS services, particularly RDS and EKS, including experience in database management, performance tuning, and Kubernetes deployment.
  • Experience in managing multi-cloud environments, with a strong understanding of cloud networking, security, and cost optimization strategies.
  • Strong background in Linux administration, including system performance tuning, shell scripting, and security hardening.
  • Proficiency with CI/CD automation tools and best practices, with a focus on secure and compliant pipeline management.
  • Strong analytical and problem-solving skills, essential for troubleshooting complex technical challenges.

Benefits

Our Culture:

  • We have an autonomous and empowered work culture encouraging individuals to take ownership and grow quickly.
  • Flat hierarchy with fast decision making and a startup-oriented “get things done” culture.
  • A strong, fun, and positive environment with regular celebrations of our success. We pride ourselves on creating an inclusive, diverse, and authentic environment.

At Velotio, we embrace diversity. Inclusion is a priority for us, and we are eager to foster an environment where everyone feels valued. We welcome applications regardless of ethnicity or cultural background, age, gender, nationality, religion, disability, or sexual orientation.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Problem Solving
  • Analytical Skills
