
Site Reliability Engineer

unlimited holidays
Remote: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from: Texas (USA), United States

Offer summary

Qualifications:

  • 5 years of experience in cloud infrastructure
  • BA/B.Sc. in Computer Science preferred
  • Experience with Python or Go
  • Familiarity with Kubernetes and GitOps tools
  • Experience with monitoring tools and databases

Key responsibilities:

  • Design, deploy, and maintain cloud infrastructure
  • Optimize deployment processes and system performance
  • Participate in incident management for uptime
  • Engage in the entire service lifecycle
  • Utilize automation to improve reliability
Imubit Scaleup https://www.imubit.com/
51 - 200 Employees

Job description

TL;DR:

Imubit is looking for a Site Reliability Engineer to help disrupt the refining and chemical industries with breakthrough machine learning technologies.

 

About us:

Imubit directly controls and optimizes refineries and chemical plants with AI to add millions of dollars to the plant bottom line while managing safe operating limits, energy efficiency, and sustainability objectives. Imubit’s Closed Loop Neural Network platform allows customers to leverage an advanced form of AI called Reinforcement Learning (RL). Through our patented approach to applying RL to industrial processes, industry leaders have been able to fundamentally change the way they optimize their plants and improve profitability in real time. Imubit’s solution is currently optimizing the manufacturing facilities of Fortune 500 companies. Imubit has combined the industry expertise from companies like Exxon and Shell with award-winning data scientists endorsed by Google. Imubit is backed by tier-1 venture capital firms such as Insight Partners.

 

We are looking for:

You, a top-notch Site Reliability Engineer, who will design and support Imubit’s cloud infrastructure. As part of this, you will work to optimize deployment processes and keep systems running. You will work with a variety of cloud technologies, automation, and infrastructure-as-code. Additionally, our SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our time is spent optimizing existing systems, building infrastructure, and reducing repetitive work through automation.

You will also play a critical role in incident management, swiftly identifying and resolving issues to minimize downtime and ensure seamless operations. Collaboration is key in this role, as you will work closely with software developers, DevOps engineers, and other stakeholders to implement robust solutions and drive continuous improvement. As a proactive member of our team, you will stay updated with the latest industry trends and best practices, applying this knowledge to enhance our infrastructure's resilience and scalability. Your contributions will directly impact the reliability and efficiency of our services, making you an integral part of our success.

 

In this position, you will:

  • Design, deploy and maintain Imubit’s cloud infrastructure to provide high uptime, scalability and security.
  • Leverage public cloud services and tools to improve efficiency and reliability of our services and workflows.
  • Architect and manage cross-cloud network infrastructure (e.g. subnets, routing tables, IPSec VPNs, Transit Gateways, firewall rules).
  • Engage in and improve the whole lifecycle of services, from inception and design, through deployment, operation and refinement.
  • Participate in infrastructure on-call rotation and respond in a timely manner.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.


Minimum Qualifications: 

  • 5 years of experience maintaining production-level cloud infrastructure, including public cloud services (e.g. AWS, GCP).
  • BA/B.Sc. in Computer Science or equivalent preferred.
  • Experience with a programming language such as Python or Go.
  • Experience deploying and supporting services in Kubernetes, including GitOps management tools such as ArgoCD.
  • Familiarity with software development principles/concepts (e.g. version control (Git), software development lifecycle).
  • Experience implementing and utilizing monitoring tools (e.g. New Relic, Splunk, Grafana, Prometheus).
  • Experience managing production databases (e.g. PostgreSQL), including managed services (e.g. AWS RDS).
  • Experience with infrastructure-as-code concepts and tools (e.g. Terraform, Ansible).
  • Experience with secrets management tools (e.g. HashiCorp Vault, AWS Secrets Manager).
  • Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code and automate routine tasks.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.


Imubit provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics. In addition to federal law requirements, Imubit complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

 

Imubit does not accept or retain unsolicited CVs or phone calls, and does not respond to them or to any third party representing job seekers.

 

No visa sponsorship is available for this position.

 

careers@imubit.com

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Collaboration
  • Verbal Communication Skills
  • Problem Solving
