Key Facts

Remote From: United States

Category: DevOps Engineer

Full time

Mid-level (2-5 years)

English

Hard Skills

Reliability Engineering, AWS Cloud Services, Kubernetes, Observability, Datadog, System Monitoring, Prometheus (Software), Containerization, Root Cause Analysis, AWS CloudFormation, +22 more

Other Skills

• Calmness Under Pressure
• Non-Verbal Communication
• Analytical Skills
• Detail Oriented
• Prioritization
• Problem Solving

Requirements

Bachelor's degree in Computer Science, Software Engineering, or a related field.
Minimum of 2 years of professional experience in DevOps, SRE, or a related role, with a focus on observability and reliability.
Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, ELK stack, DataDog, or New Relic.
Proficiency with containerization and orchestration technologies like Docker and Kubernetes; experience with cloud platforms (e.g., AWS).

Roles & Responsibilities

Design, implement, and maintain monitoring and alerting systems for production and development environments using Prometheus, Grafana, DataDog, Elastic Stack, or equivalent to ensure high availability and reliability.
Lead incident response efforts, including root cause analysis, and implement measures to prevent recurrence; develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure system reliability and availability.
Create and maintain automated CI/CD pipelines to reduce release cycle times and improve developer workflows; automate infrastructure provisioning and management with Terraform, Ansible, Helm, or CloudFormation.
Deploy, maintain, and scale cloud infrastructure on AWS, Azure, or Google Cloud; manage container orchestration with Kubernetes and Docker to ensure efficient resource usage, security, and scalability.

Job description

Overview:

LMI is a new breed of digital solutions provider dedicated to accelerating government impact with innovation and speed. Investing in technology and prototypes ahead of need, LMI brings commercial-grade platforms and mission-ready AI to federal agencies at commercial speed.

Leveraging our mission-ready technology and solutions, proven expertise in federal deployment, and strategic relationships, we enhance outcomes for the government, efficiently and effectively. With a focus on agility and collaboration, LMI serves the defense, space, healthcare, and energy sectors—helping agencies navigate complexity and outpace change. Headquartered in Tysons, Virginia, LMI is committed to delivering impactful results that strengthen missions and drive lasting value.

Responsibilities:

We are seeking a highly motivated and skilled DevOps Engineer with a strong focus on observability and reliability to join our health project team. The successful applicant will become part of an analytical team that supports public health systems management projects with a primary focus on Medicare, Medicare Advantage, and Risk Adjustment. This role is pivotal in ensuring the stability, scalability, and performance of our healthcare technology infrastructure and applications.

You will play a key role in developing and implementing monitoring strategies, improving system reliability, and automating workflows to streamline our development and operations processes. You will work closely with cross-functional teams to deliver a secure, compliant, and always-available platform for users.

Responsibilities include:

Observability & Monitoring:
- Design, implement, and maintain monitoring and alerting systems for production and development environments to ensure high availability and reliability.
- Leverage tools like Prometheus, Grafana, DataDog, Elastic Stack, or equivalent to track system performance and application health.
- Proactively detect and troubleshoot performance bottlenecks, infrastructure issues, and failures.
Reliability Engineering:
- Optimize the performance and reliability of highly available systems supporting healthcare payment applications.
- Lead incident response efforts, including root cause analysis, and implement measures to prevent recurrence.
- Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure system reliability and availability.
- Ensure the consistent delivery of high-quality health-related data in compliance with industry standards such as HIPAA.
Automation & CI/CD Pipelines:
- Create and maintain automated deployment pipelines (CI/CD) to reduce release cycle times and improve workflows for developers.
- Automate infrastructure provisioning and management through tools such as Terraform, Helm, Ansible, or CloudFormation.
- Improve the operational efficiency of development and deployment processes.
Cloud Infrastructure Management:
- Deploy, maintain, and scale cloud infrastructure on platforms such as AWS, Azure, or Google Cloud, ensuring compliance with healthcare-sector security and privacy requirements.
- Implement and manage container orchestration platforms such as Kubernetes and Docker to ensure efficient resource usage and scalability.
- Optimize cloud resources for cost efficiency and streamline infrastructure provisioning.
Collaboration & Documentation:
- Partner with development and product teams to ensure seamless integration of observability tools and reliability practices through every stage of the software delivery lifecycle.
- Document architecture, processes, metrics, and troubleshooting guides to support scalability and knowledge sharing across the organization.
- Actively contribute to improving engineering workflows, reliability processes, and operational excellence.

Qualifications:

MINIMUM QUALIFICATIONS

Bachelor's degree in Computer Science, Software Engineering, or a related field.
Minimum of 2 years of professional experience in DevOps, Site Reliability Engineering (SRE), or a related role, preferably focused on observability and reliability.
Hands-on experience with monitoring tools such as Prometheus, Grafana, ELK stack, DataDog, New Relic, or similar platforms.
Experience with containerization and orchestration technologies like Docker and Kubernetes.
Proficiency with cloud platforms such as AWS.
Solid understanding of infrastructure as code (IaC) with tools like Terraform, Ansible, or CloudFormation.
Knowledge of scripting languages such as Python, Bash, or PowerShell for automation tasks.
Familiarity with CI/CD tools such as Jenkins, GitLab CI/CD, CircleCI, or similar frameworks.
Strong understanding of network protocols, monitoring, and troubleshooting best practices.
Strong problem-solving and analytical abilities, with extreme attention to detail and a commitment to reliability and excellence.
Excellent written and verbal communication skills, able to interact effectively with cross-functional teams.
Ability to work under pressure and prioritize tasks in fast-paced environments.

PREFERRED QUALIFICATIONS

Experience with healthcare-focused projects or systems.
Familiarity with healthcare compliance requirements such as HIPAA.
Certifications such as AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, or Linux Foundation Certified Kubernetes Administrator.

Experience in federal consulting.
Demonstrated experience with healthcare data projects (e.g., claims processing or payment systems) or financial/banking systems.

Target Salary Range: $70,000 - $99,000

Disclaimer:

The salary range displayed represents the typical salary range for this position and is not a guarantee of compensation. Individual salaries are determined by various factors including, but not limited to, location, internal equity, business considerations, client contract requirements, and candidate qualifications such as education, experience, skills, and security clearances.
