Site Reliability Engineer AWS

Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • 5+ years of experience in SRE/DevOps roles in AWS.
  • Hands-on expertise with AWS services like EC2, S3, Lambda, EKS, VPC, and IAM.
  • Strong knowledge of cost optimization techniques in AWS.
  • Proficiency in Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK.

Key responsibilities:

  • Design, build, and optimize cloud platform and infrastructure.
  • Implement cloud roadmap and strategy for scalability, reliability, and security.
  • Develop Infrastructure as Code and automate deployment pipelines.
  • Ensure high availability, security, and cost-efficiency of the platform.

Acuity Systems, Inc. http://www.salesmadeeasy.com
2 - 10 Employees

Job description

Join us. Let’s make a direct impact in healthcare.

Being an Iodine employee means becoming part of something bigger, using clinical AI technology to drive smarter healthcare processes and positively impact patient care.

Who we are:

Recognized as one of Austin’s best places to work, we are a collaborative and dedicated team with innovation built into our DNA. Iodine is an enterprise AI company that is championing a radical rethink of how to create value for healthcare professionals, leaders, and their organizations by automating complex clinical tasks, generating insights, and empowering intelligent care. Powered by one of the largest sets of clinical data and use cases available, our groundbreaking clinical machine-learning engine, Cognitive ML, constantly ingests the patient record to generate real-time, highly focused, predictive insights that clinicians and hospital administrators can leverage to dramatically augment the management of care delivery.

We are seeking a highly skilled Site Reliability Engineer (SRE) with AWS Cloud expertise to design, build, and optimize our cloud platform and infrastructure. This role demands deep hands-on experience with AWS cloud services across compute, storage, databases, networking, and security, combined with strong cost optimization strategies. You will implement the cloud roadmap and strategy, design scalable solutions, and ensure the reliability, security, and cost-efficiency of the platform and infrastructure. You will be responsible for the scalability of the platform and infrastructure, ensuring it can support business growth while maintaining high availability and performance, and for driving reliability, security, cost optimization, and operational excellence across our platform. Additionally, you will participate in key architectural discussions with product engineering and security teams to ensure new and existing services follow best practices and meet operational excellence standards.

If you are an experienced SRE/Cloud Engineer with AWS expertise and a strong SRE mindset, passionate about high availability, security, automation, and operational and cost efficiency, we would love to hear from you. This role will:

Cloud Strategy & Roadmap
  • Implement the cloud roadmap and strategy to drive scalability, reliability, security, and cost efficiency.

  • Drive cloud adoption initiatives, ensuring alignment with business objectives.

  • Implement and support initiatives on cloud governance, architectural best practices, and modernization strategies.

Automation & Reliability Engineering
  • Develop Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK for fully automated provisioning and deployment.

  • Own and improve infrastructure CI/CD pipelines using GitLab, Ansible (AWX), Argo CD, and Helm.

  • Implement self-healing, fault-tolerant architectures that can automatically recover from failures.

  • Optimize infrastructure monitoring and observability using Prometheus, Grafana, Loki, Tempo, Mimir, AWS CloudWatch, AWS CloudTrail, and New Relic.

  • Participate in architecture discussions with product engineering teams for onboarding new services, ensuring they are scalable, cost-optimized, and aligned with best engineering practices.

  • Collaborate with software developers to optimize application performance and cloud-native designs.

Operational Duties & Business Support
  • Perform regular system and infrastructure maintenance including OS-level patching, AMI refreshes, and kernel upgrades.

  • Lead and coordinate planned upgrade cycles for core services like RDS, EKS, and Kubernetes clusters to ensure security and feature compatibility.

  • Troubleshoot and resolve infrastructure and application-level issues, collaborating directly with internal teams and business stakeholders.

  • Participate in customer support escalations and provide technical guidance for resolution.

Incident Response & Operational Excellence
  • Lead and refine incident management processes for the SRE team, ensuring minimal downtime and fast recovery.

  • Implement SLOs, SLIs, and error budgets to drive system reliability.

  • Conduct postmortems and drive root cause analysis to prevent recurring issues.

Security, Compliance, and Best Practices
  • Ensure cloud security best practices are embedded into all solutions, including IAM policies, VPC security, encryption, and compliance with industry standards (such as SOC 2, HIPAA).

  • Implement least-privilege access, network segmentation, and automated security controls across AWS services.

  • Collaborate with InfoSec teams to enforce threat detection, logging, and security monitoring using AWS GuardDuty, Security Hub, and CloudTrail.

Solution Architecture & Infrastructure Design
  • Design and build highly available, scalable, and fault-tolerant AWS architectures using AWS services such as EC2, S3, RDS, DocumentDB, Lambda, EKS, Secrets Manager, SSM, API Gateway, and CloudFront, along with related technologies such as HashiCorp Terraform, Vault, and Consul, and Ansible (AWX).

  • Implement and support resilient storage, compute, and database solutions optimized for performance and cost.

  • Drive the execution of multi-region disaster recovery (DR) and backup strategies.

AWS Cost Optimization & FinOps
  • Continuously monitor and optimize AWS infrastructure costs using AWS Cost Explorer, Trusted Advisor, and Savings Plans/Reserved Instances.

  • Drive FinOps culture, ensuring teams design and deploy cost-efficient cloud solutions.

  • Implement autoscaling, rightsizing strategies, and storage lifecycle policies to reduce costs.

Minimum Requirements
  • 5+ years of experience in SRE/DevOps roles in AWS.

  • Hands-on expertise with AWS services, including EC2, S3, Lambda, EKS, VPC, IAM, Secrets Manager, and SSM, and with technologies such as HashiCorp Vault and Consul.

  • Strong knowledge of cost optimization techniques in AWS, including autoscaling, rightsizing, storage lifecycle policies, and Reserved Instances/Savings Plans.

  • Strong hands-on experience with Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK.

  • Proficiency in Linux administration and in Python or Bash scripting for automation.

  • Experience with Kubernetes (EKS), Docker, and container orchestration.

  • Strong security and compliance knowledge, including IAM, security groups, encryption, AWS WAF, and logging with CloudTrail.

  • Hands-on experience with monitoring and observability tools like Prometheus, Grafana, AWS CloudWatch, Loki, and New Relic.

  • Experience in approving merge and pull requests, ensuring high-quality infrastructure code.

  • Strong team collaboration, documentation, and communication skills.

  • Travel to and from company headquarters is required for mandatory onboarding and company meetings.

Preferred Requirements
  • AWS Certifications (e.g., AWS Certified Solutions Architect Professional, AWS Certified DevOps Engineer).

  • Experience with multi-account AWS Organizations and AWS Control Tower.

  • Familiarity with service meshes (Istio, Linkerd) and API gateways.

  • Experience with Fortinet (FortiGate) firewalls and AWS networking (VPC, Transit Gateway, Direct Connect, etc.).

  • Background in database administration (PostgreSQL, MySQL, DocumentDB, or NoSQL databases).

  • Experience implementing resilience testing and chaos engineering.

What we offer:

  • Comprehensive Healthcare: Fully covered medical, vision, and dental benefits for employees, plus generous dependent coverage.

  • Telehealth Services: Convenient access to telehealth services tailored for remote work.

  • Savings Accounts: Tax-advantaged savings accounts for healthcare and dependent care expenses.

  • Ancillary Benefits: Life, AD&D, and disability insurance paid by Iodine for peace of mind.

  • Retirement Plan: Competitive 401(k) retirement plan with a considerable company match.

  • Extra Life Insurance: Optional additional life insurance coverage for you and your dependents.

  • Accident Insurance: Financial protection against unexpected accidents and critical health issues.

  • Critical Illness Insurance: Provides financial support for medical costs and living expenses during serious illness.

  • Hospital Indemnity Insurance: Additional support for hospital-related expenses through indemnity insurance.

  • Pet Insurance: Affordable options for discounted pet insurance.

  • Legal and Identity Protection: Legal and ID theft protection to safeguard personal information.

  • Employee Assistance: Confidential employee assistance program for personal and professional challenges.

  • Education Allowance: Annual funding for educational pursuits and continuing education to support professional development and skill enhancement.

  • Reimbursements: Annual reimbursement for eligible wellness expenses, monthly reimbursement for cell phone and Wi-Fi costs, and a one-time equipment allowance for creating a comfortable home office.

Why should you join Iodine?

This is a unique opportunity to join a close-knit, rapidly growing team and help us improve a key piece of the organization. You will have the opportunity to drive smarter healthcare processes through technology, so hospitals can stay focused on patient care. You will join a passionate and ambitious team, with a proven record of success building multiple companies. Learn more about our company culture on Built In Austin and on our website at www.iodinesoftware.com.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Communication
