This is a remote position.
As a Senior Site Reliability Engineer, you will play a key role in ensuring the stability and scalability of IT systems. Your expertise in cloud platforms, infrastructure operations, Kubernetes orchestration, application development, database management, and Oracle E-Business Suite (EBS) will be essential in maintaining critical business platforms. This position involves collaboration with diverse teams to implement engineering best practices, enhance monitoring and automation, and explore opportunities to integrate emerging AI technologies into operations.
Infrastructure as Code: Build and maintain automated infrastructure provisioning using Terraform for hybrid cloud setups.
Cloud Management: Design and manage multi-cloud environments leveraging AWS and Azure, with an emphasis on optimizing Kubernetes clusters (EKS and AKS).
Oracle E-Business Suite (EBS): Ensure the reliability and optimization of Oracle EBS deployments while integrating them with other IT systems.
Operating Systems Administration: Manage and optimize Linux (RHEL) and Windows Server environments for high availability and security.
Application Performance: Collaborate with development teams to improve the reliability and performance of applications built with React, Node.js, .NET, C#, and Java.
Networking & Security: Implement secure, scalable architectures using advanced AWS networking techniques such as VPC design, load balancing, and routing.
Database Management: Monitor database performance and manage both relational (e.g., MySQL, PostgreSQL) and NoSQL (e.g., DynamoDB, MongoDB) databases for high-demand services.
Monitoring & Troubleshooting: Deploy observability tools such as Prometheus, Grafana, Splunk, or CloudWatch to proactively identify performance issues.
Incident Response & Automation: Lead incident management efforts, conduct postmortem reviews, and drive automation initiatives to enhance system resilience.
Collaboration: Partner with developers, system administrators, and security teams to align infrastructure capabilities with business objectives.
Advanced proficiency in Terraform for automating infrastructure.
Hands-on experience managing Kubernetes clusters on Azure (AKS) and AWS (EKS).
Deep knowledge of AWS and Azure ecosystems, including networking, security, and cost management.
Expertise in Linux (RHEL) and Windows Server administration.
Proven ability to support and optimize Oracle E-Business Suite (EBS) in complex environments.
Application development experience with React, Node.js, .NET, C#, and Java.
Strong database administration skills for relational (MySQL, PostgreSQL) and NoSQL (DynamoDB, MongoDB) databases.
Advanced networking expertise in areas like VPC design and hybrid cloud connectivity.
Familiarity with monitoring tools such as New Relic, Prometheus, Grafana, Splunk, or CloudWatch.
Strategic mindset for designing scalable systems for high-demand platforms.
Strong collaboration skills to mentor teams in adopting SRE practices.
Excellent communication abilities for engaging technical and non-technical stakeholders.
Adaptability to thrive in fast-paced environments.
Experience with AI-driven operations (AIOps) for predictive maintenance and troubleshooting automation.
Background in high-demand or live-streaming applications.
Relevant certifications such as AWS Certified Solutions Architect or Azure Solutions Architect Expert.
Knowledge of compliance standards like SOC 2 or GDPR.
This is an exciting opportunity to work on cutting-edge systems that power critical business operations while collaborating with cross-functional teams to drive innovation.