Job Description
We are seeking a skilled Cloud Engineer/Site Reliability Engineer (SRE) to join our team. This role involves building, maintaining, and operating IaaS and PaaS infrastructure in both Azure commercial and government clouds. The ideal candidate will work closely with development teams to identify and measure Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs).
Key Responsibilities
- Infrastructure Management: Build, maintain, and operate IaaS and PaaS infrastructure in Azure commercial and government clouds.
- Collaboration: Work closely with development teams to identify and measure SLOs, SLAs, and SLIs.
- Platform Services Development: Contribute to the development of platform services, including architecture, provisioning, configuration, deployment, and support.
- Integration: Perform integrations with central logging, metrics dashboards, instrumentation, incident monitoring, and management systems.
- Tool Administration: Build, integrate, and administer systems and tools that enable engineering teams to observe their applications in production autonomously (e.g., dashboards, APMs).
- On-Call Support: Provide support for software and cloud infrastructure on an on-call rotation basis.
- Problem Remediation: Help identify and remediate technical problems at their root cause by continuously implementing automation, self-healing, and real-time monitoring for production systems.
- Operational Tooling: Maintain and improve operational tooling and frameworks.
- Performance Testing: Build frameworks to test the performance and resiliency of platform services and tools.
- Automation: Automate alerts for metrics on performance, cost, vulnerabilities, risk, and compliance violations.
- Process Improvement: Improve processes and champion the automation of any manual support tasks.
Skills
- Cloud Platforms: Expertise in cloud platforms, particularly Azure and AWS.
- Kubernetes: Hands-on production experience with Kubernetes (bare metal or managed) cluster setup and management.
- Infrastructure as Code (IaC): Experience with IaC tools like Terraform and Pulumi.
- Deployment Tools: Proficiency with Kubernetes deployment tools such as Helm, ArgoCD, and Flux.
- Networking: Strong awareness of networking and internet protocols.
- Identity and Access Management (IAM): Understanding of IAM principles and practices.
- Production Support: Experience supporting infrastructure in production cloud environments.
- Security: Knowledge of encryption, Public Key Infrastructure (PKI), and understanding of OWASP.
- RESTful Services: Experience working with RESTful services.
- Monitoring Tools: Familiarity with monitoring tools like Azure Monitor, Splunk, Dynatrace, Grafana, and Prometheus.
- Development Tools: Familiarity with IDEs and source control tools like Visual Studio Code and Git.
Additional Skills & Qualifications
- Experience: 4+ years in a cloud engineer/SRE role.
- Cloud Expertise: Expert knowledge of a cloud service provider.
- Automation and Self-Healing: Strong focus on automation and self-healing practices.
- Team Player: Ability to work collaboratively within a team.
- Self-Learner: Proactive in learning and staying updated with the latest technologies and best practices.
This role offers an exciting opportunity to work with cutting-edge technologies and contribute to the development and maintenance of robust cloud infrastructure. If you have the required skills and experience, we encourage you to apply.
Benefits
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
- Medical, dental & vision
- Critical Illness, Accident, and Hospital
- 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
- Life Insurance (Voluntary Life & AD&D for the employee and dependents)
- Short- and long-term disability
- Health Savings Account (HSA)
- Transportation benefits
- Employee Assistance Program
- Time Off/Leave (PTO, Vacation or Sick Leave)
About TEKsystems
We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe, and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information, or any characteristic protected by law.