Offer summary

Qualifications:

5+ years of experience in a similar role; deep experience with AWS, including EKS, IAM, VPC, EC2, S3, and RDS; solid Terraform experience; proficiency in scripting languages such as Bash, Python, or Go for automation tasks.

Key responsibilities:

Manage and support AWS Kubernetes-based infrastructure, ensuring reliability, security, and scalability

Set up and maintain monitoring for system performance and implement alerting strategies

Enhance and maintain CI/CD pipelines to improve deployment efficiency

Implement cost monitoring and optimization for AWS infrastructure

Job description

OPEN ONLY TO CANDIDATES BASED IN CLUJ-NAPOCA

FULLY REMOTE

ONLY OPEN TO PFA/SRL COLLABORATIONS

Compensation will be determined based on your performance in the technical evaluation.

ABOUT THE ROLE

We are looking for a Senior DevOps & Infrastructure Engineer to join a fast-growing startup that develops innovative casino video slot games. The company offers a diverse portfolio of games, including virtual soccer betting and crash games.

As a trusted game provider, they partner with online gaming platforms to deliver high-quality, engaging gaming experiences to players worldwide.

DUTIES AND RESPONSIBILITIES

As a DevOps & Infrastructure Engineer, you will manage and support our partner's AWS Kubernetes-based infrastructure, ensuring system reliability, security, and scalability, while supporting CI/CD pipelines, monitoring, and incident response.

This role may evolve into multiple positions based on workload and requirements.

Infrastructure Management
• Maintain and update existing AWS Kubernetes infrastructure;
• Perform version upgrades for Kubernetes, Terraform, and other core infrastructure components;
• Ensure high availability and reliability of the system through proactive monitoring and scaling;
• Manage Nginx configurations, including load balancing and sticky sessions;

Monitoring & Observability
• Set up and maintain monitoring for CPU, memory, and disk usage;
• Monitor and maintain logging and metrics using industry-standard tools;
• Define and implement alerting strategies for infrastructure and services;
• Monitor external dependencies such as exchange rates, Auth0, and third-party APIs;

Security & Access Control
• Manage user access across different services, ensuring proper permissions and least-privilege principles;
• Secure infrastructure against threats by implementing best practices (network security, IAM policies, vulnerability management);
• Rotate API keys and credentials regularly;
• Define and enforce security policies for cloud infrastructure and deployments;

CI/CD & Release Management
• Enhance and maintain CI/CD pipelines to improve deployment efficiency;
• Assist in the release process, monitoring and signing off on production deployments;
• Ensure infrastructure changes are version-controlled and follow best practices;

Incident Management & Response
• Set up and manage incident response processes, including escalation procedures;
• Define SLAs for system availability and response times;
• Conduct root cause analysis (RCA) and implement preventative measures;

Cost Management & Optimization
• Implement cost monitoring and reporting for AWS infrastructure;
• Set up alerts for cost anomalies to prevent unexpected budget overruns;
• Optimize cloud resources for cost efficiency;

Provisioning & Environment Management
• Automate provisioning of environments for platform integrations;
• Maintain consistency across different environments (development, staging, production);
• Implement infrastructure as code (IaC) best practices using Terraform;

Internal Tools Management
• Manage and support internal tools such as Slack, Office 365, Jira, Confluence, and GitHub;
• Ensure seamless integration of internal tools with the overall infrastructure;
• Maintain security and access control for internal collaboration tools;

REQUIREMENTS

Mandatory experience

5+ years of experience in a similar role;
Deep experience with AWS, including EKS (Kubernetes), IAM, VPC, EC2, S3, and RDS;
Solid Terraform experience;
Experience configuring Nginx, including handling sticky sessions;

Other required experience

Experience with setting up and managing CI/CD workflows (GitHub Actions, GitLab CI/CD, or similar);
Hands-on experience with Prometheus, Grafana, ELK stack, CloudWatch, or similar tools;
Experience in cloud security, IAM, and secret management;
Ability to set up and manage incident handling processes;
Proficiency in scripting languages (Bash, Python, Go) for automation tasks;
Understanding of cloud networking, security groups, firewalls, and VPNs;
Experience in cost tracking and optimization for cloud environments;
Experience in managing DNS, caching, and security settings using Cloudflare;
Experience in managing and optimizing PostgreSQL databases;
Strong command-line skills and ability to manage Linux-based systems;
Experience handling automated SSL certificate issuance and renewal.