This is a remote position.
About Awign Expert:
Awign Expert is an enterprise-focused platform that helps businesses hire, assess, and manage highly skilled resources for gig-based projects. We give our Experts a gateway to work with large-scale enterprises and build a freelance or consulting career. We are a newly launched business division of Awign, one of the pioneers and currently the largest player in India's gig economy. Here at Awign, we are changing how the world works, with a vision to uplift millions of careers.
About the Client:
This company is a leading enterprise mobile app development firm, specializing in delivering highly efficient, secure, and scalable applications to a global audience. They offer end-to-end design and development services, collaborating closely with clients to build scalable, user-centric, and innovative solutions. Their skilled designers and developers create engaging user experiences while leveraging cutting-edge technologies to ensure seamless functionality.
Job Title: Cloud Reliability Engineer (CRE)
Location: Offshore
Job Description:
We are seeking Cloud Reliability Engineers (CREs) to support Carnival Cruise Line's cloud infrastructure. The ideal candidates will focus on automating cloud operations, improving system reliability, and ensuring seamless observability and monitoring across the Carnival Cruise Line environment.
The CRE team will be responsible for designing, implementing, and maintaining automation frameworks, monitoring systems, and log-mining solutions to enhance cloud operations. The role will also involve provisioning, fault management (FM), and optimizing cloud infrastructure for high availability and performance.
Key Responsibilities:
Automation & Cloud Operations: Develop and implement automation scripts and tools to streamline cloud operations and provisioning.
Observability & Monitoring: Design and enhance observability frameworks, including real-time monitoring, log mining, and alerting systems for proactive issue detection.
Infrastructure Reliability: Improve cloud infrastructure reliability through performance tuning, capacity planning, and automated remediation strategies.
Fault Management (FM): Implement fault management processes to detect, diagnose, and resolve cloud infrastructure issues efficiently.
Data Farms & Log Analysis: Leverage data analytics and log mining techniques to gain insights into system performance and troubleshoot anomalies.
Provisioning & Deployment: Automate cloud provisioning and apply infrastructure-as-code (IaC) practices for efficient deployment across Carnival Cruise Line brands.
Collaboration & Best Practices: Work closely with development, security, and operations teams to enforce best practices for cloud reliability and scalability.
Required Skills & Experience:
Experience in Cloud Operations & Automation (AWS, Azure, and GCP)
Proficiency in Infrastructure as Code (IaC) (Terraform, AWS CloudFormation, Ansible, Chef, Puppet, Azure Resource Manager)
Strong expertise in observability tools (Prometheus, Grafana, ELK Stack, Splunk, or Datadog)
Log Mining & Data Analytics (Kibana, Splunk, or BigQuery)
Fault Management & Incident Response experience in cloud environments
Experience with containerized environments (Docker, Kubernetes)
Proficiency in scripting & automation (Python, Bash, PowerShell)
Understanding of cloud security, networking, and cost optimization
Preferred Qualifications:
Certifications in cloud technologies (e.g., AWS Certified DevOps Engineer (Professional), Microsoft Certified: DevOps Engineer Expert, Google Cloud Professional Cloud DevOps Engineer)
Experience in hybrid cloud environments (on-prem & cloud integration)
Hands-on experience with Site Reliability Engineering (SRE) practices
Experience in managing large-scale cloud infrastructure for enterprises