Since 1993, EPAM Systems, Inc. (NYSE: EPAM) has used its software engineering expertise to become a leading global provider of digital engineering, cloud and AI-enabled transformation services, and a leading business and experience consulting partner for global enterprises and ambitious startups. We address our clients’ transformation challenges by fusing EPAM Continuum’s integrated strategy, experience and technology consulting with our 30+ years of engineering execution to speed our clients’ time to market and drive greater value from their innovations and digital investments.
We make GenAI real with our AI LLM orchestration, testing and engineering solutions, EPAM DIAL, EPAM EliteA™ and EPAM AI/RUN™, respectively.
We deliver globally, but engage locally with our expert teams of consultants, architects, designers and engineers, making the future real for our clients, our partners and our people around the world.
We believe the right solutions are the ones that improve people’s lives and fuel competitive advantage for our clients across diverse industries. Our thinking comes to life in the experiences, products and platforms we design and bring to market.
Added to the S&P 500 and the Forbes Global 2000 in 2021 and recognized by Glassdoor and Newsweek as a Top 100 Best Workplace, our multidisciplinary teams serve customers across six continents. We are proud to be among the top 15 companies in Information Technology Services in the Fortune 1000 and to be recognized as a leader in the IDC MarketScapes for Worldwide Experience Build Services, Worldwide Experience Design Services and Worldwide Software Engineering Services.
Learn more at www.epam.com.
We are seeking an experienced Senior Observability DevOps Engineer to join our dynamic team.
In this role, you will be responsible for managing our AWS infrastructure, enhancing our observability services, and automating operations while focusing on efficiency and scalability. Ideal candidates will demonstrate a robust understanding of DevOps principles and observability tools, and have experience automating infrastructure and handling large-scale environments.
Responsibilities
Manage AWS infrastructure using Terraform and CloudFormation, including tasks like EKS version upgrades, blue/green deployments, and scaling
Set up, tune, and modernize various observability services including Cortex/Mimir, Loki, Tempo, OpenTelemetry, Grafana, and Alertmanager
Automate operations programmatically using Python or Golang and GitLab CI, and develop custom self-service solutions based on AWS Service Catalog
Build Docker images for multiple architectures including arm64 and amd64
Troubleshoot issues related to microservices in Kubernetes, AWS connectivity, service performance, Lambda functions, and Kafka
Participate in hypercare events and on-call shifts
Requirements
Proficiency in version control using Git, GitHub, and GitLab alongside CI/CD pipelines
Strong experience with Infrastructure as Code (IaC) tools like Terraform and CloudFormation for automation
Expertise in Grafana, including logs, traces, and metrics, along with familiarity with Tempo, Mimir (Prometheus), Loki, Datadog, and New Relic
Competency in AWS cloud services including S3, IAM, tagging, load balancers, Lambda, and EKS (Kubernetes)
Skills in programming with Python
Background in ITIL processes covering knowledge, incident, and problem management
Knowledge of observability concepts and strategies for signal ingestion and billing reduction
Nice to have
Familiarity with Cortex and Tempo
Understanding of Promtail/FluentBit and Elasticsearch
Flexibility to use Kafka and Golang when needed
We offer
Career plan and real growth opportunities
Unlimited access to LinkedIn learning solutions
International Mobility Plan within 25 countries
Constant training, mentoring, online corporate courses, eLearning and more
English classes with a certified teacher
Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
Flexible work schedule and dress code
Collaborate in a multicultural environment and share best practices from around the globe
Hired directly by EPAM & 100% under payroll
Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
Insurance: life and major medical expenses with dental and vision coverage (for the employee and direct family members)
13% employee savings fund, capped at the legal limit
Grocery coupons
30-day December bonus
Employee Stock Purchase Plan
12 vacation days plus 4 floating days
Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
Monthly non-taxable amount for the electricity and internet bills
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM’s Privacy Notice and Policy.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Required profile
Experience
Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English