Sr Chaos Engineer

Remote:

Full Remote

Experience:

Senior (5-10 years)

Work from:

India, Brazil, Anywhere

Offer summary

Qualifications:

Bachelor's degree in IT or Computer Science.
5+ years of experience in Chaos Engineering.
Strong knowledge of AWS services and distributed systems.
Fluent English communication.

Key responsibilities:

Design chaos experiments to test resilience.
Develop and execute fault injection tests on AWS.

Meta IT North America Large https://www.metait.ca/

1001 - 5000 Employees

Job description

We are looking for a highly skilled Senior Chaos Engineer, and we are delighted that you want to be part of our team!

We consistently help customers overcome their IT challenges by providing consulting expertise in IT strategy, outsourced operations, staff augmentation, and digital transformation for companies such as ArcelorMittal, Air Liquide, Volvo Group, MLSE, and many more. Take a look at our website to see some of the exciting work we are doing: https://www.metait.ca/

Why build your career at Meta?

We offer autonomy, clear goals, and a dynamic, challenging environment where professionals can work with different technologies, take part in all types of projects, bring new ideas, and work from anywhere in Brazil and (why not?) anywhere in the world. In addition, we are one of the best companies to work for in Brazil according to Great Place to Work, and we have been one of the 10 fastest-growing technology companies in the country for 3 consecutive years, according to Anuário Informático Hoje.

Key Responsibilities

Chaos Engineering & Resilience Testing: Design and implement chaos experiments to simulate failure scenarios, stress test the system and uncover potential weaknesses within our infrastructure and applications.
AWS Infrastructure Expertise: Work with AWS services such as EC2, S3, RDS, and Lambda to develop and execute fault injection tests that assess the resilience of the cloud environment against real-world failure scenarios.
Distributed Systems & Data Reliability: Utilize experience with distributed databases (e.g., Cassandra) and data streaming platforms (e.g., Kafka) to test the fault tolerance and reliability of critical data processes.
Monitoring & Observability: Leverage monitoring tools (e.g., Prometheus, Grafana, ELK stack) to gain insights into system health and performance, analyzing metrics to detect and diagnose issues.
Networking Fundamentals: Apply knowledge of TCP/IP networking principles to simulate and understand network-related failures and their impact on system resilience.
Collaboration with Performance Team: Work closely with the Performance Test team to design experiments aligned with performance goals, share insights, and develop strategies to improve overall system resilience.
Containerization & Orchestration (Plus): Use containerization and orchestration tools such as Docker, Kubernetes, and AWS Fargate to run and manage chaos tests in containerized environments.

Skills And Qualifications

Education: Bachelor's degree in Information Technology, Computer Science, or a related field.
Advanced/Fluent English communication.
5+ years of experience in Chaos Engineering, Site Reliability Engineering, or related roles, with a strong focus on distributed systems.
In-depth knowledge of AWS services and architecture, including EC2, S3, RDS, and Lambda.
Proficiency with distributed systems and experience with databases such as Cassandra and data streaming platforms such as Kafka.
Strong understanding of monitoring and observability tools, particularly Prometheus, Grafana, and the ELK stack.
Networking fundamentals, with practical knowledge of TCP/IP principles in distributed environments.
(Preferred) Familiarity with containerization and orchestration (e.g., Kubernetes, Docker, AWS Fargate).
Analytical and problem-solving mindset, excellent communication skills, and a collaborative approach to working within cross-functional teams.

About The Role

Full-time: 40 hours per week.
Independent Contractor – Long term.
Time zone: CST (overlap required).
Location: Remote in India.

How does the process work?

After applying, you will receive an email invitation to complete a technical test on Coodesh; you will need to score over 80% to advance in the process.

The second step is an interview with Meta IT’s HR team, followed by an interview with your future manager at Meta and, if approved, a final interview directly with the client’s team.

To apply for this position, please submit your resume highlighting your relevant experience and why you are interested in this role. Only shortlisted candidates will be contacted for further evaluation. Thank you for considering this opportunity with our company.

What are our values?

We are people serving people
We all think and act like owners
We are hungry for performance
We grow and learn together
We pursue excellence and simplicity
We have innovation and creativity in our DNA

All people are welcome regardless of their condition, disability, ethnicity, religious belief, sexual orientation, appearance, age, or other characteristics. We want you to grow with us in a welcoming environment full of opportunities.

Can you relate? Then come #BeMeta!