Engineer II - Site Reliability (Remote, AUS)

extra holidays - extra parental leave - work from home
Work set-up: Full Remote
Contract:
Experience: Mid-level (2-5 years)
Work from:

Offer summary

Qualifications:

  • Bachelor's degree or equivalent experience in Computer Science.
  • At least three years of experience in large-scale production environments.
  • Proficiency in one or more programming languages such as Java, Python, or Go.
  • Experience with Linux, storage, and infrastructure technologies.

Key responsibilities:

  • Manage operational aspects of the platform, including availability and monitoring.
  • Troubleshoot server hardware issues and ensure 24/7 platform operation.
  • Develop automation tools and contribute to incident analysis and resolution.
  • Collaborate with a global team of engineers to improve system reliability.

CrowdStrike Cybersecurity Large http://www.crowdstrike.com
5001 - 10000 Employees

Job description

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. We work on large scale distributed systems, processing almost 3 trillion events per day and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role

CrowdStrike is looking to hire an Engineer II for the TechOps SRE team, with a focus on our Commercial Cloud. We’re looking for a deeply technical, hands-on engineer who loves to develop automation and tooling through software to ensure delivery of mission-critical solutions and services for large-scale distributed systems.

What You'll Do:

  • Have expertise with Linux engineering and administration for thousands of bare metal servers and virtual machines

  • Be responsible for all operational aspects of our platform: Availability, Latency, Throughput, Monitoring, Issue Response (analysis, remediation, deployment), and Capacity Planning with respect to Latency and Throughput

  • Work in a team of highly motivated engineers distributed across the globe

  • Participate in an on-call rotation with other team members

  • Troubleshoot server hardware issues

  • Use your passion for technology to ensure our platform operates 24x7

  • Obsess about learning, and champion the newest technologies & tricks with others, raising the technical IQ of the team. We don’t expect you to know all the technology we use but you will be able to get up to speed on new technology quickly

  • Have broad exposure to our entire architecture and become one of our experts in our overall process flow

  • Have an intrinsic drive to make things better

  • Have a bias towards small development projects and the occasional larger project

  • Have experience with modern monitoring and telemetry stacks (ELK, Prometheus, Grafana, Zabbix)

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding

  • Lead incident analysis, champion incident response practices, assist in correlating incidents to systemic problems, and drive towards resolution

What You'll Need:

  • Bachelor's degree and/or equivalent experience in Computer Science

  • A minimum of three years of experience working in a large-scale production environment

  • Experience writing moderate to complex scripts and programs for automation, tools, frameworks, dashboards, and alarms

  • Experience in one or more of: Java, Python, Go

  • Experience with storage technologies (Examples: SAN, NAS, NFS, Object Storage, FreeNAS, iSCSI)

  • Experience with infrastructure technologies (Examples: Linux, Windows, VMware, Docker, Kubernetes)

  • Experience writing technical documentation

  • Configuration management experience with one or more tools such as Puppet, Chef, Ansible

  • Solid understanding of application design, including operational trade-offs of various designs

  • Analytical skills coupled with a strong sense of urgency, ownership, and drive

  • Ability to work well in a diverse, team-focused environment with other SREs and Engineers

  • Ability to broadly communicate and present recommended conventions defined by the reliability team

Benefits of Working at CrowdStrike:

  • Remote-friendly and flexible work culture

  • Market leader in compensation and equity awards

  • Comprehensive physical and mental wellness programs

  • Competitive vacation and holidays for recharge

  • Paid parental and adoption leave

  • Professional development opportunities for all employees regardless of level or role

  • Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections

  • Vibrant office culture with world class amenities

  • Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Cybersecurity
Spoken language(s): English

Other Skills

  • Teamwork
  • Communication
  • Analytical Skills
