Engineer II - Site Reliability (AUS Remote)

extra holidays - extra parental leave - work from home

Remote:

Full Remote

Contract:

Full time

Experience:

Mid-level (2-5 years)

Work from:

Australia

Offer summary

Qualifications:

Bachelor's degree in Computer Science, 3+ years of experience in a large-scale production environment, proficiency in Java, Python, or Go.

Key responsibilities:

Ensure operational aspects of cloud platform
Develop automation for mission critical solutions
Troubleshoot server hardware issues
Gather and analyze system metrics
Lead incident analysis and resolution

CrowdStrike Cybersecurity Large https://www.crowdstrike.com/

5001 - 10000 Employees

Job description

#WeAreCrowdStrike and our mission is to stop breaches. As a global leader in cybersecurity, our team changed the game. Since our inception, our market leading cloud-native platform has offered unparalleled protection against the most sophisticated cyberattacks. We’re looking for people with limitless passion, a relentless focus on innovation and a fanatical commitment to the customer to join us in shaping the future of cybersecurity. Consistently recognized as a top workplace, CrowdStrike is committed to cultivating an inclusive, remote-first culture that offers people the autonomy and flexibility to balance the needs of work and life while taking their career to the next level. Interested in working for a company that sets the standard and leads with integrity? Join us on a mission that matters - one team, one fight.

About the Role:
CrowdStrike is looking to hire an Engineer II for the TechOps SRE team, with a focus on our Commercial Cloud. We’re looking for a deeply technical, hands-on engineer who loves to develop automation and tooling through software to ensure delivery of mission-critical solutions and services for large-scale distributed systems.

What You'll Do:

Have expertise with Linux engineering and administration for thousands of bare metal servers and virtual machines
Be responsible for all operational aspects of our platform - Availability, Latency, Throughput, Monitoring, Issue Response (analysis, remediation, deployment) and Capacity Planning with respect to Latency and Throughput
Work in a team of highly motivated engineers distributed across the globe
Participate in an on-call rotation with other team members
Troubleshoot server hardware issues
Use your passion for technology to ensure our platform operates 24x7
Obsess about learning, and champion the newest technologies & tricks with others, raising the technical IQ of the team. We don’t expect you to know all the technology we use but you will be able to get up to speed on new technology quickly
Have broad exposure to our entire architecture and become one of our experts in our overall process flow
Have an intrinsic drive to make things better
Bias towards small development projects and the occasional larger project
Have experience with modern monitoring and telemetry stacks (ELK, Prometheus, Grafana, Zabbix)
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Lead incident analysis, champion incident response practices, assist in correlating incidents to systemic problems, and drive towards resolution

What You'll Need:

Bachelor's degree and/or equivalent experience in Computer Science
A minimum of three years of experience working in a large-scale production environment
Experience writing moderate to complex scripts and programs for automation, tools, frameworks, dashboards, and alarms
Experience in one or more of: Java, Python, Go
Experience with storage technologies (Examples: SAN, NAS, NFS, Object Storage, FreeNAS, iSCSI)
Experience with Infrastructure technologies (Examples: Linux, Windows, VMware, Docker, Kubernetes, etc.)
Experience writing technical documentation
Configuration management experience with one or more tools such as Puppet, Chef, Ansible
Solid understanding of application design, including operational trade-offs of various designs
Analytical skills coupled with a strong sense of urgency, ownership, and drive
Ability to work well in a diverse, team-focused environment with other SREs and Engineers
Ability to broadly communicate and present recommended conventions defined by the reliability team

Benefits of Working at CrowdStrike:

Remote-first culture
Market leader in compensation and equity awards with option to participate in ESPP in eligible countries
Competitive vacation and flexible working arrangements
Physical and mental wellness programs
Paid parental leave, including adoption
A variety of professional development and mentorship opportunities
Access to CrowdStrike University, LinkedIn Learning and Jhana
Offices with stocked kitchens when you need to fuel innovation and collaboration
Birthday time-off in your local country
Work with people who are passionate about our mission, at a company that is Great Place to Work certified across the globe

We are committed to fostering a culture of belonging where everyone feels seen, heard, valued for who they are and empowered to succeed. Our approach to cultivating a diverse, equitable, and inclusive culture is rooted in listening, learning and collective action. By embracing the diversity of our people, we achieve our best work and fuel innovation - generating the best possible outcomes for our customers and the communities they serve.

CrowdStrike is committed to maintaining an environment of Equal Opportunity and Affirmative Action. If you need reasonable accommodation to access the information provided on this website, please contact Recruiting@crowdstrike.com for further assistance.