Senior Site Reliability Engineer

82% Flex
EXTRA HOLIDAYS - EXTRA PARENTAL LEAVE - WORK FROM HOME - FULLY FLEXIBLE
Remote: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from:

Offer summary

Qualifications:

  • Experience with CDN, Terraform, and Datadog
  • Strong understanding of the software development lifecycle
  • Agile methodology experience
  • Problem-solving skills and ability to learn
  • Collaboration aptitude and communication skills

Key responsibilities:

  • Configuring and maintaining CDN services
  • Designing strategies to mitigate abusive traffic
  • Developing solutions for bot traffic and unauthorized scraping
  • Improving system observability and security patching strategy
  • Participating in on-call rotation and agile workflow

Stack Overflow Computer Software / SaaS SME https://stackoverflow.com/
501 - 1000 Employees

Job description

Your missions

Every developer has a tab open on Stack Overflow.  

We are one of the most popular websites in the world - a community-based space focused on increasing productivity, decreasing cycle times, accelerating time to market, and protecting institutional knowledge. 

Innovation is at the heart of everything we do. We embrace collaboration and transparency, and we believe in leading with empathy, creating an environment where every Stacker knows they belong. We embrace that the unique contributions and points of view of all Stackers contribute to our success.

We are a Best Company to Work For, in addition to being recognized for Best Company Leadership, Best Company Happiness, Best Company Perks and Benefits, Best Company Work-Life Balance, Best Company Compensation, and Best Company Outlook.

We are a remote-first company with Hiring HUBs based in the US, Canada, UK, and Germany.

At Stack Overflow, our mission is to serve developers. We build products that make millions of developers’ lives better every day. Our goal is to create a community and a company where every developer feels welcome to learn, share their knowledge, and build their careers.

Stack Overflow is growing fast and our infrastructure needs keep getting bigger as our products scale and grow. With the increase in infrastructure needs, we are committed to ensuring the highest level of security and performance for our web properties. We are looking for a Site Reliability Engineer to join our existing team of SREs and developers to help us protect our platforms from malicious attacks, bot activities, and unauthorized data scraping. As an SRE, you will collaborate with application development teams to identify gaps and opportunities to improve security and reliability across our products, always looking for ways to automate manual work, and create repeatable, scalable systems and processes.

We’re looking for someone with a specialized skill set in website anti-abuse, such as DDoS protection and advanced bot mitigation strategies using CDN technologies like Cloudflare. We expect this person to have familiarity with Google and/or Azure cloud platforms, and familiarity with the .NET ecosystem. We don’t expect you to know every other part of our stack coming in, so we’ll pair you with other members of the team to learn and develop your skills across our entire infrastructure (including our non-cloud stackoverflow.com infrastructure). We operate in mixed Windows and Linux environments, and expect someone in this role to have experience with one environment and a working understanding of the other.

 

What you’ll work on:

  • Collaborate with other SREs, developers, Product teams, and our security team to configure and maintain CDN services such as WAF, DDoS protection, and Bot management using infrastructure as code.
  • Design, implement, and manage strategies to mitigate and block abusive traffic, such as DDoS attacks and fraudulent account creation, ensuring high availability and reliability of our services.
  • Collaborate with other SREs, developers, and our Product teams to develop and deploy solutions to identify, block, and manage bot traffic.
  • Collaborate with other SREs, developers, and our Product teams to implement and maintain sophisticated anti-scraping technologies to protect our assets from unauthorized scraping efforts.
  • Improve the observability of our systems to help identify issues and preemptively detect potential threats by iterating on our monitoring and alerting strategies.
  • Reduce toil through software solutions by removing or automating manual tasks, steps, and workflows.
  • Improve our security patching and compliance strategy for cloud solutions.
  • Participate in our on-call rotation (typically 1 fortnight out of 4 months).
  • Partner closely with your peers to accomplish goals within an agile software development lifecycle.

Our current ecosystem includes:

  • Google Cloud Platform and/or Microsoft Azure
  • Self-hosted infrastructure
  • Cloudflare as our CDN
  • Terraform, PowerShell, Python, Golang
  • Windows Server, IIS, and .NET Core
  • Linux - CentOS, Debian
  • Our toolchain includes: GitHub, TeamCity (CI), GHA, Octopus Deploy, HAProxy / NGINX, ElasticSearch, Redis, Argo Workflows, Kubernetes, Datadog

Skills & Requirements

We’re looking for:

  • In-depth experience with CDN-level traffic analysis and management, in particular with Cloudflare.
  • Experience with Terraform or similar IaC tools.
  • Familiarity with Datadog.
  • Experience writing mature software solutions in a high-level programming language (for example, but not limited to, Python, Golang, C#).
  • A strong practical understanding of software development lifecycle phases, from planning and development through production deployment and monitoring.
  • Experience with Agile methodologies such as Scrum, Extreme Programming, or Kanban.
  • Willingness to learn new technologies and adapt to changing priorities.
  • Excellent problem-solving skills with the ability to respond quickly to incidents.
  • Eagerness and ability to work with different types of functional groups, share knowledge, communicate, collaborate, and contribute. This is particularly important given our remote-first environment.

We like to see:

  • Expertise in scripting languages (e.g., Bash, PowerShell, Python)
  • SQL experience (Microsoft SQL Server a plus)
  • An understanding of service level indicators and service level objectives

What you’ll get in return:

  • Competitive Base Salary 
  • Generous paid vacation
  • Generous parental leave (16 weeks at 100% pay), family care leave, and unlimited sick days
  • Equity (RSUs) for all employees at all levels
  • Industry-leading health benefits that are applicable per country of residence for all our full-time employees
  • Company-paid Life Insurance
  • Home Internet stipend
  • Professional allocation for your growth and development
  • One-time allowance to assist with your home office setup
  • Company-paid access to Calm, Bravely, LinkedIn Learning, MyAcademy and Overdrive

Stack Overflow is proud to be an equal opportunity workplace. We value diversity, inclusion, equity and belonging and these pillars are at the heart of how we work together here at Stack. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. 

For individuals based in California, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Computer Software / SaaS
Spoken language(s): English
Check out the description to know which languages are mandatory.

Soft Skills

  • Open mindset
  • Verbal communication skills
  • Microsoft Office
  • Leadership
