Technical Operations Engineer, Core

Work set-up: 
Full Remote
Salary: 
157 - 157K yearly
Experience: 
Senior (5-10 years)

Offer summary

Qualifications:

  • Minimum of 5 years in Technical Operations or Site Reliability Engineering.
  • Deep experience managing complex Web3 infrastructures, including RPC services and node operations.
  • Proficiency in Linux/Unix system administration and troubleshooting.
  • Hands-on experience with automation tools like Terraform, Ansible, and containerization with Docker and Kubernetes.

Key responsibilities:

  • Manage deployment and optimization of blockchain networks.
  • Troubleshoot high-impact Web3 incidents and coordinate with ecosystem partners.
  • Develop monitoring solutions using platforms like Grafana and DataDog.
  • Participate in on-call support to ensure system reliability.

QuickNode Cybersecurity SME https://www.quicknode.com/
51 - 200 Employees

Job description

QuickNode is a cloud-based infrastructure company that powers the blockchain ecosystem.

Our mission is to be the indispensable utility that empowers companies and innovators globally to build next-generation, Web3-enabled businesses & applications using blockchain technology. QuickNode is backed by some of the world's best investors, including Tiger Global, Y Combinator, SoftBank, and the Seven Seven Six Fund. The QuickNode team has over 120 people maintaining high-performance global data infrastructure for amazing customers serving billions of requests daily.

We are a global remote company with an HQ in Miami, Florida.

The Role

We’re seeking a seasoned Technical Operations Engineer to ensure the stability, reliability, and performance of our production systems. In this key role, you’ll leverage deep technical expertise, particularly in Web3/blockchain technologies, to manage, optimize, and enhance our platform infrastructure. You’ll drive operational excellence through proactive monitoring, meticulous incident management, innovative problem-solving, and collaborative cross-team initiatives.

What You'll Do
  • Blockchain Network Management: Lead the deployment, optimization, and operational management of new blockchain networks. Conduct thorough testing, benchmarking, and continuous improvement of chain reliability and performance.

  • Complex Web3 Issue Resolution: Address high-impact Web3 incidents through rigorous troubleshooting, detailed log analysis, JSON-RPC response debugging, and direct coordination with blockchain foundations and ecosystem partners.

  • Proactive System Monitoring: Develop and maintain comprehensive monitoring and alerting solutions using advanced dashboards (e.g., Grafana, DataDog), identifying trends, anomalies, and performance bottlenecks before they become critical.

  • Incident & SLO Management: Define, implement, and enforce service-level objectives (SLOs) and agreements (SLAs), ensuring measurable standards of system reliability and performance are consistently met.

  • Automation & Optimization: Implement and maintain automation solutions (Ansible, Terraform, Kubernetes) to streamline deployments, reduce manual tasks, and optimize cloud infrastructure cost and efficiency.

  • Technical Collaboration: Actively collaborate with Tier-1 support, infrastructure, and development teams, ensuring alignment on system improvements, rapid issue resolution, and operational knowledge sharing.

  • On-Call Support: Participate in a rotating 24/7 on-call schedule to swiftly address critical system incidents, maintain continuous service delivery, and uphold customer trust.

What You'll Bring
  • Minimum of 5 years in Technical Operations, Site Reliability Engineering (SRE), or related roles. Proven Linux/Unix system administration and advanced troubleshooting capabilities.

  • Deep experience managing complex Web3 infrastructures (RPC services, validator setups, node operations). Skilled in interpreting blockchain logs, JSON-RPC responses, and debugging intricate Web3 protocol issues.

  • Solid hands-on experience with configuration management and infrastructure automation tools (Helm, Terraform, Ansible, Consul), including containerization expertise (Docker, Kubernetes), managing and scaling services in cloud environments.

  • Competency in scripting/programming languages (Python, Go, JavaScript).

  • Advanced proficiency in monitoring and analytics platforms (Grafana, DataDog), enabling proactive and data-driven operational decision-making.

  • Demonstrated ability to identify performance patterns, forecast potential issues, and implement preventive solutions.

  • Strong track record defining, measuring, and maintaining SLAs/SLOs, and experienced with incident response tooling and processes (PagerDuty), ensuring quick resolution and systematic root-cause analyses.

  • Willing to travel on a limited basis for conferences, offsites, and/or meetings, generally less than 10 days per year.

  • Exceptional interpersonal and communication skills, with a proven ability to collaborate effectively across multiple teams and stakeholders.

  • Self-motivated, solution-oriented, and consistently striving for operational improvements, quality enhancements, and reduced technical debt.

  • Solid professional attributes, committed to transparency, accountability, and ethical behavior. Capable of managing complexity and staying adaptable under pressure, and able to demonstrate continuous learning and comfort evolving within a rapidly changing technical landscape.

  • Self-starter driven by curiosity and initiative, proactively identifying opportunities, addressing gaps, and implementing solutions autonomously.

  • Thrives in dynamic environments and committed to maintaining industry leadership through close collaboration with the most innovative and talented minds in Web3.

Performance Metrics

Success in this role will be measured by:

  1. Proactively monitor, rapidly respond to, and diligently resolve high-severity platform incidents during on-call and shift hours, ensuring ≥99.99% uptime (less than 4 min 30 sec downtime per month) across all Core Platform RPC services and validators.

  2. Actively seek opportunities to enhance operational efficiency through automation and streamlined processes, aiming to automate a minimum of two critical operational tasks or deployments per quarter, resulting in at least a 25% reduction in manual interventions and measurable improvements in deployment velocity.

  3. Autonomously tackle research, rapid operationalization, and rigorous maintenance for new L1/L2 chain deployments, achieving stable production readiness within 14 days, proactively ensuring ≥99.99% uptime post-launch, and effectively onboarding initial traffic for shared and public endpoint services.
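
As a rough illustration of the uptime arithmetic behind the first metric, the sketch below computes how much downtime a 99.99% target leaves in a month. The function name `downtime_budget` and the month lengths tried are assumptions made for this example; they are not part of QuickNode's tooling or its definition of the metric.

```python
from datetime import timedelta


def downtime_budget(slo: float, window: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `window` while still meeting `slo`.

    `slo` is a fraction (0.9999 means 99.99% uptime). Hypothetical helper,
    written only to illustrate the arithmetic.
    """
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return window * (1.0 - slo)


if __name__ == "__main__":
    # Month length changes the budget slightly, so try a few.
    for days in (28, 30, 31):
        budget = downtime_budget(0.9999, timedelta(days=days))
        print(f"{days}-day month: {budget.total_seconds():.1f} s of downtime allowed")
```

For a 30-day month this works out to about 259 seconds (roughly 4 min 19 sec), and for a 31-day month about 268 seconds, both consistent with the "less than 4 min 30 sec" figure quoted in the first metric.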

The US base salary range and level for this position are $156,510 - $173,900 per year and level P3. International ranges, in local currency, will be discussed during the hiring process with applicable candidates. This role is eligible for a quarterly bonus tied to company and individual goal achievement. We consider years of experience, level of proficiency in job function, the technical competencies required, and location when determining base salary ranges for positions and levels.

The QuickNode compensation philosophy includes pillars to ensure fair and unbiased compensation for all employees. To design and deliver total reward offerings that are employee-centric. To offer a competitive benefit package in all locations where we operate. To prioritize attracting and retaining the best talent globally. To maintain a high-performing and flexible way of working.

During the hiring process, we are committed to discussing compensation openly and honestly. We encourage candidates to share their salary expectations and requirements early, allowing for an individualized discussion. We know that our total rewards practices impact the lives and wellbeing of our employees. Therefore, we will never stop learning about the market, our business, your needs, and how best to achieve our goals through thoughtful and data-driven practices. If you have any questions or require further information about the compensation for this position, please don't hesitate to reach out to your Recruiter.

We at QuickNode are an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry:
Cybersecurity
Spoken language(s):
English

Other Skills

  • Troubleshooting (Problem Solving)
  • Collaboration
  • Adaptability
  • Communication
  • Self-Motivation
  • Problem Solving
