Sr. Site Reliability Engineer (SRE)

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor's Degree in Computer Science, Design, or a related field.
  • 5-7 years of experience in a Site Reliability Engineering role or similar.
  • Deep technical expertise in AWS, containerization, monitoring, and automation tools.
  • Strong communication skills in English, both written and verbal.

Key responsibilities:

  • Ensure system reliability and monitor system health.
  • Build and improve platform infrastructure and applications.
  • Collaborate with development teams to enhance services and release processes.
  • Optimize system performance and drive continuous improvement.

NTD Software Startup https://ntdsoftware.com/
11 - 50 Employees

Job description

We are looking for a Senior Site Reliability Engineer with strong experience in AWS, system monitoring, and infrastructure automation. The role involves maintaining and improving the reliability and performance of a cloud-based lending platform used by mid-market and large financial institutions. 

The ideal candidate will have a solid background in systems engineering and software development, be comfortable working across teams, and take ownership of operational stability and tooling improvements.

Responsibilities:
  • Take ownership of learning the software in depth: its functions, how it fulfills clients' needs, and how clients use the product.
  • Oversee systems to ensure reliability for customers. 
  • Monitor distribution systems and notify appropriate persons of any potential issues. 
  • Run the production environment by monitoring availability and taking a holistic view of system health. 
  • Build software and systems to manage platform infrastructure and applications. 
  • Improve reliability, quality, and time-to-market of our suite of software solutions. 
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve. 
  • Partner with development teams to improve services through rigorous testing and release procedures.

Technical Skills:
  • Bachelor's Degree (B.A.) in Computer Science, Design, or an equivalent four-year degree, or equivalent related experience.
  • 5-7 years of proven experience in a Site Reliability role or similar experience. 
  • Excellent oral and written communication skills in English, including facilitation of group presentations and consulting skills.
  • Possess deep technical experience with AWS, containerization technologies, automated deployment frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
  • Demonstrate hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges. 
  • Experience working with infrastructure and application monitoring tools such as New Relic, SumoLogic, Pingdom (uptime monitoring), CloudTrail, and CloudWatch Insights, and with AWS deployment tooling such as CloudFormation, CodePipeline, and CodeDeploy.
  • Extensive working knowledge of managing AWS and Linux OS. 
  • Experience working with MSSQL and MySQL in cloud-based environments, as well as demonstrable knowledge and experience of AWS service technologies, e.g., Aurora, MySQL.
  • Experience working with NoSQL database technologies (ideally DynamoDB).
  • Experience working with pipeline automation scripting and tooling, e.g., Jenkins, Terraform.
  • Knowledge and experience utilizing coding languages (e.g., C++, Java, PHP) and frameworks/systems (e.g., AWS). 
  • Ability to learn new languages and technologies strongly preferred. 
  • Broad understanding of the lending industry, with the ability to become a subject matter expert on the job.

Soft Skills:
  • A strong sense of ownership. 
  • Excellent written and verbal communication and interpersonal skills. 
  • Able to effectively collaborate with technical and business partners. 
  • Can take on full projects from beginning to end. 
  • Problem solver. 
  • Team player.
  • Advanced English level.

Required profile

    Spoken language(s):
    English

    Other Skills

    • Collaboration
    • Communication
    • Teamwork
    • Problem Solving
