Job description

About Penbrothers

Penbrothers is an HR & remote talent management partner and one of the fastest-growing companies in the Philippines. We provide talented Filipinos with global opportunities in high-growth startups and dynamic companies, from the comfort of their own homes.

About the Client

The client, a pioneer in medical recruitment, is seeking an experienced Tech Lead to drive their mission to enhance doctors' well-being. This is an opportunity to contribute your unique skills and expertise to create technology that truly matters, impacting lives on a daily basis.

About the Role

We are looking for a Senior SRE/DevOps Specialist to play a vital role in ensuring the reliability of our Salesforce and web/mobile application environments. You will work closely with our engineers to continually improve our platform in line with world-class best practices.

Service reliability and observability

Analysing resource utilization and forecasting capacity needs to ensure the system can handle expected traffic and workloads without performance issues.
Writing code and scripts to automate repetitive operational tasks, configuration management, and deployment processes to reduce human error and increase efficiency.
Managing changes to production systems and services, ensuring that new releases and configuration changes are rolled out with minimal disruption and risk.
Identifying and addressing performance bottlenecks, optimizing software and infrastructure to improve response times and reduce resource consumption.
Maintaining thorough documentation of systems, configurations, and incident response procedures to facilitate knowledge sharing and onboarding of new team members.
Defining and maintaining service level objectives (SLOs) that specify the acceptable level of service quality, such as uptime and latency, for a particular system or service.
Defining the service level indicators (SLIs), the key performance metrics used to measure the system's performance and reliability, such as error rates and response times.
Designing and implementing monitoring systems to track the SLIs and using alerting mechanisms to notify the team when the system deviates from its defined SLOs.

Incident management & Disaster recovery planning

Responding to and mitigating incidents that impact service availability or performance, following an incident management process, and conducting post-incident reviews to learn and improve.
Planning, implementing, and executing disaster recovery and backup strategies to ensure data and service availability in case of failures or disasters.

Security

Ensure systems and infrastructure are securely configured and hardened by default
Manage secrets, credentials, and access controls across environments
Monitor for security-related events and support incident response efforts
Maintain secure CI/CD pipelines and enforce safe deployment practices

Continuous Improvement

Continuously evaluating and improving system reliability, efficiency, cost optimization and automation to meet our evolving business needs and customer expectations.
Rationalizing, evaluating and integrating third-party developer tooling and services.
Troubleshooting platform issues with development teams.
Providing tooling support and access management for development teams.
Staying ahead of the tech curve and bringing new tools and frameworks to the table.

Required profile