Monitoring and Logging Specialist (Subject Matter Expert) :: REMOTE

extra holidays - extra parental leave
Remote: Full Remote
Work from: Georgia (USA), United States

Offer summary

Qualifications:

  • Expertise in monitoring and logging tools
  • Strong knowledge of incident management systems
  • Familiarity with cloud platforms such as AWS and Azure
  • Experience with security and compliance standards

Key responsibilities:

  • Implement and manage monitoring tools
  • Manage and optimize logging platforms
  • Develop automated alerting and incident responses
  • Ensure compliance with logging best practices
ARK Solutions, Inc. Human Resources, Staffing & Recruiting SME https://www.ARKSolutionsInc.com/
201 - 500 Employees

Job description

ARK Solutions, Inc. is looking for a Monitoring and Logging Specialist (Subject Matter Expert).

Position: Monitoring and Logging Specialist (Subject Matter Expert)
Location: REMOTE
Duration: 12+ months, with possibility of extension

Description:

We are looking for a subject matter expert (SME) with experience in monitoring and logging.
 
1. Implement and Manage Monitoring Tools:
Goal: Deploy and maintain a comprehensive monitoring system (e.g., Nagios, Zabbix, Prometheus) for infrastructure, applications, and network devices.
Objective: Ensure all critical components (servers, network, cloud services) are being monitored with real-time alerts and dashboards.
 
2. Manage and Optimize Logging Platforms:
Goal: Implement and optimize a logging solution (e.g., Splunk, ELK Stack, Graylog) to capture and store logs from various sources.
Objective: Ensure log data is properly indexed, stored, and searchable for troubleshooting and analysis.
 
3. Develop Automated Alerting and Incident Response:
Goal: Set up automated alerting rules and integrate them with incident management tools (e.g., PagerDuty, ServiceNow).
Objective: Ensure the team is notified of incidents promptly, with relevant logs and metrics available for swift troubleshooting.
 
4. Ensure Compliance with Logging and Monitoring Best Practices:
Goal: Enforce security, audit, and compliance standards in monitoring/logging solutions (e.g., NIST, HIPAA).
Objective: Continuously audit logging practices to ensure logs are maintained securely and compliance standards are upheld.
 
5. Drive Proactive Monitoring and Predictive Analytics:
Goal: Implement proactive monitoring for system performance and use predictive analytics to identify potential issues before they occur.
Objective: Reduce downtime and improve system reliability by predicting and resolving bottlenecks or potential failures.
 
6. Facilitate Cross-Team Collaboration for Incident Resolution:
Goal: Enable effective collaboration between monitoring, infrastructure, and application teams during incidents by providing centralized monitoring data.
Objective: Shorten mean time to resolution (MTTR) during outages or incidents by ensuring all teams have access to relevant monitoring and logging information.
 
7. Train and Guide Team on Monitoring Tools:
Goal: Train the team in using monitoring tools and ensure they understand how to read logs, set up alerts, and troubleshoot issues.
Objective: Equip the team to efficiently use and maintain the monitoring and logging systems, reducing dependency on specialists for day-to-day operations (no silos).
 
8. Optimize System Resource Utilization:
Goal: Regularly review and tune the monitoring and logging systems to ensure they are not overusing system resources (e.g., CPU, memory).
Objective: Ensure the monitoring system itself does not become a bottleneck or contribute to performance degradation.
 
9. Integrate Monitoring Solutions with Cloud Platforms:
Goal: Set up monitoring for cloud infrastructure and services (e.g., AWS, Azure) and integrate with existing tools.
Objective: Ensure seamless monitoring of both on-premises and cloud infrastructure with unified dashboards and alerts.
 
10. Document Monitoring/Logging Processes and Policies:
Goal: Maintain detailed documentation of monitoring configurations, incident response protocols, and logging system architecture.
Objective: Ensure the team can quickly onboard new members and continue operations smoothly in case of staff changes or system changes.

 

Required profile

Experience

Industry: Human Resources, Staffing & Recruiting
Spoken language(s): English

Other Skills

  • Training And Development
  • Collaboration
