Job Title: Production Support Specialist
Location: Remote
The Production Support Specialist plays a crucial role in ensuring the stability and availability of mission-critical production systems. This position is responsible for ongoing monitoring, incident identification and response, and issue resolution to maintain optimal service delivery. The specialist will lead SWAT calls during high-priority incidents, manage the incident ticketing process, and collaborate with cross-functional teams to restore service levels efficiently.
This role also includes managing infrastructure upgrades, overseeing SSL certificate management, and ensuring system readiness for peak operational periods. Additionally, the Production Support Specialist will assist with disaster recovery activities, manage PHI-related issues per compliance standards, and provide key stability and availability metrics to stakeholders.
Key Responsibilities:
Monitor and manage the health and stability of production systems.
Lead SWAT calls to address critical incidents and minimize downtime.
Initiate and drive Technical Operations Calls (TOC calls) during outages, ensuring rapid issue resolution for Priority 1, 2, and 3 incidents.
Oversee and manage incident ticket workflows, ensuring proper tracking and follow-up for Priority 4 and 5 incidents within established SLAs.
Provide support and validation during infrastructure upgrades and maintenance to minimize service disruption.
Manage SSL certificate renewals and configuration to maintain secure communications.
Ensure systems and processes are operationally ready, especially during peak periods (e.g., year-end high-traffic times).
Act as a liaison between Helpdesk, Development, Business, and Account Management teams.
Coordinate cross-functional teams to resolve issues and implement process improvements.
Assist in the creation and maintenance of Application Recovery Guides (ARGs).
Support disaster recovery (DR) activities to ensure business continuity.
Manage and validate PHI-related issues, ensuring regulatory compliance.
Provide key stability and availability metrics to track system uptime and performance.
Required Skills and Experience:
Production Support experience (on-prem and cloud-based).
Hands-on experience with Jira, ServiceNow (SNOW), APIs (SOA), and cloud tools.
Expertise in Splunk (or similar monitoring tools), DataPower, Apigee, and microservices.
Strong knowledge of OAuth, SSL, and HTTP security protocols.
Experience with Docker and Redis for optimization and caching.
Proficiency in WebSphere Application Server and MQ.
Experience with Wily, Tivoli, Client BSM, and Java.
Familiarity with Glider assessment tools.