Platform integration: Integrate with enterprise tooling (ServiceNow, monitoring/observability, schedulers, CI/CD) as needed to run automations safely in production.
Security & compliance: Follow least-privilege access, secrets management, logging standards, and audit requirements for systems and data.
Testing & quality: Build unit/integration tests, perform non-prod validation, and define acceptance criteria aligned to operational readiness.
Observability: Add structured logs, metrics, and health checks; define alerts/dashboards to detect failures and measure throughput.
Documentation & handoff: Produce runbooks, support guides, and knowledge transfer for Tier 2/3 ownership.
Stakeholder partnership: Participate in intake grooming, technical design reviews, and demos; provide weekly status and risks/issues.
Expected Deliverables
Automation backlog & intake: A ranked inventory of automation candidates with effort/ROI estimates and success measures.
Reusable automation framework: Templates/libraries for common patterns (config, logging, retries, auth, scheduling, notifications).
Production-ready automations: Delivered in increments, each with code, tests, deployment artifacts, and operational documentation.
Operational readiness package: Runbook, monitoring/alerting, rollback plan, and ownership handoff checklist for each automation.
Metrics & reporting: A simple dashboard or report showing delivery progress and realized benefits (hours saved per month and year to date, run count, MTTR reduction, error rate, throughput). These metrics should be available per automation and in aggregate across all automations.
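As an illustration of the per-automation and aggregate reporting described in the metrics deliverable, a minimal Python sketch follows. The `RunRecord` fields, the `summarize` function, and the reserved "ALL" row name are hypothetical choices for this example, not an existing schema or tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One execution of an automation (hypothetical schema for illustration)."""
    automation: str
    minutes_saved: float  # assumed manual effort avoided when the run succeeds
    succeeded: bool


def summarize(runs):
    """Return per-automation metrics plus an 'ALL' row aggregating every run.

    Assumes no automation is itself named 'ALL'.
    """
    buckets = defaultdict(list)
    for run in runs:
        buckets[run.automation].append(run)
        buckets["ALL"].append(run)

    report = {}
    for name, items in buckets.items():
        total = len(items)
        failures = sum(1 for r in items if not r.succeeded)
        report[name] = {
            "runs": total,
            # Only successful runs are counted as saving manual effort.
            "hours_saved": sum(r.minutes_saved for r in items if r.succeeded) / 60,
            "error_rate": failures / total,
        }
    return report
```

In practice the run records would likely come from job logs or a scheduler's history, and the same roll-up could be filtered by month or year to date.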
Required Qualifications
5+ years building automations in enterprise environments (platform, operations, or reliability automation).
Strong development skills in at least one automation-friendly language (e.g., Python, Java, JavaScript/TypeScript, PowerShell, Bash, .NET, C++), and comfort with APIs.
Experience building resilient, production-grade workflows (error handling, idempotency, retries, auditing).
Experience with CI/CD and source control (e.g., Git), including branching and code review practices.
Hands-on experience with Linux/Unix and scripting; ability to troubleshoot across logs, jobs, and infrastructure.
Understanding of security fundamentals: least privilege, secrets management, non-person IDs/service accounts, and data handling.
Ability to work from ambiguous requirements and translate operational pain points into implementable technical solutions.
Experience documenting automations and training others on how they are invoked: automatically when a condition occurs, or manually to address an identified condition.
Preferred Qualifications
ServiceNow development/automation (Flow Designer, IntegrationHub, scripting) and ticket lifecycle automation.
Cloud-native automation (containers, Kubernetes, serverless functions) and infrastructure-as-code (e.g., Terraform).
Observability tooling experience (logs/metrics/traces) and building dashboards/alerts.
Experience in regulated environments (SOX/SOC), including evidence capture and audit-friendly change practices.
Familiarity with data reconciliation, financial operations, or payment/rebates domains.
Experience with Jira, Confluence, and ServiceNow.
Engagement Model & Working Expectations
Cadence: Weekly planning/prioritization; bi-weekly demos; weekly written status (progress, blockers, risks, next steps).
Collaboration: Partner with Production Support Engineering for toil identification and acceptance; partner with SRE for standards (monitoring, reliability patterns).
Definition of Done: Automation is deployed, monitored, documented, and handed off with ownership and support instructions.
Change control: All production changes follow standard change processes, approvals, and evidence capture.
Success Metrics (Examples)
Reduction in manual hours/week for targeted processes (baseline vs. post-automation).
Decrease in recurring incident volume for automated failure patterns.
Improvement in MTTR/MTTA for automatable recovery scenarios.
Automation reliability: success rate, failure rate, and mean time to recovery of the automation itself.
Adoption: number of teams/processes onboarded to standardized patterns and reusable components.
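Two of the measures above reduce to simple arithmetic: the baseline vs. post-automation reduction, and the mean time to recovery of the automation itself. The sketch below shows one way they might be computed; the function names and the (failed_at, recovered_at) input shape are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def percent_reduction(baseline, current):
    """Percentage reduction from a baseline value (e.g., manual hours/week)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


def mean_time_to_recovery(outages):
    """Average duration of (failed_at, recovered_at) datetime pairs."""
    if not outages:
        return timedelta(0)
    total = sum((end - start for start, end in outages), timedelta(0))
    return total / len(outages)
```

For example, a process that took 40 manual hours per week before automation and 10 afterward shows a 75 percent reduction.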
Assumptions & Dependencies
Timely provisioning of required access (non-person IDs/service accounts) and test environments.
Availability of SMEs for process walkthroughs, acceptance testing, and operational handoff.
Agreement on target platforms/standards (logging, monitoring, deployment approach) to ensure consistency.