Role: Platform Engineer / DevOps Engineer
Position Type: Full-Time Contract (40hrs/week)
Contract Duration: 12 Months
Work Hours: EST/CST
Location: 100% Remote (candidates can work from anywhere in LATAM countries)
WHAT YOU'D BE DOING:
• Operate and improve platform tools so product teams can ship reliably: triage tickets, fix build issues, and handle routine service requests (access, secrets, environment setup).
• Maintain and extend self-service workflows (templates, golden paths) by updating docs, examples, and guardrails under guidance from senior engineers.
• Perform day-to-day Kubernetes operations: deploy/update Helm charts, manage namespaces, diagnose rollout issues, and follow runbooks for incident response.
• Support CI/CD pipelines (e.g., GitLab CI): keep pipelines green, add/adjust jobs, implement basic quality gates, and help teams adopt safer deploy strategies (blue/green, canary).
• Monitor and operate the observability stack using Prometheus, Alertmanager, and Thanos; maintain alert rules, dashboards, and SLO/SLA indicators; help reduce alert noise and improve signal quality.
• Assist with service instrumentation across the core observability pillars—tracing, logging, and metrics—with hands-on OpenTelemetry usage (collectors/SDKs) and related telemetry tooling.
• Contribute to and improve documentation: runbooks, FAQs, onboarding guides, and standard operating procedures.
• Participate in an on-call rotation as needed with a well-defined escalation path; assist during incidents, post small fixes, and capture learnings in docs.
• Help with cost- and performance-minded housekeeping: right-size workloads, prune unused resources, and automate routine tasks where appropriate.
WE'RE LOOKING FOR SOMEONE WHO HAS:
• 6+ years in a platform/SRE/DevOps or infrastructure role, with a strong bias toward automation and support.
• Experience operating Kubernetes (or similar) and core ecosystem tools (Helm, Docker, Ingress NGINX, Argo Rollouts basics).
• Hands-on CI/CD experience (preferably GitLab CI): writing and modifying jobs, managing artifacts and environments, and applying basic deployment strategies.
• Scripting ability in Bash or Python (Go a plus) to automate repetitive tasks and improve runbooks.
• Familiarity with AWS fundamentals (e.g., IAM, EC2/EKS, S3, CloudWatch/CloudTrail, Parameter Store/Secrets Manager).
• Practical understanding of monitoring/observability (dashboards, logs, alerts) and how to use them for triage and remediation, including Prometheus/Alertmanager/Thanos and OpenTelemetry basics.
• Comfortable working from tickets (Jira/ServiceNow), following change-management practices, and communicating clearly with stakeholders.
HIGHLY PREFERRED CANDIDATES ALSO HAVE:
• Terraform experience for infrastructure as code (managing AWS, Kubernetes add-ons, and platform components).
• API integration experience (Java, Python, or Go) to build small internal tools or glue code.
• Deeper Linux fundamentals and container runtime basics for effective debugging and performance tuning.
• Exposure to insurance/financial services environments, including awareness of compliance and operational controls.
MUST HAVE:
• At least mid-level (not too junior).
• AWS Cloud Experience
– Hands-on AWS experience is required
– Must clearly appear on the resume
• Kubernetes (Critical)
– Strong Kubernetes experience is mandatory
– Must have:
▪ Operating and maintaining clusters
▪ Some experience building Kubernetes environments
– Manager explicitly stated:
"If you don't know how to build, you can't operate."
• Infrastructure as Code (Terraform)
– Terraform experience explicitly mentioned as important
– Expected to understand IaC concepts
• Python (Hard Requirement)
– Used for:
▪ Automation
▪ Scripting
▪ Internal tooling
– Skill level: mid-level, not advanced application development
• CI/CD Pipeline Ownership
– Hands-on experience with building, maintaining, and enhancing CI/CD pipelines
– Tools:
▪ GitLab CI (preferred)
▪ Jenkins / GitHub Actions acceptable
– Focus on practical ownership, not tool brand
• Platform / DevOps Engineering Background
– Must come from:
▪ Platform Engineering or
▪ Strong DevOps / Infrastructure background
– Hands-on work with infrastructure, pipelines, and cloud operations
– Not a consulting or advisory role
• Observability (Conceptual Strength Required)
– Must understand observability principles and architecture
– Tools interchangeable (Prometheus, Thanos, Grafana, Datadog, New Relic)
– Focus on concepts + implementation
• Linux Fundamentals
– Solid working knowledge required
• Mid-Size Company Experience
– Preference for candidates from modern, mid-size engineering organizations
NICE TO HAVE:
• Security exposure in CI/CD pipelines and infrastructure workflows
• Multi-cloud exposure (Azure or GCP is acceptable as long as the candidate also has AWS experience)
• Go or additional scripting languages (Python still mandatory)
• Experience supporting large engineering groups (~100–150 engineers)