Offer summary

Qualifications:

Excellent knowledge and experience in observability platforms like Prometheus, InfluxDB, and others.

Background in containers and familiarity with container orchestration services.

Proficient coding skills, particularly in Java, and experience with big data platforms and open-source technologies in computing systems.

Key responsibilities:

Provide fundamental Site Reliability Engineering (SRE) services to clients.

Develop and establish sustainable SRE practices, measured environments, and operating models.

Implement tooling, automation, and error budget practices for effective stability and performance control.

Guide clients in evolving their operational frameworks towards advanced architectural and automated perspectives.

Job description

Who we are:

Born digital, UST transforms lives through the power of technology. We walk alongside our clients and partners, embedding innovation and agility into everything they do. We help them create transformative experiences and human-centered solutions for a better world.

UST is a mission-driven group of over 38,000 practical problem solvers and creative thinkers in over 30 countries. Our entrepreneurial teams are empowered to innovate, act nimbly, and create a lasting and sustainable impact for our clients, their customers, and the communities in which we live.

With us, you’ll create a boundless impact that transforms your career—and the lives of people across the world.

Visit us at UST.com.

You Are

Someone who can explore client readiness to employ advanced technologies to reduce organizational silos (Collaborate), accept failure as normal (SLIs/SLOs), implement gradual change (Adaptive), leverage tooling and automation, and measure everything. You have experience working with multiple client stakeholder levels and cross-functional teams to assess SRE readiness, determine production reliability metrics and maturity levels and the service management reference architecture, and instil successful SRE practices encompassing team formation, roles, responsibilities, skill mix, mindset, processes, tools, and work culture.

The Opportunity

Provide clients with a portfolio of fundamental SRE services, principles, and implementation best practices covering the end-to-end SRE life cycle phases: culture & organization, design & architecture, build & deploy, test & verify, and information & reporting.
Develop and deliver an enhanced SRE working environment for clients, using conceptual training and practical guidance based on a phased, evolutionary engineering approach to product development and operation, and prepare a roadmap for moving the operating model from descriptive (basic monitoring), to diagnostic (RCA), to predictive (analytics), to prescriptive (AI-driven intelligent automation).
Develop a roadmap of the client’s SRE operating model stages, from observability & telemetry, through chaos engineering & automated impact assessment, to self-healing & self-service.
Support SRE principles: use automation to scale, handle load, and balance operational toil against improvement work; create an error budget to control velocity and self-regulate the balance of features against stability; practice observability; use actionable automated runbooks; hold a blameless post-mortem for every event; and apply the error budget, that is, the difference between the negotiated performance (the SLO) and actual performance.

What You Need

Excellent knowledge and working experience in implementing one or more Observability platforms like Prometheus, InfluxDB, Dynatrace, Grafana, Splunk etc. to measure telemetry data like logs, metrics and traces.
Previous experience with running containers (Docker/LXC) in a production environment using one of the container orchestration services (Kubernetes, Docker Swarm, AWS ECS, AWS EKS).
Experience with public cloud platforms like Azure, AWS, and GCP.
Solid professional coding experience in at least one programming language, preferably Java.
Experience with BigData platforms, like AWS EMR, Databricks, Cloudera, Hortonworks etc.
Experience with open source technologies like Hadoop, Hive, Presto, Spark, Airflow etc.

Bonus Points If

Architecture Certified.
DevOps Certified.
ITIL Service Certified.
Agile Certified.
Six Sigma Quality Control Certified.
Experience in Monitoring and Metrics systems.
Background in Software Development and Operations.
Experience with scripting languages.

Compensation can differ depending on factors including but not limited to the specific office location, role, skill set, education, and level of experience. As required by local law, UST provides a reasonable range of compensation for roles that may be hired in California, Colorado, New York City, or Washington as set forth below.

Role Location: US, Remote

Compensation Range: $114,080-$133,920

What We Believe

We’re proud to embrace the same values that have shaped UST since the beginning. Since day one, we’ve been building enduring relationships and a culture of integrity. And today, it's those same values that are inspiring us to encourage innovation from everyone, to champion diversity and inclusion, and to place people at the center of everything we do.

Humility

We will listen, learn, be empathetic and help selflessly in our interactions with everyone.

Humanity

Through business, we will better the lives of those less fortunate than ourselves.

Integrity

We honor our commitments and act with responsibility in all our relationships.

Equal Employment Opportunity Statement

UST is an Equal Opportunity Employer.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

UST reserves the right to periodically redefine your roles and responsibilities based on the requirements of the organization and/or your performance.
