About the Team:
The Site Reliability Engineering (SRE) group builds mission-critical infrastructure, tools, and processes that ensure the highest levels of availability and reliability across all our websites.
SRE drives standardization and service-focused instrumentation, provides subject matter expertise, and resolves break/fix scenarios, engaging broader teams as necessary. SRE also partners with or leads other teams to achieve continuous improvement. In addition, SRE contributes to command-and-control activities focused on the rapid restoration of service during complex outages.
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart’s e-commerce/Retail and Enterprise platforms. Our goal is to build, scale, and guard the systems that delight our customers.
What you will do:
1. On-call responsibilities to help minimize MTTD and MTTR.
2. Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).
3. Ability to interpret debugging information, drain traffic away from a cluster, roll back a bad software push, block or rate-limit unwanted traffic, bring up additional serving capacity through autoscaling features, and use monitoring systems for alerting and dashboards.
4. Engage with enterprise and business/infrastructure functions to establish, track, and optimize operational metrics and targets in line with SRE principles (SLOs/SLIs, latency percentiles, error budgets, tech debt, and alerting guidelines).
5. Work with observability tools and enterprise monitoring solutions such as Dynatrace, AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu, and Splunk. Should be able to write PromQL and Splunk queries.
6. Programming, tooling, and automation experience in one or more of the following languages: Golang, Java, Python, TypeScript, Node.js, and Shell.
7. Good understanding of Kafka internals, SQL/NoSQL databases such as Cassandra, Elasticsearch, and PostgreSQL, and in-memory caching frameworks such as Memcached.
8. Influence, design and create new architectures, standards, and methods for large-scale enterprise systems.
9. Design, write and build tools to improve the reliability, latency, availability and scalability of Walmart e-commerce/Retail and Enterprise products.
• Engender reliability and availability starting with metrics and measurements.
• Enable scaling by providing tools, developing training and/or augmenting processes.
• Build tools and automation to prevent problems from recurring in mission-critical products and services.
10. Augment existing instrumentation to build a cohesive picture of the characteristics of our systems, with special attention to points of failure.
11. Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
12. Develop a deep understanding of the many services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.
13. Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
14. Secure the system against issues, whether real, perceived, or notional.
What you will bring:
1. Bachelor's or Master's degree in Computer Science or a related field, with 3+ years of experience.
2. Proficiency in at least one programming language, such as Java or Golang.
3. Experience with IaaS and PaaS providers such as AWS, Azure, OpenStack, and GCP.
4. Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere). Experience with enterprise monitoring solutions such as AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu, and Splunk.
5. Familiarity with continuous integration/deployment processes and tools such as Jenkins, Maven, and Nexus.
6. Ability to capture and analyze network traffic using tcpdump, snoop, and other network sniffers. Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.).
7. Experience in agile methodology.
8. Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way. Experience administering Linux systems in a production environment.
9. Experience with distributed version control systems such as Git.
About Walmart Global Tech
Imagine working in an environment where one line of code can make life easier for hundreds of millions of people. That’s what we do at Walmart Global Tech. We’re a team of software engineers, data scientists, cybersecurity experts, and service professionals within the world’s leading retailer who make an epic impact and are at the forefront of the next retail disruption. People are why we innovate, and people power our innovations. We are people-led and tech-empowered. We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions, and reimagine the future of retail.
Flexible, hybrid work:
We use a hybrid way of working that is primarily in office coupled with virtual when not onsite. Our campuses serve as a hub to enhance collaboration, bring us together for purpose and deliver on business needs. This approach helps us make quicker decisions, remove location barriers across our global team and be more flexible in our personal lives.
Benefits:
Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include a host of best-in-class benefits: maternity and parental leave, PTO, health benefits, and much more.
Equal Opportunity Employer:
Walmart, Inc. is an Equal Opportunity Employer – By Choice. We believe we are best equipped to help our associates, customers, and the communities we serve live better when we really know them. That means understanding, respecting, and valuing diversity (unique styles, experiences, identities, ideas, and opinions) while being inclusive of all people.
The above information has been designed to indicate the general nature and level of work performed in the role. It is not designed to contain or be interpreted as a comprehensive inventory of all responsibilities and qualifications required of employees assigned to this job. The full Job Description can be made available as part of the hiring process.