Cloud Machine Learning Operations (MLOps) Engineer

extra holidays - extra parental leave
Remote: Full Remote
Contract:
Experience: Mid-level (2-5 years)
Work from: Germany, Maryland (USA), United States

Offer summary

Qualifications:

TS/SCI with polygraph required; Bachelor's or Master's degree in Computer Science or a related field; 3+ years of experience as a Machine Learning Operations Engineer focused on NVIDIA Triton; strong programming skills in Python; familiarity with TensorFlow or PyTorch; experience with GPU optimization for deep learning.

Key responsibilities:

  • Build an enterprise-scale environment using AWS technologies and DevOps methodologies
  • Collaborate to deploy machine learning models ensuring optimal performance and scalability
  • Implement monitoring, automation, and security best practices for ML operations
  • Develop troubleshooting processes and technical support for model deployment issues
  • Maintain documentation of MLOps processes and workflows
Applied Insight, LLC SME https://www.applied-insight.com/
501 - 1000 Employees

Job description

About Us: Innovating to solve real-world problems

Applied Insight enhances the ability of federal government customers to preserve national security, deliver justice and serve the public with advanced technologies and quality analysis. We work closely with agencies and industry to overcome technical and cultural hurdles to innovation, empowering them with the latest end-to-end cloud infrastructure, big data and cyber capabilities. Our expertise in cross-domain and boundary solutions, network analytics, DevOps and low-to-high development is unique in our industry. We develop and deliver innovative products and applications that are deployed in highly sensitive customer environments and have broad applications for federal missions.

On joining the Applied Insight team, you’ll be working to solve real-world problems on missions that matter with people who share your passions and encourage your ambition. It’s vital to us that we hire committed people who are great at what they do. We return that commitment by empowering them with the autonomy, the support and the tools they need to fulfill their true potential.

A day in the life (just a few of the things you may do on any given day):

Enhance your current skillset by disrupting traditional workflows and processes while building an enterprise-scale environment using AWS technologies coupled with DevOps methodologies. You will be an integral part of a team of knowledgeable technologists helping to build an enterprise-scale cloud presence within the Intelligence Community (IC) for software development, web hosting, research, and more! This is a multi-faceted position that requires you to work directly with AWS services and the underlying operating systems to improve security automation, support collaboration with software engineers, and streamline infrastructure processes. This position offers the opportunity to take your existing infrastructure, IT, or systems engineering experience and apply it to problems using tools and concepts unique to AWS and Cloud Service Provider environments.

This role supports dynamic, rotating professional services engagements lasting from 6 weeks to 6 months. In this role, you will be continuously focused on forward-leaning migration, evolution, and optimization tasking within AWS-based environments (no long-term operations and maintenance tasking!). You will regularly be tasked with new customers and exciting new challenges as you help advance customer adoption of AWS across multiple classification domains.

  • NVIDIA Triton Inference Server Expertise: Leverage your in-depth knowledge of NVIDIA Triton to design and manage scalable, high-performance inference pipelines in a production enterprise system.
  • Model Deployment: Collaborate with data scientists and software engineers to deploy machine learning models, ensuring optimal performance and resource utilization while tracking and reducing costs.
  • Scalability: Architect and implement solutions to scale machine learning inference to handle large workloads efficiently.
  • Performance Optimization: Monitor and fine-tune model inference for optimal speed and resource utilization.
  • Automation: Implement automation tools and processes for model deployment, monitoring, and scaling.
  • Monitoring and Logging: Develop robust monitoring and logging solutions to track model performance, system health, and data quality in real time.
  • Security: Help implement security best practices to protect machine learning models and data.
  • Documentation: Maintain detailed documentation of machine learning operations processes and best practices.
  • Collaboration: Work closely with a cross-functional Product team to understand business requirements and translate them into technical solutions.
  • Troubleshooting: Provide technical support for debugging and resolving issues related to model deployment and inference.

You will excel in this role if you are:

  • Embracing Emerging Technology: You will leverage AWS and its accompanying tools daily as you help build and stand up a game-changing development environment.
  • Well-rounded: You appreciate the opportunity to work across multiple technologies such as scripting, development/test/QA tools, cloud, container, and orchestration tools, Linux and Windows operating systems, networking, security and automation.
  • Motivated: You want to continually learn new things and work with new technologies.
  • Agile: You can work as part of a small team to develop solutions for both commercial and government customers.
  • Focused on Automation: Wherever possible, you look for ways to automate manual processes to increase efficiency, speed, and operability of tasks.

What we are expecting from you (i.e. the qualifications you must have):

  • TS/SCI clearance with polygraph required.
  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience (3+ years) as a Machine Learning Operations Engineer with a focus on NVIDIA Triton.
  • Experience with other MLOps tools and platforms.
  • Strong programming skills in Python.
  • Familiarity with machine learning frameworks like TensorFlow or PyTorch.
  • Experience with GPU hardware and optimization for deep learning workloads.
  • Strong problem-solving skills and the ability to work effectively in a collaborative team environment.
  • Excellent communication skills and the ability to convey technical concepts to both technical and non-technical stakeholders.
  • AWS Certified Solutions Architect - Associate credential or another Associate-level certification (in progress acceptable).

What we are desiring from you (i.e. the nice-to-have qualifications):

  • Proficiency in containerization technologies and orchestration tools (e.g., Docker, AWS Fargate, Amazon Elastic Container Service, Amazon Elastic Kubernetes Service).
  • Knowledge of DevOps practices and continuous integration/continuous deployment (CI/CD) pipelines.
  • Familiarity with the AWS cloud platform.
  • Previous experience in the deployment of machine learning models in production environments.

What we will provide in return: Excellent compensation and amazing benefits

  • Multiple health insurance options
  • 401(k) with immediate vesting. The company matches 100% of the first 3% contributed and 50% of the next 2% contributed.
  • Fully paid long-term disability, short-term disability, and life insurance.
  • Flexible Spending Account options.
  • Generous paid time off.
  • Flexible work schedules with the ability to bank extra hours for additional time off.
  • Government shutdown protection: employees do not have to use leave for up to 3 days per year lost to inclement weather or budget issues.
  • An employee-centric culture and a belief that we should empower those who are good at what they do, then give them the tools they need to achieve success and grow their career.
  • A commitment to learning and growth and easy ways to achieve both including a training budget, education assistance, mentorship programs and collaborative learning sessions.
  • A collaborative environment that fosters communication and an open-door policy.

https://www.applied-insight.com/careers/open-positions/

EEO/AA, including Vets and Disabled.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English

Other Skills

  • Problem Solving
  • Collaboration
  • Communication