Senior MLOps Engineer

Remote: 
Full Remote

Offer summary

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience in MLOps practices and tools.
  • Strong programming skills in Python and familiarity with machine learning frameworks.
  • Experience with cloud platforms and CI/CD pipelines.

Key responsibilities:

  • Design and implement MLOps pipelines for model deployment and monitoring.
  • Collaborate with data scientists to optimize machine learning models.
  • Ensure the reliability and scalability of machine learning systems.
  • Provide technical support and guidance to team members on MLOps best practices.

Mentalyc Inc.

Job description

Join a team revolutionizing mental healthcare. Mentalyc is an industry leader and pioneer of AI note-taking for therapists and other mental health professionals. Since the company's incorporation in 2021, we have automated note-taking, treatment planning, and progress tracking with AI as a central component. We have also acquired thousands of customers and grown our team.

Our vision is to make therapy more effective, efficient, and truly measurable through insightful, data-driven interventions. Our mission is to turn AI-generated notes into Clinical Intelligence that helps therapists grow, deepen therapeutic relationships, and achieve outcomes AI alone never could. We believe in elevating mental health care, one note at a time.

As a team, we are driven by curiosity, care, and collaboration. We push boundaries, embrace new ideas, and trust in each other and the process. We strive to inspire, innovate, and explore, all while ensuring data privacy and supporting therapists so they can focus on what matters most: delivering quality care.

What We Offer:

  • Innovative and Mission-Driven Environment: Be part of a fast-growing, high-performing company dedicated to transforming mental health technology with impactful AI solutions.
  • Culture of Excellence: Collaborate with driven and talented individuals who share a commitment to innovation, continuous improvement, and achieving outstanding results.
  • Scaling up our Impact Worldwide: Lead the efforts in building scalable machine learning solutions to maximize the impact of our solutions on clinicians and patients across the world.
  • High-Impact Role: Drive key technical initiatives, enhance platform scalability and quality, and play a critical role in achieving the company’s strategic objectives.
  • Flexible and International Team: Join a global team that values excellence and adaptability, offering the benefits of fully remote work and flexible hours to meet the demands of a high-growth environment.

Responsibilities

  • Deploy, Optimize, and Monitor ML Infrastructure: Lead efforts to ensure models are efficiently deployed for GPU inference, including parallelization and low-level optimization strategies. Establish logging, monitoring, and alerting mechanisms to guarantee 24/7 system reliability.
  • Performance and Turnaround Time Improvements: Identify bottlenecks in our processing (speech-to-text, structured note creation, downstream features) to reduce turnaround times.
  • Dynamic Scaling and Cost Optimization: Implement Kubernetes-based solutions (Helm, KEDA, Kubeflow) for auto-scaling to handle fluctuating workloads. Fine-tune resource allocation, particularly GPU resources, to balance high performance with cost-effectiveness.
  • CI/CD and Model Lifecycle Management: Build and maintain automated CI/CD pipelines for model training, testing, and fine-tuning in collaboration with ML Engineers and Clinicians. Drive best practices in model versioning, QA, and end-to-end deployment processes.
  • Infrastructure as Code and Cloud Management: Use Terraform to provision and manage AWS infrastructure. Streamline deployment pipelines to ensure reliable releases and updates in a high-availability environment.
  • Collaboration on Model Iterations: Work closely with ML Engineers and Clinicians to refine model performance for session note generation and advanced analytics (progress assessment, treatment recommendations, etc.). Ensure models are production-ready and integrate seamlessly into a scalable cluster.

Requirements

  • Professional Experience: 5+ years in MLOps, ML Engineering with DevOps, or similar roles, with a focus on deploying and managing ML workflows at scale. Demonstrated success handling GPU-based inference for real-time or batch processing.
  • Technical Proficiency: Solid experience with Kubernetes (Helm, KEDA, Kubeflow) and Terraform for infrastructure provisioning. Proficient in Python, with C++ a plus for performance optimizations. Skilled in PyTorch for model fine-tuning and serving.
  • Cloud and Observability Skills: Strong knowledge of AWS (EC2, S3, EKS, etc.) and experience configuring monitoring tools (Prometheus, Grafana, CloudWatch) to ensure observability, uptime, and scalability.
  • Scalability Focus: Proven track record in designing and implementing solutions for high-throughput, GPU-centric ML workflows. Ability to optimize resource usage and cost, particularly under dynamic workload demands.
  • Open-Ended Scalability Problem-Solving: Demonstrated ability to tackle complex, ambiguous scalability challenges with innovative solutions. Skilled at identifying bottlenecks, from GPU infrastructure to data pipelines, and driving robust, cost-effective optimizations in production environments.

Personal Qualities

  • Problem-Solver: You tackle complex challenges, from queue backlogs to GPU optimization, with creativity and determination.
  • Proactive and Results-Driven: You take initiative to enhance reliability and performance, consistently seeking ways to reduce costs and streamline processes.
  • Adaptable: You thrive in fast-paced settings and can pivot quickly in response to new requirements or technologies.
  • Collaborative Leader: You mentor team members, encourage knowledge sharing, and foster an environment of mutual support.
  • Curiosity and Empathy: You’re eager to learn, from new MLOps tools to the workflows of clinicians, ensuring solutions meet real-world needs.

Nice to Have

  • C++ Optimizations: Proficiency in C++ for high-performance model or library optimizations.
  • Security and Compliance: Familiarity with HIPAA, SOC 2, or other healthcare data protection standards.
  • Experience with Additional MLOps Tools and Innovations: Familiarity with other orchestration frameworks (e.g., Kubeflow), which shows broader industry insight, and a commitment to staying current with emerging MLOps trends and best practices.
  • JavaScript, GraphQL, and SQL Experience: Additional knowledge of front-end/back-end and database technologies can enhance cross-team collaboration and end-to-end solution delivery.

Application

To apply, kindly follow the link: https://www.mentalyc.com/careers/senior-mlops

Required profile

Spoken language(s):
English
