Social Discovery Group (SDG) is the 3rd largest social discovery company in the world, uniting 60+ brands with 500 million users. We solve the problems of loneliness, isolation, and disconnection by transforming virtual intimacy into the new normal. Our portfolio includes online communication platforms focusing on AI, game mechanics, and video streaming: Dating.com, DateMyAge, Cupid Media, Dil Mil, Kiseki, and others.
SDG invests in IT startups worldwide. Our investments include OpenAI, Patreon, Flo, Clubhouse, Woebot, Flure, Astry, Coursera, Academia.edu, and many others.
We bring together a team of like-minded people and IT professionals specialising in the creation and development of globally impactful social discovery products. Our international team of 1200 professionals and digital nomads works all over the world.
Our teams of digital nomads work remotely from Cyprus, Malta, the USA, Armenia, Georgia, Kazakhstan, Montenegro, Poland, Latvia, Serbia, Spain, Portugal, UAE, Israel, Turkey, Thailand, Indonesia, Japan, Hong Kong, Australia and many other locations.
In August 2024, we achieved Great Place to Work US Certification™! This achievement reflects our core belief that a truly exceptional workplace is built on trust, pride, and camaraderie—not just great perks.
We are looking for a Senior DevOps Engineer / MLOps Engineer.
Your main tasks will be:
- Support and development of ML/LLM infrastructure in dev and prod;
- Deployment and maintenance of inference services for ML models;
- Building a fault-tolerant and scalable infrastructure for high-load environments;
- Configuring and maintaining CI/CD for ML and backend solutions;
- Working with GPU infrastructure: efficient resource utilisation, GPU isolation, and partitioning (A100/H100);
- Collaborating with the DS team and backend developers (.NET) to deploy services (including models) to production.
What we expect from you:
- Linux
- Docker
- Kubernetes
- CI/CD (GitHub)
- IaC (Terraform / Ansible / Helm)
- Experience with GPU infrastructure and the CUDA / NVIDIA stack
- Understanding of how ML/LLM works
- Experience with GPU partitioning / MIG (A100/H100) is a major plus
- Monitoring and logging: Prometheus, Grafana, ELK / OpenSearch, or similar tools
- Experience with AWS
- Understanding of networking, fault tolerance, and scaling
- Experience integrating with a .NET backend is a plus
- Working knowledge of Python is a plus
What we offer:
- REMOTE OPPORTUNITY to work full-time;
- 28 calendar days of vacation per year;
- 7 wellness days per year (time off) that can be used to deal with household issues or to rest and recover without taking sick leave;
- Bonuses of up to $5,000 for referring successful applicants to positions in the company;
- 50% coverage of the cost of professional training, international conferences and meetings;
- Corporate discount for English lessons;
- Health benefits. If you are not eligible for corporate medical insurance, the company will pay up to $1,000 gross per employee per year, according to the paychecks. This can be spent on purchasing your own health insurance or on doctors’ fees for yourself and close relatives (spouse, children);
- Workplace organisation. The company provides all employees with an equipped workspace and all necessary equipment (desk, chair, Wi-Fi, etc.) in our offices or co-working locations. In other locations, the company reimburses workplace costs of up to $1,000 gross once every 3 years, according to the paychecks. This money can be spent on renting a co-working space or on equipping a home workplace (desk, chair, Internet, etc.) during those 3 years.
- Internal gamified gratitude system: receive bonuses from colleagues and exchange them for our merchandise, team building activities, massage certificates, etc.
Sounds good? Join us now!