Vision-Language Model (VLM) Engineer

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, or a related field
  • Strong experience with Python and deep learning frameworks (e.g., PyTorch or TensorFlow)
  • Solid understanding of machine learning, computer vision, and NLP concepts
  • Experience with multimodal models or related architectures (e.g., transformers)

Roles & Responsibilities:

  • Design and implement vision-language models for tasks such as image captioning, visual question answering, and cross-modal retrieval
  • Train, fine-tune, and evaluate multimodal models using large-scale datasets
  • Optimize model performance for scalability and real-world deployment
  • Collaborate with cross-functional teams including data scientists, software engineers, and product managers

Job description

We are seeking a highly skilled Vision-Language Model (VLM) Engineer to design, develop, and deploy state-of-the-art multimodal AI systems. You will work at the intersection of computer vision and natural language processing, contributing to cutting-edge products that combine image and text understanding.

Key Responsibilities:

  • Design and implement vision-language models for tasks such as image captioning, visual question answering, and cross-modal retrieval
  • Train, fine-tune, and evaluate multimodal models using large-scale datasets
  • Optimize model performance for scalability and real-world deployment
  • Collaborate with cross-functional teams including data scientists, software engineers, and product managers
  • Stay up to date with the latest research in multimodal AI and apply it to production systems

Required Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, or a related field
  • Strong experience with Python and deep learning frameworks (e.g., PyTorch or TensorFlow)
  • Solid understanding of machine learning, computer vision, and NLP concepts
  • Experience with multimodal models or related architectures (e.g., transformers)
  • Familiarity with handling large datasets and distributed training

Preferred Qualifications:

  • Experience with models such as CLIP, BLIP, or similar multimodal architectures
  • Knowledge of model deployment (Docker, APIs, cloud services)
  • Publications or contributions to AI research projects
  • Experience working with real-world AI applications
