Key Facts

Remote From:

Full time

Senior (5-10 years)

English

Hard Skills

Natural Language Processing (NLP), Large Language Modeling, Machine Learning, Software Engineering, Text Processing, Model Validation, Application Programming Interface (API), Audio Signal Processing, Pandas (Python Package), Named Entity Recognition, Data Science, AWS Cloud Services, SQL (Programming Language), Topic Modeling, Unsupervised Learning, Convolutional Neural Networks, Hugging Face (NLP Framework), Software Versioning, Unstructured Data, Deep Learning, Feature Engineering, Data Cleansing, Sentiment Analysis, Google Cloud Platform (GCP), Recurrent Neural Network (RNN), Intelligent Agent, Microsoft Azure, Cloud Computing, Python (Programming Language), Digital Image Processing, Scikit-Learn (Python Package), Reinforcement Learning, PyTorch (Machine Learning Library), Data Preprocessing, Java Enterprise Edition, GPT-3 (NLP Model), Multi-Agent Systems, TensorFlow, Multidimensional Scaling, BERT (NLP Model), Supervised Learning, Business Process Automation, Text Manipulation, Advanced Analytics, Clinical Informatics

Other Skills

• Decision Making
• Communication
• Teamwork
• Critical Thinking
• Business Acumen
• Problem Solving
• Creativity

Roles & Responsibilities

Build, train, validate, and deploy ML and deep learning models and AI agents for tasks involving unstructured and structured data, with a focus on workflow automation.
Perform NLP for information extraction from unstructured text, including tokenization, sentiment analysis, named entity recognition, topic modeling, and use of pre-trained models such as BERT, GPT, or others from Hugging Face.
Design AI agent architectures comprising an LLM brain, task-specific tools, and decision-making logic; orchestrate RAG workflows and integrate LLMs with other systems.
Develop robust data pipelines and MLOps practices, including data cleaning, feature engineering, model versioning, monitoring, and deployment on cloud platforms; build connectors/APIs to automate business processes.

Requirements:

Bachelor's degree in Computer Science, Data Science, or a highly quantitative field; Master's degree preferred in ML, computational statistics, operations research, or related quantitative discipline.
7+ years of professional experience in data science with a track record of taking AI applications from concept to production.
Advanced ML/DL expertise including supervised/unsupervised learning, CNNs/Transformers, reinforcement learning, and building agentic workflows with RAG integration and LLM orchestration.
Expertise in handling unstructured data; strong SQL skills and experience with cloud platforms (AWS, GCP, Azure); Large Language Model proficiency.

Genentech

Biotech: Biology + Technology

About Genentech

We're passionate about finding solutions for people facing the world's most difficult-to-treat conditions. That is why we use cutting-edge science to create and deliver innovative medicines around the globe. To us, science is personal. Making a difference in the lives of millions starts when you make a change in yours. If you’d like to join our team, view our openings at gene.com/careers.

Our patient resource center is dedicated to getting patients and caregivers to the right resources. You can reach them at 1 (877) GENENTECH (436-3683) Monday-Friday, 6am-5pm PST or patientinfo@gene.com.

Community Guidelines:

1. We want to foster positive conversation and diverse community around the issues we are passionate about. To that end, we remove profanity, content that contains credible threats or hate speech, content that is aimed at private individuals, personal information meant to harass someone, and repeated unwanted messages.
2. Don’t mention any medicines by name, ours or anyone else’s. Because of the fair balance rules governing our industry, we cannot post any comments that reference any pharmaceutical brand, product, or service. Please do not mention any specific medicines by name, or include any links to third party sites in your comments.
3. This isn’t the place to report or discuss side effects. This site is not intended as a forum for reporting side effects experienced while taking a Genentech product. Instead, you should report any side effects to Genentech Drug Safety at 1-888-835-2555. You can also report side effects of any prescription product directly to the FDA at 1-800-FDA-1088 or by visiting www.FDA.gov/medwatch.
4. Don’t pitch your product or service. Please don't use our page as a place to promote your product or pitch your services. Please also avoid posting links to external sites. We reserve the right to remove any posts that are deemed promotional.

Company type: XLarge

Industry: Biotech: Biology + Technology

Founded: 1976

Company size: 10001


Job description

The Opportunity

As a Data Scientist you will have a strong foundation in machine learning (ML), data science, and software engineering. You will have practical experience in building and deploying ML models and developing AI agents, particularly for tasks involving unstructured/structured data and workflow automation.

Key Responsibilities:

Machine Learning and Deep Learning: The candidate must be proficient in a wide range of ML algorithms, from traditional models like linear regression and decision trees to more advanced deep learning architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). They should understand the principles behind model training, validation, and hyperparameter tuning.
Natural Language Processing (NLP): Strong NLP skills are essential for extracting information from unstructured text. The candidate should have experience with techniques such as tokenization, sentiment analysis, named entity recognition, and topic modeling, and with pre-trained language models such as BERT, GPT, or others from the Hugging Face ecosystem.
Data Handling and Feature Engineering: They should be adept at working with various data formats and have experience in data cleaning, preprocessing, and transforming raw data into useful features for ML models. This includes handling missing values, encoding categorical data, and scaling numerical features.
Programming and MLOps: Proficiency in Python is a must, along with a solid understanding of key libraries like Scikit-learn, Pandas, TensorFlow, and PyTorch. Experience with MLOps (Machine Learning Operations) practices, including model versioning, monitoring, and deployment on cloud platforms (AWS, Azure, or GCP), is crucial for building and maintaining robust solutions.
AI Agent Architectures: The candidate should understand the components of an AI agent, including a Large Language Model (LLM) as the brain, tools for specific tasks, and a logical structure for decision-making.
Workflow Automation: The candidate should have practical experience in designing and implementing automated workflows. This involves integrating AI agents and ML models into existing business processes. They should be able to identify bottlenecks, map out a solution, and build the necessary connectors or APIs to execute tasks automatically.
Unstructured Data: The candidate needs to demonstrate expertise in handling various forms of unstructured data, including text, images, and audio. This involves building pipelines to ingest, process, and analyze this data to extract meaningful insights or trigger actions.
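
As a rough illustration of the agent components named under AI Agent Architectures above (an LLM as the brain, tools for specific tasks, and a decision-making loop), here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: `stub_llm` stands in for a real model call, and the `Decision` class, the `TOOLS` registry, and `run_agent` are not taken from any particular framework or from Genentech's systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Decision:
    """What the 'brain' wants to do next: call a tool or finish."""
    action: str          # a tool name, or "finish"
    argument: str = ""   # tool input, or the final answer when finishing


def stub_llm(task: str, observations: List[str]) -> Decision:
    """Hypothetical stand-in for an LLM call; a real agent would prompt a model here."""
    if not observations:
        return Decision("word_count", task)
    return Decision("finish", f"The text has {observations[-1]} words.")


# Task-specific tools the brain may choose from (illustrative only).
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "uppercase": lambda text: text.upper(),
}


def run_agent(task: str, brain=stub_llm, max_steps: int = 5) -> str:
    """Decision loop: ask the brain, run the chosen tool, feed the result back."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = brain(task, observations)
        if decision.action == "finish":
            return decision.argument
        tool = TOOLS.get(decision.action)
        if tool is None:
            # Report the problem back to the brain instead of crashing.
            observations.append(f"unknown tool: {decision.action}")
            continue
        observations.append(tool(decision.argument))
    return "stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent("extract entities from this clinical note"))
```

The step limit is one simple way to keep a misbehaving brain from looping forever; a production agent would typically add logging, input validation, and error handling around each tool call.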

Who you are

Problem-Solving: The ability to break down complex business problems into manageable, data-driven solutions is key. The candidate should be able to think critically and creatively to solve real-world challenges.
Communication: A great candidate can clearly articulate technical concepts to non-technical stakeholders, explaining the "why" and "how" of their solutions. This is vital for collaborating with different teams and ensuring the project meets business goals.
Business Acumen: The best candidates understand the business context of their work. They should be able to connect their technical solutions directly to a positive impact on the company's bottom line or operational efficiency.

Education & Academic Background

Minimum Requirement: A Bachelor’s degree in a highly quantitative field (Computer Science, Data Science or related field).
Preferred: A Master’s in a specialized domain such as Machine Learning, Computational Statistics, Operations Research, or a related quantitative discipline.

Proven Track Record: At least 7 years of professional experience in data science, with a clear history of taking AI applications from conceptualization to production environments.
Data Handling: Expertise in handling unstructured data
Advanced ML Expertise: Experience with supervised/unsupervised learning, deep learning (CNNs, Transformers), and reinforcement learning; proficiency in building agentic workflows, including RAG integration and LLM orchestration
Data Infrastructure: Expertise in SQL and experience working with cloud platforms (AWS, GCP, or Azure)
Large Language Model expertise required
Experience with Diagnostics and/or Pharmaceutical data is a plus

Pleasanton location (where the team resides) is highly preferred. The position can be remote for exceptional candidates.

Relocation benefits are not available for this posting

The expected salary range for this position, based on the primary location of California, is $127,200 - $236,200. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. A discretionary annual bonus may be available based on individual and Company performance. This position also qualifies for the benefits detailed at the link provided below.

Benefits


Genentech is an equal opportunity employer. It is our policy and practice to employ, promote, and otherwise treat any and all employees and applicants on the basis of merit, qualifications, and competence. The company's policy prohibits unlawful discrimination, including but not limited to discrimination on the basis of protected veteran status or disability status, consistent with all federal, state, or local laws.

If you have a disability and need an accommodation in relation to the online application process, please contact us by completing this form: Accommodations for Applicants.