
Data Engineer

Key Facts

Remote From: 
Full time
Mid-level (2-5 years)
English

Other Skills

  • Communication
  • Time Management
  • Analytical Thinking
  • Detail Oriented
  • Problem Solving

Requirements

  • 3+ years of experience in Data Engineering, Backend Engineering, or Data Infrastructure roles
  • Strong proficiency in Python and SQL
  • Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
  • Hands-on experience with orchestration tools such as Airflow or Prefect

Roles & Responsibilities

  • Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala, and orchestrate workflows with Airflow, Prefect, Dagster, or similar tools
  • Design and optimize cloud data warehouses (Snowflake, BigQuery, Redshift) with scalable schemas and performance tuning
  • Implement data quality and governance measures including validation checks, lineage tracking, and documentation (dbt, Great Expectations), ensuring compliance with GDPR/HIPAA as applicable
  • Develop and manage real-time streaming pipelines (Kafka, Kinesis, Pub/Sub) for low-latency data and event-driven architectures

Job description

Job Title: Data Engineer

Position Type: Full-Time, Remote
Working Hours: U.S. client business hours (with flexibility for pipeline monitoring, deployments, and data refresh cycles)

About the Role

Our client is seeking a Data Engineer to design, build, and maintain scalable data infrastructure and reliable data pipelines that power analytics, reporting, and operational decision-making across the business.

This role requires strong software engineering fundamentals, deep experience with modern data stacks, and a passion for building clean, reliable, and high-performance data systems. The Data Engineer will ensure data flows seamlessly from source systems into warehouses, dashboards, and downstream applications while maintaining high standards for quality, governance, and scalability.

The ideal candidate is analytical, detail-oriented, and comfortable working across engineering, analytics, and business teams to deliver trustworthy and actionable data.

Responsibilities

Pipeline Development & Data Integration

• Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala
• Orchestrate workflows using Airflow, Prefect, Dagster, or similar orchestration tools
• Ingest structured and unstructured data from APIs, SaaS platforms, databases, files, and streaming systems
• Develop scalable connectors and automated ingestion workflows
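For candidates gauging fit, the work described above can be pictured as a minimal extract-transform-load step in plain Python. This is an illustrative sketch only: in a real role each step would run as an orchestrated task (e.g. in an Airflow or Prefect workflow), and the field names here are assumptions, not from any actual system.

```python
# Minimal ETL sketch. In production, each function would typically be a task
# in an Airflow/Prefect/Dagster workflow; names and fields are illustrative.

def extract(raw_rows):
    """Simulate pulling raw records from an API or file export; drop empty payloads."""
    return [row for row in raw_rows if row]

def transform(rows):
    """Normalize field names and types into an analytics-ready shape."""
    return [
        {"user_id": int(row["id"]), "email": row["email"].strip().lower()}
        for row in rows
    ]

def load(rows, target):
    """Append transformed rows to a target store (here, a plain list)."""
    target.extend(rows)
    return len(rows)

if __name__ == "__main__":
    raw = [{"id": "1", "email": " Ada@Example.com "}, {}, {"id": "2", "email": "b@x.io"}]
    warehouse = []
    loaded = load(transform(extract(raw)), warehouse)
    print(loaded, warehouse[0]["email"])
```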

Data Warehousing & Modeling

• Manage and optimize cloud data warehouses such as Snowflake, BigQuery, or Redshift
• Design scalable schemas using star and snowflake modeling techniques
• Implement partitioning, clustering, indexing, and performance optimization strategies
• Build clean, analytics-ready datasets for business intelligence and reporting use cases

Data Quality, Governance & Reliability

• Implement validation checks, anomaly detection, logging, and monitoring to ensure data integrity
• Enforce naming conventions, lineage tracking, and documentation standards using tools such as dbt or Great Expectations
• Maintain audit-ready data processes and ensure compliance with GDPR, HIPAA, or industry-specific requirements
• Monitor pipeline health and proactively resolve failures or inconsistencies
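The validation work above can be sketched in plain Python as a simple batch check. In practice these checks would usually be formalized as Great Expectations suites or dbt tests; the field names and null-rate threshold below are illustrative assumptions.

```python
# Sketch of batch validation checks of the kind a tool like Great Expectations
# or dbt tests would formalize. Field names and thresholds are illustrative.

def validate_batch(rows, required_fields, max_null_rate=0.0):
    """Return a list of human-readable failures for a batch of records."""
    failures = []
    total = len(rows)
    if total == 0:
        return ["batch is empty"]
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        if nulls / total > max_null_rate:
            failures.append(f"{field}: {nulls}/{total} null values exceed allowed rate")
    return failures

if __name__ == "__main__":
    batch = [{"order_id": 1, "amount": 9.5}, {"order_id": None, "amount": 3.0}]
    print(validate_batch(batch, ["order_id", "amount"]))
```

A failing check like this would typically halt the pipeline or alert on call, preventing bad records from reaching downstream reporting.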

Streaming & Real-Time Data Processing

• Build and manage real-time data pipelines using Kafka, Kinesis, Pub/Sub, or similar platforms
• Support low-latency ingestion and event-driven architectures for time-sensitive applications
• Monitor streaming infrastructure and optimize throughput and reliability

Collaboration & Analytics Enablement

• Partner closely with analysts, data scientists, and business stakeholders to deliver reliable datasets
• Support dashboard and reporting initiatives across Tableau, Looker, or Power BI
• Translate business requirements into scalable data solutions and models
• Maintain clear technical documentation for pipelines, schemas, and workflows

Infrastructure, DevOps & Automation

• Containerize data services using Docker and manage deployments through Kubernetes when applicable
• Automate deployments using CI/CD pipelines such as GitHub Actions, Jenkins, or GitLab CI
• Manage cloud infrastructure using Terraform, CloudFormation, or similar Infrastructure-as-Code tools
• Continuously optimize performance, scalability, reliability, and cloud costs

What Makes You a Perfect Fit

• Passionate about building clean, reliable, and scalable data systems
• Strong debugging and problem-solving mindset with high attention to detail
• Balance of software engineering discipline and analytical thinking
• Comfortable working cross-functionally with technical and non-technical stakeholders
• Proactive communicator who takes ownership of data quality and reliability

Required Experience & Skills

• 3+ years of experience in Data Engineering, Backend Engineering, or Data Infrastructure roles
• Strong proficiency in Python and SQL
• Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
• Hands-on experience with orchestration tools such as Airflow or Prefect
• Strong understanding of ETL/ELT pipelines, data modeling, and data transformation workflows
• Familiarity with cloud platforms such as AWS, GCP, or Azure

Preferred Experience & Skills

• Experience with dbt for data modeling and transformation management
• Streaming and event-driven data pipeline experience (Kafka, Kinesis, Pub/Sub)
• Experience with cloud-native data services such as AWS Glue, GCP Dataflow, or Azure Data Factory
• Familiarity with Docker, Kubernetes, Terraform, or CI/CD workflows
• Background in regulated industries such as healthcare, fintech, or enterprise SaaS
• Experience optimizing warehouse costs and query performance at scale

What Does a Typical Day Look Like?

A Data Engineer’s day revolves around maintaining reliable pipelines, improving data quality, and enabling teams with scalable access to trustworthy data. You will:

• Monitor pipeline health and troubleshoot failed jobs in Airflow or related orchestration systems
• Build and maintain ingestion pipelines for APIs, SaaS platforms, and operational databases
• Optimize SQL queries and warehouse performance to improve efficiency and reduce cloud costs
• Collaborate with analysts and data scientists to provide curated datasets for reporting and modeling
• Implement validation checks and monitoring to prevent downstream data quality issues
• Document data models, transformations, and workflows to ensure scalability and maintainability

In essence: you ensure the organization has accurate, timely, and reliable data powering operational, analytical, and strategic decisions.

Key Metrics for Success (KPIs)

• Pipeline uptime ≥ 99%
• Data freshness maintained within agreed SLAs
• Zero critical data quality issues reaching downstream reporting systems
• Improved warehouse query performance and cost optimization
• Timely delivery of scalable and reliable datasets
• Positive feedback from analysts, data scientists, and business stakeholders

Interview Process

• Initial Phone Screen
• Video Interview with Pavago Recruiter
• Technical Assessment (e.g., build a small ETL pipeline or optimize a SQL query)
• Client Interview with Engineering/Data Team
• Offer & Background Verification

#DataEngineer #ETL #DataPipelines #BigQuery #Snowflake #Redshift #Airflow #Python #SQL #CloudData #AnalyticsEngineering #DataInfrastructure #RemoteWork #DataEngineeringJobs
