Data Integration Engineer

Benefits: extra parental leave
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Minimum 6 years of professional data engineering experience.
  • Strong expertise in Talend Data Integration for ETL pipelines.
  • Proficiency in Python programming and PostgreSQL database development.
  • Experience with BigQuery, including advanced SQL, data structures, and performance optimization.

Key responsibilities:

  • Design and develop complex ETL pipelines using Talend and Python.
  • Build and optimize data workflows for ingestion, transformation, and storage.
  • Implement real-time data pipelines with Pub/Sub and Dataflow.
  • Collaborate with cross-functional teams to ensure data quality and accessibility.

O'Reilly logo
O'Reilly E-learning SME https://www.oreilly.com/
201 - 500 Employees
See all jobs

Job description


About O’Reilly Media

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 45 years, we’ve inspired companies and individuals to do new things—and do things better—by providing them with the skills and understanding necessary for success.

At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

Learn more: https://www.oreilly.com/about

Diversity

At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.

Learn more: https://www.oreilly.com/diversity

About the Team

Our data platform team is dedicated to establishing a robust data infrastructure, facilitating easy access to quality, reliable, and timely data for reporting, analytics, and actionable insights. We focus on designing and building a sustainable and scalable data architecture, treating data as a core corporate asset. Our efforts also include process improvement, governance enhancement, and addressing application, functional, and reporting needs. We value teammates who are helpful, respectful, communicate openly, and prioritize the best interests of our users. Operating across various cities and time zones in the US, our team fosters collaboration to deliver work that brings pride and fulfillment.


About the Role

We are seeking an experienced and detail-oriented Data Integration Engineer to contribute to the development and expansion of a suite of systems and tools, with a primary focus on ETL processes. The ideal candidate will have a deep understanding of modern data engineering concepts and will have shipped or supported code and infrastructure with a user base in the millions and datasets with billions of records. The candidate will be routinely implementing features, fixing bugs, performing maintenance, consulting with product managers, and troubleshooting problems. Changes you make will be accompanied by tests to confirm desired behavior. Code reviews, in the form of pull requests reviewed by peers, are a regular and expected part of the job as well.

Salary Range: $110,000 - $138,000

What You’ll Do
ETL Development with Talend:
  • Architect and build complex ETL pipelines in Talend Data Integration, ensuring scalability, reusability, and maintainability of workflows.
  • Implement sophisticated data transformations, including lookups, joins, aggregates, and custom routines using Talend’s tMap, tJavaRow, tSqlRow, and JSON components.
  • Develop data pipelines or features related to data ingestion, transformation, or storage using Python and relational databases (e.g., PostgreSQL) or cloud-based data warehousing (e.g., BigQuery); a minimal Python sketch follows this list.
  • Automate data ingestion from REST APIs, FTP servers, cloud platforms, and relational databases into cloud or on-premises storage.
  • Leverage Talend’s integration with BigQuery for seamless data flow into analytical systems, employing native connectors.
  • Use Talend’s debugging tools, logs, and monitoring dashboards to troubleshoot and resolve job execution issues.
  • Optimize Talend jobs by using efficient memory settings, parallelization, and dependency injection for high-volume data processing.
  • Integrate Talend with Google Cloud Storage, Pub/Sub, and Dataflow to create hybrid workflows combining batch and real-time data processing.
  • Manage Talend deployments using Talend Management Console (TMC) for scheduling, monitoring, and lifecycle management.
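
A minimal sketch of the Python-plus-PostgreSQL pipeline work described above: extract records from a REST API, reshape them, and upsert them into a warehouse table. The endpoint, table, and connection settings are hypothetical, and a production job would add batching, retries, and logging.

```python
# Hypothetical sketch: ingest records from a REST API into PostgreSQL.
# The endpoint, table name, and connection settings are illustrative
# assumptions, not part of any actual O'Reilly pipeline.
import requests
import psycopg2

API_URL = "https://api.example.com/v1/orders"  # assumed endpoint

def extract(url: str) -> list[dict]:
    """Pull one page of records from the REST API."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()["results"]  # assumed response shape

def transform(records: list[dict]) -> list[tuple]:
    """Keep only the fields the target table needs."""
    return [(r["id"], r["customer"], r["total"]) for r in records]

def load(rows: list[tuple]) -> None:
    """Upsert rows so reruns of the job stay idempotent."""
    with psycopg2.connect(dbname="warehouse", user="etl") as conn:
        with conn.cursor() as cur:
            cur.executemany(
                """
                INSERT INTO orders (id, customer, total)
                VALUES (%s, %s, %s)
                ON CONFLICT (id) DO UPDATE
                SET customer = EXCLUDED.customer, total = EXCLUDED.total
                """,
                rows,
            )

if __name__ == "__main__":
    load(transform(extract(API_URL)))
```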
BigQuery Data Management:
  • Build high-performance BigQuery datasets, implementing advanced partitioning (DATE, RANGE) and clustering for cost-effective queries.
  • Work with JSON and ARRAY data structures, leveraging BigQuery to efficiently nest and unnest objects as required for complex data transformations and analysis.
  • Write advanced SQL queries for analytics, employing techniques like window functions, CTEs, and array operations for complex transformations (see the sketch after this list).
  • Implement BigQuery federated queries to integrate external datasets from Cloud Storage or other data warehouses.
  • Design and manage BigQuery reservations and slots, allocating compute resources effectively to balance performance, cost, and workload demands across teams and projects.
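
For flavor, a hedged example of the kind of advanced SQL this covers: a CTE that flattens an ARRAY of structs with UNNEST, then ranks the results with a window function, run through the google-cloud-bigquery client. The project, dataset, and column names are assumptions, not an actual schema.

```python
# Hypothetical sketch: an "advanced SQL" BigQuery query of the kind the
# role describes -- a CTE that UNNESTs an ARRAY<STRUCT> column, then
# ranks rows with a window function. All names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
WITH line_items AS (
  SELECT
    o.order_id,
    o.order_date,
    item.sku,
    item.quantity * item.unit_price AS line_total
  FROM `my_project.sales.orders` AS o,   -- assumed table, date-partitioned
       UNNEST(o.items) AS item           -- flatten the ARRAY<STRUCT> column
  WHERE o.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
SELECT
  sku,
  SUM(line_total) AS revenue,
  RANK() OVER (ORDER BY SUM(line_total) DESC) AS revenue_rank
FROM line_items
GROUP BY sku
"""

for row in client.query(QUERY).result():
    print(row.sku, row.revenue, row.revenue_rank)
```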
Real-time Data Pipelines with Google Pub/Sub and Dataflow:
  • Implement Pub/Sub topics and subscriptions to manage real-time data ingestion pipelines effectively.
  • Integrate Pub/Sub with Talend for real-time ETL workflows, ensuring low-latency data delivery.
  • Implement dynamic windowing and triggers for efficient aggregation and event handling (a streaming sketch follows this list).
  • Optimize streaming pipelines by fine-tuning autoscaling policies, worker counts, and resource configurations.
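
A minimal streaming sketch, assuming an Apache Beam pipeline (the programming model Dataflow executes): read from a Pub/Sub topic, window events into fixed one-minute intervals, and count per key. The topic path and message shape are illustrative; a real pipeline would tune triggers, autoscaling, and error handling as the bullets above describe.

```python
# Hypothetical sketch: a Beam streaming pipeline (runnable on Dataflow)
# that reads from Pub/Sub and aggregates events in fixed one-minute
# windows. Topic path and message format are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner etc. to deploy

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/events")  # assumed topic
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "KeyByEventType" >> beam.Map(lambda e: (e["type"], 1))
        | "FixedWindows" >> beam.WindowInto(window.FixedWindows(60))
        | "CountPerType" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )
```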
PostgreSQL Database Development and Optimization:
  • Enhance and modify existing PostgreSQL queries and functions.
  • Write advanced PL/pgSQL functions and triggers for procedural data logic (a sketch follows this list).
  • Develop materialized views and indexed expressions as needed to speed up query execution for large datasets.
  • Monitor and optimize queries with EXPLAIN ANALYZE.
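
As a small illustration of the PL/pgSQL and EXPLAIN ANALYZE work above, the sketch below installs a simple row-stamping trigger and prints a query plan from Python. Table, column, and connection names are assumptions, and CREATE OR REPLACE TRIGGER requires PostgreSQL 14 or newer.

```python
# Hypothetical sketch: install a PL/pgSQL trigger from Python, then
# inspect a query plan with EXPLAIN ANALYZE. Table and column names
# are illustrative assumptions.
import psycopg2

DDL = """
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();   -- stamp every modified row
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER orders_touch   -- PostgreSQL 14+
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
"""

with psycopg2.connect(dbname="warehouse", user="etl") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        # Print the actual plan and timings for a hot query.
        cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer = %s",
                    ("acme",))
        for (line,) in cur.fetchall():
            print(line)
```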
What You’ll Have

Required:
  • 6+ years of professional data engineering experience (equivalent education and/or experience may be considered)
  • Strong experience with Talend Data Integration for designing and optimizing ETL pipelines
  • Excellent Python and PostgreSQL development and debugging skills
  • Experience in data extraction, transformation, and loading (ETL) using Python
  • Experience working with JSON and ARRAY data structures in BigQuery, including nesting and unnesting
  • Experience in integrating and optimizing streaming data pipelines in a cloud environment
  • Experience with deployment tools such as Jenkins to build automated CI/CD pipelines
  • Hands-on experience with Google Cloud Storage, Pub/Sub, Dataflow, and Dataprep for ETL and real-time data processing
  • Proficiency in building and managing real-time data pipelines with Google Pub/Sub and Dataflow
  • Proficiency in BigQuery, including dataset management, advanced SQL, partitioning, clustering, and federated queries
  • Solid understanding of PostgreSQL, including PL/pgSQL, query optimization, and advanced functions
  • Familiarity with optimizing BigQuery performance through reservations, slots, and cost-effective query techniques
  • Proven experience in creating, managing, and merging branches in Git, following best practices for version control
  • Expertise in resolving merge conflicts, with a deep understanding of branching strategies, rebasing, and other Git workflows
  • Extensive experience with GitHub pull requests (PRs), including creating, reviewing, and approving code changes in a collaborative environment
  • Excellent problem-solving skills and ability to optimize high-volume data workflows
  • Strong communication skills to collaborate effectively with cross-functional teams
  • Strong drive to experiment, learn, and improve your skills
  • Respect for the craft—you write self-documenting code with modern techniques
  • Great written communication skills—we do a lot of work asynchronously in Slack and Google Docs
  • Empathy for our users—a willingness to spend time understanding their needs and difficulties is central to the team
  • Desire to be part of a compact, fun, and hardworking team
Preferred:
  • Experience integrating BigQuery ML for advanced machine learning use cases, including regression, classification, and time-series forecasting (a sketch follows).
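
A hedged sketch of the BigQuery ML use case: training an ARIMA_PLUS time-series model with CREATE MODEL, then reading predictions back through ML.FORECAST. Every project, dataset, and column name here is an assumption.

```python
# Hypothetical sketch: train a BigQuery ML time-series model and
# forecast from it. Dataset, table, and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

# ARIMA_PLUS is BigQuery ML's built-in time-series model type.
client.query("""
CREATE OR REPLACE MODEL `my_project.analytics.daily_sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'order_date',
  time_series_data_col = 'revenue'
) AS
SELECT order_date, SUM(total) AS revenue
FROM `my_project.sales.orders`
GROUP BY order_date
""").result()

# Forecast the next 14 days with an 80% prediction interval.
forecast = client.query("""
SELECT forecast_timestamp, forecast_value
FROM ML.FORECAST(MODEL `my_project.analytics.daily_sales_forecast`,
                 STRUCT(14 AS horizon, 0.8 AS confidence_level))
""").result()

for row in forecast:
    print(row.forecast_timestamp, row.forecast_value)
```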
Additional Information: At this time, O’Reilly Media, Inc. is not able to provide visa sponsorship or any immigration support (i.e., H-1B, STEM, OPT, CPT, EAD, and Permanent Residency processes).

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: E-learning
Spoken language(s): English

Other Skills

  • Empathy
  • Collaboration
  • Communication
  • Problem Solving
