Data Integration Engineer

Benefits: extra parental leave
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Minimum 6 years of professional data engineering experience.
  • Strong expertise in Talend Data Integration for ETL pipelines.
  • Proficiency in Python programming and PostgreSQL database development.
  • Experience with BigQuery, including advanced SQL, data structures, and performance optimization.

Key responsibilities:

  • Design and develop complex ETL pipelines using Talend and Python.
  • Build and optimize data workflows for ingestion, transformation, and storage.
  • Implement real-time data pipelines with Pub/Sub and Dataflow.
  • Collaborate with cross-functional teams to ensure data quality and accessibility.

O'Reilly logo
O'Reilly E-learning SME https://www.oreilly.com/
201 - 500 Employees
See all jobs

Job description


About O’Reilly Media

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 45 years, we’ve inspired companies and individuals to do new things—and do things better—by providing them with the skills and understanding necessary for success.

At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

Learn more: https://www.oreilly.com/about

Diversity

At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.

Learn more: https://www.oreilly.com/diversity

About the Team

Our data platform team is dedicated to establishing a robust data infrastructure, facilitating easy access to quality, reliable, and timely data for reporting, analytics, and actionable insights. We focus on designing and building a sustainable and scalable data architecture, treating data as a core corporate asset. Our efforts also include process improvement, governance enhancement, and addressing application, functional, and reporting needs. We value teammates who are helpful, respectful, communicate openly, and prioritize the best interests of our users. Operating across various cities and time zones in the US, our team fosters collaboration to deliver work that brings pride and fulfillment.


About the Role

We are seeking an experienced and detail-oriented Data Integration Engineer to contribute to the development and expansion of a suite of systems and tools, with a primary focus on ETL processes. The ideal candidate will have a deep understanding of modern data engineering concepts and will have shipped or supported code and infrastructure with a user base in the millions and datasets with billions of records. The candidate will be routinely implementing features, fixing bugs, performing maintenance, consulting with product managers, and troubleshooting problems. Changes you make will be accompanied by tests to confirm desired behavior. Code reviews, in the form of pull requests reviewed by peers, are a regular and expected part of the job as well.

Salary Range: $110,000 - $138,000

What You’ll Do
ETL Development with Talend:
  • Architect and build complex ETL pipelines in Talend Data Integration, ensuring scalability, reusability, and maintainability of workflows.
  • Implement sophisticated data transformations, including lookups, joins, aggregates, and custom routines using Talend’s tMap, tJavaRow, tSqlRow, and JSON components.
  • Develop data pipelines or features related to data ingestion, transformation, or storage using Python and relational databases (e.g., PostgreSQL) or cloud-based data warehousing (e.g., BigQuery); a minimal Python sketch follows this list.
  • Automate data ingestion from REST APIs, FTP servers, cloud platforms, and relational databases into cloud or on-premises storage.
  • Leverage Talend’s integration with BigQuery for seamless data flow into analytical systems, employing native connectors.
  • Use Talend’s debugging tools, logs, and monitoring dashboards to troubleshoot and resolve job execution issues.
  • Optimize Talend jobs by using efficient memory settings, parallelization, and dependency injection for high-volume data processing.
  • Integrate Talend with Google Cloud Storage, Pub/Sub, and Dataflow to create hybrid workflows combining batch and real-time data processing.
  • Manage Talend deployments using Talend Management Console (TMC) for scheduling, monitoring, and lifecycle management.
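
A minimal sketch of the Python-plus-PostgreSQL pipeline work described above: extract records from a REST API, reshape them, and upsert them into a warehouse table. The endpoint, table, and connection settings are hypothetical, and a production job would add batching, retries, and logging.

```python
# Hypothetical sketch: ingest records from a REST API into PostgreSQL.
# The endpoint, table name, and connection settings are illustrative
# assumptions, not part of any actual O'Reilly pipeline.
import requests
import psycopg2

API_URL = "https://api.example.com/v1/orders"  # assumed endpoint

def extract(url: str) -> list[dict]:
    """Pull one page of records from the REST API."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()["results"]  # assumed response shape

def transform(records: list[dict]) -> list[tuple]:
    """Keep only the fields the target table needs."""
    return [(r["id"], r["customer"], r["total"]) for r in records]

def load(rows: list[tuple]) -> None:
    """Upsert rows so reruns of the job stay idempotent."""
    with psycopg2.connect(dbname="warehouse", user="etl") as conn:
        with conn.cursor() as cur:
            cur.executemany(
                """
                INSERT INTO orders (id, customer, total)
                VALUES (%s, %s, %s)
                ON CONFLICT (id) DO UPDATE
                SET customer = EXCLUDED.customer, total = EXCLUDED.total
                """,
                rows,
            )

if __name__ == "__main__":
    load(transform(extract(API_URL)))
```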
BigQuery Data Management:
  • Build high-performance BigQuery datasets, implementing advanced partitioning (DATE, RANGE) and clustering for cost-effective queries.
  • Work with JSON and ARRAY data structures, leveraging BigQuery to efficiently nest and unnest objects as required for complex data transformations and analysis.
  • Write advanced SQL queries for analytics, employing techniques like window functions, CTEs, and array operations for complex transformations (see the sketch after this list).
  • Implement BigQuery federated queries to integrate external datasets from Cloud Storage or other data warehouses.
  • Design and manage BigQuery reservations and slots, allocating compute resources effectively to balance performance, cost, and workload demands across teams and projects.
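
For flavor, a hedged example of the kind of advanced SQL this covers: a CTE that flattens an ARRAY of structs with UNNEST, then ranks the results with a window function, run through the google-cloud-bigquery client. The project, dataset, and column names are assumptions, not an actual schema.

```python
# Hypothetical sketch: an "advanced SQL" BigQuery query of the kind the
# role describes -- a CTE that UNNESTs an ARRAY<STRUCT> column, then
# ranks rows with a window function. All names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
WITH line_items AS (
  SELECT
    o.order_id,
    o.order_date,
    item.sku,
    item.quantity * item.unit_price AS line_total
  FROM `my_project.sales.orders` AS o,   -- assumed table, date-partitioned
       UNNEST(o.items) AS item           -- flatten the ARRAY<STRUCT> column
  WHERE o.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
SELECT
  sku,
  SUM(line_total) AS revenue,
  RANK() OVER (ORDER BY SUM(line_total) DESC) AS revenue_rank
FROM line_items
GROUP BY sku
"""

for row in client.query(QUERY).result():
    print(row.sku, row.revenue, row.revenue_rank)
```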
Real-time Data Pipelines with Google Pub/Sub and Dataflow:
  • Implement Pub/Sub topics and subscriptions to manage real-time data ingestion pipelines effectively.
  • Integrate Pub/Sub with Talend for real-time ETL workflows, ensuring low-latency data delivery.
  • Implement dynamic windowing and triggers for efficient aggregation and event handling (a streaming sketch follows this list).
  • Optimize streaming pipelines by fine-tuning autoscaling policies, worker counts, and resource configurations.
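
A minimal streaming sketch, assuming an Apache Beam pipeline (the programming model Dataflow executes): read from a Pub/Sub topic, window events into fixed one-minute intervals, and count per key. The topic path and message shape are illustrative; a real pipeline would tune triggers, autoscaling, and error handling as the bullets above describe.

```python
# Hypothetical sketch: a Beam streaming pipeline (runnable on Dataflow)
# that reads from Pub/Sub and aggregates events in fixed one-minute
# windows. Topic path and message format are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner etc. to deploy

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/events")  # assumed topic
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "KeyByEventType" >> beam.Map(lambda e: (e["type"], 1))
        | "FixedWindows" >> beam.WindowInto(window.FixedWindows(60))
        | "CountPerType" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )
```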
PostgreSQL Database Development and Optimization:
  • Enhance and modify existing PostgreSQL queries and functions.
  • Write advanced PL/pgSQL functions and triggers for procedural data logic (a sketch follows this list).
  • Develop materialized views and indexed expressions as needed to speed up query execution for large datasets.
  • Monitor and optimize queries with EXPLAIN ANALYZE.
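
As a small illustration of the PL/pgSQL and EXPLAIN ANALYZE work above, the sketch below installs a simple row-stamping trigger and prints a query plan from Python. Table, column, and connection names are assumptions, and CREATE OR REPLACE TRIGGER requires PostgreSQL 14 or newer.

```python
# Hypothetical sketch: install a PL/pgSQL trigger from Python, then
# inspect a query plan with EXPLAIN ANALYZE. Table and column names
# are illustrative assumptions.
import psycopg2

DDL = """
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();   -- stamp every modified row
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER orders_touch   -- PostgreSQL 14+
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
"""

with psycopg2.connect(dbname="warehouse", user="etl") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        # Print the actual plan and timings for a hot query.
        cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer = %s",
                    ("acme",))
        for (line,) in cur.fetchall():
            print(line)
```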
What You’ll Have

Required:
  • 6+ years of professional data engineering experience (equivalent education and/or experience may be considered)
  • Strong experience with Talend Data Integration for designing and optimizing ETL pipelines
  • Excellent Python and PostgreSQL development and debugging skills
  • Experience in data extraction, transformation, and loading (ETL) using Python
  • Experience working with JSON and ARRAY data structures in BigQuery, including nesting and unnesting
  • Experience in integrating and optimizing streaming data pipelines in a cloud environment
  • Experience with deployment tools such as Jenkins to build automated CI/CD pipelines
  • Hands-on experience with Google Cloud Storage, Pub/Sub, Dataflow, and Dataprep for ETL and real-time data processing
  • Proficiency in building and managing real-time data pipelines with Google Pub/Sub and Dataflow
  • Proficiency in BigQuery, including dataset management, advanced SQL, partitioning, clustering, and federated queries
  • Solid understanding of PostgreSQL, including PL/pgSQL, query optimization, and advanced functions
  • Familiarity with optimizing BigQuery performance through reservations, slots, and cost-effective query techniques
  • Proven experience in creating, managing, and merging branches in Git, following best practices for version control
  • Expertise in resolving merge conflicts, with a deep understanding of branching strategies, rebasing, and other Git workflows
  • Extensive experience with GitHub pull requests (PRs), including creating, reviewing, and approving code changes in a collaborative environment
  • Excellent problem-solving skills and ability to optimize high-volume data workflows
  • Strong communication skills to collaborate effectively with cross-functional teams
  • Strong drive to experiment, learn, and improve your skills
  • Respect for the craft—you write self-documenting code with modern techniques
  • Great written communication skills—we do a lot of work asynchronously in Slack and Google Docs
  • Empathy for our users—a willingness to spend time understanding their needs and difficulties is central to the team
  • Desire to be part of a compact, fun, and hardworking team
Preferred:
  • Experience integrating BigQuery ML for advanced machine learning use cases, including regression, classification, and time-series forecasting (a sketch follows).
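
A hedged sketch of the BigQuery ML use case: training an ARIMA_PLUS time-series model with CREATE MODEL, then reading predictions back through ML.FORECAST. Every project, dataset, and column name here is an assumption.

```python
# Hypothetical sketch: train a BigQuery ML time-series model and
# forecast from it. Dataset, table, and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

# ARIMA_PLUS is BigQuery ML's built-in time-series model type.
client.query("""
CREATE OR REPLACE MODEL `my_project.analytics.daily_sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'order_date',
  time_series_data_col = 'revenue'
) AS
SELECT order_date, SUM(total) AS revenue
FROM `my_project.sales.orders`
GROUP BY order_date
""").result()

# Forecast the next 14 days with an 80% prediction interval.
forecast = client.query("""
SELECT forecast_timestamp, forecast_value
FROM ML.FORECAST(MODEL `my_project.analytics.daily_sales_forecast`,
                 STRUCT(14 AS horizon, 0.8 AS confidence_level))
""").result()

for row in forecast:
    print(row.forecast_timestamp, row.forecast_value)
```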
Additional Information: At this time, O’Reilly Media, Inc. is not able to provide visa sponsorship or any immigration support (i.e., H-1B, STEM, OPT, CPT, EAD, and Permanent Residency processes).

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: E-learning
Spoken language(s): English

Other Skills

  • Empathy
  • Collaboration
  • Communication
  • Problem Solving
