Job description
Job Title: Spark Developer / Engineer (2 positions)
Location: US Remote, working PST hours
Duration: 6-12 Months
Our workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To improve scalability and performance, we are migrating these jobs from Scalding to Apache Spark.
Key Responsibilities:
Understanding the Existing Scalding Codebase
Analyze the current Scalding-based data pipelines.
Document existing business logic and transformations.
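For illustration, here is a minimal sketch of the kind of Scalding job these pipelines are built from; the job name, paths, and schema are hypothetical, not from this codebase:

```scala
import com.twitter.scalding._

// Hypothetical Scalding job: count events per user from a TSV of (userId, eventType).
class DailyEventCounts(args: Args) extends Job(args) {
  TypedPipe.from(TypedTsv[(String, String)](args("input")))
    .map { case (userId, _) => (userId, 1L) }
    .sumByKey // Algebird Semigroup[Long] sums the per-user counts
    .write(TypedTsv[(String, Long)](args("output")))
}
```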
Migrating the Logic to Spark
Convert existing Scalding jobs into Spark (PySpark/Scala).
Refactor data transformations and aggregations in Spark.
Optimize Spark jobs for efficiency and scalability.
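As a rough sketch of what such a port might look like, the hypothetical Scalding job above could become the following Spark (Scala) application; names and paths are again illustrative only:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical Spark port of the Scalding DailyEventCounts job above.
object DailyEventCounts {
  def main(cmdArgs: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DailyEventCounts").getOrCreate()
    import spark.implicits._

    spark.read
      .option("sep", "\t")
      .csv(cmdArgs(0))                // _c0 = userId, _c1 = eventType
      .groupBy($"_c0".as("userId"))   // replaces Scalding's map + sumByKey
      .count()
      .write
      .option("sep", "\t")
      .csv(cmdArgs(1))

    spark.stop()
  }
}
```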
Ensuring Data Parity & Validation
Develop data parity tests to compare outputs between Scalding and Spark implementations.
Identify and resolve any discrepancies between the two versions.
Work with stakeholders to validate correctness.
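One common shape for such a parity test, sketched here under the assumption that both pipelines write tab-separated output to comparable paths, is a two-way exceptAll diff:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical parity check between Scalding and Spark outputs.
object ParityCheck {
  // Rows in one output but not the other, respecting duplicate rows.
  def diff(legacy: DataFrame, migrated: DataFrame): (Long, Long) =
    (legacy.exceptAll(migrated).count(), migrated.exceptAll(legacy).count())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParityCheck").getOrCreate()
    val legacy   = spark.read.option("sep", "\t").csv(args(0)) // Scalding output
    val migrated = spark.read.option("sep", "\t").csv(args(1)) // Spark output
    val (missing, unexpected) = diff(legacy, migrated)
    assert(missing == 0 && unexpected == 0,
      s"Parity failure: $missing rows missing, $unexpected rows unexpected")
    spark.stop()
  }
}
```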
Writing Unit Tests & Improving Code Quality
Implement robust unit and integration tests for Spark jobs.
Ensure code meets engineering best practices (modular, reusable, and well-documented).
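As one example of the expected testing style, a ScalaTest suite running against a local SparkSession might look like the following; the suite and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical unit test for the per-user count logic shown above.
class DailyEventCountsSuite extends AnyFunSuite {
  private lazy val spark = SparkSession.builder()
    .master("local[2]")
    .appName("daily-event-counts-test")
    .getOrCreate()

  test("counts events per user") {
    import spark.implicits._
    val input = Seq(("u1", "click"), ("u1", "view"), ("u2", "click"))
      .toDF("userId", "eventType")

    val result = input.groupBy("userId").count()
      .as[(String, Long)]
      .collect()
      .toMap

    assert(result == Map("u1" -> 2L, "u2" -> 1L))
  }
}
```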
Required Qualifications:
Experience in big data processing with Apache Spark (PySpark or Scala).
Strong experience with data migration from legacy systems to Spark.
Proficiency in Scalding and MapReduce frameworks.
Experience with Hadoop, Hive, and distributed data processing.
Hands-on experience in writing unit tests for Spark pipelines.
Strong SQL and data validation experience.
Proficiency in Python and Scala.
Knowledge of CI/CD pipelines for data jobs.
Familiarity with the Apache Airflow orchestration tool.