Georgia IT, Inc.

SPARK DEVELOPER - Remote

Requirements

  • Experience in big data processing with Apache Spark (PySpark or Scala)
  • Strong experience with data migration from legacy systems to Spark
  • Proficiency in Scalding, MapReduce, Hadoop, and Hive
  • Hands-on experience writing unit tests for Spark pipelines and strong SQL/data validation skills

Roles & Responsibilities:

  • Analyze the current Scalding-based data pipelines, documenting existing business logic and transformations
  • Migrate the logic to Spark (PySpark/Scala), refactor data transformations and aggregations, and optimize Spark jobs for performance and scalability
  • Develop and execute data parity validation tests to compare outputs between Scalding and Spark implementations; resolve discrepancies with stakeholders
  • Write robust unit and integration tests and enforce engineering best practices (modular, reusable, well-documented) for Spark pipelines

Job description


Job Title: Spark Developer / Engineer (2 positions)
Location: US Remote, work during PST time zone
Duration: 6-12 Months


Our workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To enhance scalability and performance, we are migrating these jobs from Scalding to Apache Spark.

Key Responsibilities:
Understanding the Existing Scalding Codebase
  • Analyze the current Scalding-based data pipelines.
  • Document existing business logic and transformations.
Migrating the Logic to Spark
  • Convert existing Scalding jobs into Spark (PySpark/Scala) while ensuring optimized performance.
  • Refactor data transformations and aggregations in Spark.
  • Optimize Spark jobs for efficiency and scalability.
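As a minimal sketch of the refactoring step above: one common approach is to pull the aggregation logic out of the legacy Scalding job into a pure function that the new Spark job can reuse. The function and data shapes below are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical sketch: aggregation logic from a legacy Scalding-style job,
# refactored into a pure function so a Spark job can reuse and test it.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def aggregate_by_key(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum values per key -- the shape of a typical Scalding groupBy/sum."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

# In PySpark the same aggregation would typically be expressed as
# (not executed here, requires a SparkSession):
#   df.groupBy("key").agg(F.sum("value").alias("total"))
```

Keeping the logic in a plain function like this makes the Scalding and Spark versions easy to compare side by side during the migration.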
Ensuring Data Parity & Validation
  • Develop data parity tests to compare outputs between Scalding and Spark implementations.
  • Identify and resolve any discrepancies between the two versions.
  • Work with stakeholders to validate correctness.
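A parity test of the kind described above can be sketched as follows. This simplified example assumes both pipeline outputs have been materialized as key/value records; the function name and tolerance parameter are illustrative, not part of any specific framework.

```python
# Hypothetical parity check between two pipeline outputs, each represented
# here as an iterable of (key, value) records. In practice the inputs would
# be the materialized outputs of the Scalding and Spark jobs.
from typing import Iterable, List, Tuple

def parity_report(legacy: Iterable[Tuple[str, float]],
                  migrated: Iterable[Tuple[str, float]],
                  tol: float = 1e-9) -> List[str]:
    """Return human-readable discrepancies; an empty list means parity."""
    a, b = dict(legacy), dict(migrated)
    issues: List[str] = []
    for key in sorted(a.keys() | b.keys()):
        if key not in b:
            issues.append(f"missing in migrated: {key}")
        elif key not in a:
            issues.append(f"extra in migrated: {key}")
        elif abs(a[key] - b[key]) > tol:
            issues.append(f"value mismatch for {key}: {a[key]} vs {b[key]}")
    return issues
```

Reports like this give stakeholders a concrete artifact to review when signing off on correctness.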
Writing Unit Tests & Improving Code Quality
  • Implement robust unit and integration tests for Spark jobs.
  • Ensure code meets engineering best practices (modular, reusable, and well-documented).
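A unit test for a Spark pipeline might look like the sketch below. The transformation (`normalize_event`) is a made-up example; the point is the pattern of keeping transformations as pure functions so they can be tested without a SparkSession, with an integration test later running the same function over a small DataFrame fixture.

```python
# Hypothetical unit test for a Spark pipeline's transformation logic.
# The transformation is kept as a pure function so no cluster is needed.
import unittest

def normalize_event(event: dict) -> dict:
    """Example transformation: lowercase the user id, default missing score to 0."""
    return {"user": event["user"].lower(), "score": event.get("score", 0)}

class NormalizeEventTest(unittest.TestCase):
    def test_lowercases_user_and_defaults_score(self):
        self.assertEqual(normalize_event({"user": "Ada"}),
                         {"user": "ada", "score": 0})

    def test_keeps_existing_score(self):
        self.assertEqual(normalize_event({"user": "bob", "score": 7})["score"], 7)

# Run with: python -m unittest <module>
```

Factoring code this way also satisfies the modularity and reusability expectations listed above.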

Required Qualifications:
  • Experience in big data processing with Apache Spark (PySpark or Scala).
  • Strong experience with data migration from legacy systems to Spark.
  • Proficiency in Scalding and MapReduce frameworks.
  • Experience with Hadoop, Hive, and distributed data processing.
  • Hands-on experience in writing unit tests for Spark pipelines.
  • Strong SQL and data validation experience.
  • Proficiency in Python and Scala.
  • Knowledge of CI/CD pipelines for data jobs.
  • Familiarity with the Apache Airflow orchestration tool.
