Bachelor's degree in IT, Computer Science, Software Engineering, Business Analytics, or equivalent
2+ years of experience with Scala, Spark, Hadoop (Security, Spark on YARN, architectural knowledge), HBase and Hive
2+ years of experience with RDBMS (MySQL/Postgres/MariaDB) and 1+ year of CI/CD experience
Nice to have: Kafka, Spark Streaming, Apache Phoenix, Memcached/Redis caching, Spark ML, and FP with Scala (cats/scalaz)
Responsibilities:
Design, develop, and maintain data pipelines using Hadoop ecosystem components (Spark, Hive, HBase) and ensure data quality
Build and optimize data models and schemas in Hive/HBase and integrate with RDBMS
Implement CI/CD pipelines for data applications and manage cloud-based deployments
Collaborate with data scientists, analysts, and stakeholders to deliver analytics-ready datasets and reporting
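As a rough illustration only (not part of the requirements), the data-quality aspect of the first responsibility might look like the minimal Scala sketch below. The `Event` record and its validation rules are hypothetical, and a real pipeline would apply such checks to Spark Datasets rather than in-memory collections:

```scala
// Hypothetical record shape; in practice this would mirror a Hive/HBase schema.
case class Event(id: String, userId: String, amountCents: Long)

object DataQuality {
  // Keep only records that satisfy basic integrity rules
  // (non-empty keys, non-negative amounts). The same predicate
  // could be passed to Dataset[Event].filter in a Spark job.
  def validate(events: Seq[Event]): Seq[Event] =
    events.filter(e => e.id.nonEmpty && e.userId.nonEmpty && e.amountCents >= 0)
}
```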
Job description
Experience (Must have):
a) Scala: minimum 2 years of experience
b) Spark: minimum 2 years of experience
c) Hadoop: minimum 2 years of experience (security, Spark on YARN, architectural knowledge)
d) HBase: minimum 2 years of experience
e) Hive: minimum 2 years of experience
f) RDBMS (MySQL/Postgres/MariaDB): minimum 2 years of experience
g) CI/CD: minimum 1 year of experience

Experience (Good to have):
a) Kafka
b) Spark Streaming
c) Apache Phoenix
d) Caching layer (Memcached/Redis)
e) Spark ML
f) FP (Scala cats/scalaz)

Qualifications:
Bachelor's degree in IT, Computer Science, Software Engineering, Business Analytics, or equivalent, with at least 2 years of experience in big data systems such as Hadoop as well as cloud-based solutions.